ChatGPT's Performance Is Slipping, New Study Says

UC Berkeley researchers found that ChatGPT has not improved over time, and in fact, may have gotten worse.

Jose Antonio Lanz

The decline was especially steep in the chatbot’s software coding abilities.

"For GPT-4, the percentage of generations that are directly executable dropped from 52.0% in March to 10.0% in June," the research found. These results were obtained using the pure version of the models, meaning no code interpreter plugins were involved.
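As a rough illustration of the metric quoted above, the sketch below computes the share of code generations that run as-is in a fresh Python interpreter. It assumes "directly executable" simply means the snippet exits without error, with no cleanup applied to the model's text; the study's actual harness may have used stricter checks, and the sample generations are made up.

```python
import subprocess
import sys


def runs_directly(source: str, timeout: float = 5.0) -> bool:
    """Return True if `source` runs to completion in a fresh interpreter.

    Assumption: "directly executable" is approximated as "exits with status 0",
    and the generated text is passed through unmodified.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def executable_rate(generations: list[str]) -> float:
    """Percentage of generations that run without any modification."""
    if not generations:
        return 0.0
    ok = sum(runs_directly(g) for g in generations)
    return 100.0 * ok / len(generations)


if __name__ == "__main__":
    # Hypothetical model outputs, for illustration only.
    samples = [
        "print(sum(range(10)))",                  # plain code: runs
        "```python\nprint(sum(range(10)))\n```",  # extra markup: SyntaxError
    ]
    print(f"{executable_rate(samples):.1f}% directly executable")  # prints "50.0% directly executable"
```

Any text that is not valid code, even harmless formatting around otherwise correct code, counts as a failure under this strict definition.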

To assess reasoning, the researchers leveraged visual prompts from the Abstract Reasoning Corpus (ARC) dataset. Even here, a decline was observable, though it was less steep. “GPT-4 in June made mistakes on queries on which it was correct for in March,” the study reads.
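Spotting queries a model used to answer correctly but now gets wrong comes down to a per-query diff between two evaluation runs. The minimal sketch below assumes each run is stored as a mapping from a query ID to whether the answer was correct; the task IDs and results are hypothetical, not the study's data.

```python
def compare_snapshots(
    before: dict[str, bool], after: dict[str, bool]
) -> tuple[list[str], list[str]]:
    """Diff per-query correctness between two evaluation runs.

    Returns (regressed, improved): queries that flipped from correct to
    incorrect, and from incorrect to correct. Queries missing from either
    run are ignored.
    """
    shared = before.keys() & after.keys()
    regressed = sorted(q for q in shared if before[q] and not after[q])
    improved = sorted(q for q in shared if not before[q] and after[q])
    return regressed, improved


if __name__ == "__main__":
    # Hypothetical per-query results; True means the answer was correct.
    march = {"task_a": True, "task_b": True, "task_c": False}
    june = {"task_a": True, "task_b": False, "task_c": True}
    regressed, improved = compare_snapshots(march, june)
    print("regressed:", regressed)  # prints "regressed: ['task_b']"
    print("improved:", improved)    # prints "improved: ['task_c']"
```

Reporting both directions matters: an aggregate score can stay flat while individual queries flip, which is exactly the kind of change the quote describes.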

What could explain ChatGPT's apparent downgrade after just a few months? Researchers hypothesize it may be a side effect of optimizations being made by OpenAI, its creator.

One possible cause is changes introduced to prevent ChatGPT from answering dangerous questions. This safety alignment, however, could impair ChatGPT's usefulness for other tasks. The researchers found the model now tends to give verbose, indirect responses instead of clear answers.

"GPT-4 is getting worse over time, not better," said AI expert Santiago Valderrama on Twitter. Valderrama also raised the possibility that a "cheaper and faster" mixture of models may have replaced the original ChatGPT architecture.

“Rumors suggest they are using several smaller and specialized GPT-4 models that act similarly to a large model but are less expensive to run,” he hypothesized, adding that such a setup could speed up responses for users but reduce competency.

There are hundreds (maybe thousands already?) of replies from people saying they have noticed the degradation in quality.

Browse the comments, and you'll read about many situations where GPT-4 is not working as before.

— Santiago (@svpino) July 19, 2023

Another expert, Dr. Jim Fan, also shared his insights in a Twitter thread.


“Unfortunately, more safety typically comes at the cost of less usefulness,” he wrote, saying he was trying to make sense of the results by linking them to the way OpenAI finetunes its models. “My guess (no evidence, just speculation) is that OpenAI spent the majority of efforts doing lobotomy from March to June, and didn't have time to fully recover the other capabilities that matter.”

Fan argues that other factors may have come into play, namely cost-cutting efforts, the introduction of warnings and disclaimers that may “dumb down” the model, and the lack of broader feedback from the community.

While more comprehensive testing is warranted, the findings align with users' expressed frustrations over the declining coherence of ChatGPT's once-eloquent outputs.

How can we prevent further deterioration? Some enthusiasts have advocated for open-source models such as Meta's LLaMA (which has just been updated), since they allow the community to debug them. Continuous benchmarking to catch regressions early is also crucial.
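In its simplest form, continuous benchmarking means re-running a fixed test suite against each new model snapshot and flagging any score that drops beyond a tolerance. The sketch below is a minimal illustration under stated assumptions: the benchmark names, the 2-point tolerance, and the second score pair are invented, while the 52.0 and 10.0 figures are the GPT-4 executable-code rates quoted earlier.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str     # benchmark identifier (illustrative)
    score: float  # higher is better, e.g. a percentage


def find_regressions(
    baseline: list[BenchmarkResult],
    current: list[BenchmarkResult],
    tolerance: float = 2.0,  # assumed allowance for run-to-run noise, in points
) -> list[tuple[str, float, float]]:
    """Return (name, old, new) for every benchmark that dropped by more than `tolerance`."""
    old_scores = {r.name: r.score for r in baseline}
    flagged = []
    for r in current:
        old = old_scores.get(r.name)
        if old is not None and old - r.score > tolerance:
            flagged.append((r.name, old, r.score))
    return flagged


if __name__ == "__main__":
    march = [
        BenchmarkResult("code_directly_executable", 52.0),  # figure from the study
        BenchmarkResult("hypothetical_task", 70.0),         # made-up score
    ]
    june = [
        BenchmarkResult("code_directly_executable", 10.0),  # figure from the study
        BenchmarkResult("hypothetical_task", 69.5),         # made-up score, within tolerance
    ]
    for name, old, new in find_regressions(march, june):
        print(f"REGRESSION {name}: {old:.1f} -> {new:.1f}")
```

Run on a schedule or in a CI job, a non-empty result would typically fail the build, so a drop like the one the researchers measured surfaces within days rather than months.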

For now, ChatGPT fans may need to temper their expectations. The wild idea-generating machine many first encountered appears tamer—and perhaps less brilliant. But age-related decline appears to be inevitable, even for AI celebrities.
