Current Events > LLM hallucinations are getting worse, not better

Antifar
05/05/25 6:42:02 PM
#1:


Last month, an A.I. bot that handles tech support for Cursor, an up-and-coming tool for computer programmers, alerted several customers about a change in company policy. It said they were no longer allowed to use Cursor on more than just one computer.
In angry posts to internet message boards, the customers complained. Some canceled their Cursor accounts. And some got even angrier when they realized what had happened: The A.I. bot had announced a policy change that did not exist.
"We have no such policy. You're of course free to use Cursor on multiple machines," the company's chief executive and co-founder, Michael Truell, wrote in a Reddit post. "Unfortunately, this is an incorrect response from a front-line A.I. support bot."
More than two years after the arrival of ChatGPT, tech companies, office workers and everyday consumers are using A.I. bots for an increasingly wide array of tasks. But there is still no way of ensuring that these systems produce accurate information.
The newest and most powerful technologies, so-called reasoning systems from companies like OpenAI, Google and the Chinese start-up DeepSeek, are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.
Today's A.I. bots are based on complex mathematical systems that learn their skills by analyzing enormous amounts of digital data. They do not, and cannot, decide what is true and what is false. Sometimes, they just make stuff up, a phenomenon some A.I. researchers call "hallucinations." On one test, the hallucination rates of newer A.I. systems were as high as 79 percent.
These systems use mathematical probabilities to guess the best response, not a strict set of rules defined by human engineers. So they make a certain number of mistakes. "Despite our best efforts, they will always hallucinate," said Amr Awadallah, the chief executive of Vectara, a start-up that builds A.I. tools for businesses, and a former Google executive. "That will never go away."
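As a minimal sketch of what "mathematical probabilities" means here (the candidate words and their scores are invented for illustration, not taken from any real model), the core move is: convert raw scores into probabilities, then sample a word from them. Nothing in the loop checks whether the sampled word is true.

```python
# Minimal sketch of probabilistic next-word guessing. The "logits"
# below are invented scores, not taken from any real model.
import math
import random

# Hypothetical scores for candidate words after "The capital of Australia is"
logits = {"Canberra": 3.1, "Sydney": 2.6, "Melbourne": 1.2}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    exps = {w: math.exp(s) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = softmax(logits)  # roughly {'Canberra': 0.57, 'Sydney': 0.35, 'Melbourne': 0.09}
word = random.choices(list(probs), weights=list(probs.values()))[0]
print(word)  # about a third of the time, this confidently answers "Sydney"
```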

For several years, this phenomenon has raised concerns about the reliability of these systems. Though they are useful in some situations, like writing term papers, summarizing office documents and generating computer code, their mistakes can cause problems.

The A.I. bots tied to search engines like Google and Bing sometimes generate search results that are laughably wrong. If you ask them for a good marathon on the West Coast, they might suggest a race in Philadelphia. If they tell you the number of households in Illinois, they might cite a source that does not include that information.

Those hallucinations may not be a big problem for many people, but they are a serious issue for anyone using the technology with court documents, medical information or sensitive business data.

"You spend a lot of time trying to figure out which responses are factual and which aren't," said Pratik Verma, co-founder and chief executive of Okahu, a company that helps businesses navigate the hallucination problem. "Not dealing with these errors properly basically eliminates the value of A.I. systems, which are supposed to automate tasks for you."

Cursor and Mr. Truell did not respond to requests for comment.

For more than two years, companies like OpenAI and Google steadily improved their A.I. systems and reduced the frequency of these errors. But with the use of new reasoning systems, errors are rising. The latest OpenAI systems hallucinate at a higher rate than the company's previous system, according to the company's own tests.

The company found that o3, its most powerful system, hallucinated 33 percent of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI's previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48 percent.

When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51 percent and 79 percent. The previous system, o1, hallucinated 44 percent of the time. [Antifar's note: 56% correct is a failing grade! And that's the best they've put out!]

In a paper detailing the tests, OpenAI said more research was needed to understand the cause of these results. Because A.I. systems learn from more data than people can wrap their heads around, technologists struggle to determine why they behave in the ways they do.

"Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini," a company spokeswoman, Gaby Raila, said. "We'll continue our research on hallucinations across all models to improve accuracy and reliability."

Hannaneh Hajishirzi, a professor at the University of Washington and a researcher with the Allen Institute for Artificial Intelligence, is part of a team that recently devised a way of tracing a system's behavior back to the individual pieces of data it was trained on. But because systems learn from so much data, and because they can generate almost anything, this new tool can't explain everything. "We still don't know how these models work exactly," she said.

Tests by independent companies and researchers indicate that hallucination rates are also rising for reasoning models from companies such as Google and DeepSeek.


Since late 2023, Mr. Awadallah's company, Vectara, has tracked how often chatbots veer from the truth. The company asks these systems to perform a straightforward task that is readily verified: Summarize specific news articles. Even then, chatbots persistently invent information.
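To make the setup concrete, here is a deliberately crude sketch of how such a summarization test could be scored (illustrative only, not Vectara's actual grading method; the example article and summary are invented): flag any number or capitalized name in the summary that never appears in the source, since a faithful summary should not introduce new facts.

```python
# Illustrative faithfulness check: a summary should not contain numbers
# or proper names that are absent from the article it summarizes.
# (Not Vectara's method; their grading is more sophisticated.)
import re

def suspicious_tokens(article: str, summary: str) -> set:
    pattern = r"\b(?:[A-Z][a-z]+|\d[\d,.]*)\b"  # capitalized words and numbers
    source = set(re.findall(pattern, article))
    return {t for t in re.findall(pattern, summary) if t not in source}

article = "Cursor says users may run the editor on multiple machines."
summary = "Cursor now limits each user to one machine, a policy from 2024."
print(suspicious_tokens(article, summary))  # {'2024'} -- an invented detail
```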

Vectara's original research estimated that in this situation chatbots made up information at least 3 percent of the time and sometimes as much as 27 percent.

In the year and a half since, companies such as OpenAI and Google pushed those numbers down into the 1 or 2 percent range. Others, such as the San Francisco start-up Anthropic, hovered around 4 percent. But hallucination rates on this test have risen with reasoning systems. DeepSeek's reasoning system, R1, hallucinated 14.3 percent of the time. OpenAI's o3 climbed to 6.8 percent.

(The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement regarding news content related to A.I. systems. OpenAI and Microsoft have denied those claims.)


For years, companies like OpenAI relied on a simple concept: The more internet data they fed into their A.I. systems, the better those systems would perform. But they used up just about all the English text on the internet, which meant they needed a new way of improving their chatbots.

So these companies are leaning more heavily on a technique that scientists call reinforcement learning. With this process, a system can learn behavior through trial and error. It is working well in certain areas, like math and computer programming. But it is falling short in other areas.
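A toy version of that trial-and-error loop, sketched below (a two-armed bandit with made-up payoffs, not anything a frontier lab actually runs), hints at why the technique favors checkable domains: the loop only works when each attempt can be scored, which is easy for a math answer or a unit test and hard for an open-ended factual claim.

```python
# Toy trial-and-error learner (epsilon-greedy bandit). The payoffs are
# invented; the point is that learning follows whatever gets rewarded.
import random

payoff = {"A": 0.9, "B": 0.3}   # hidden chance each answer earns reward
value = {"A": 0.0, "B": 0.0}    # the learner's running estimates
count = {"A": 0, "B": 0}

for _ in range(1000):
    # Explore a random answer 10% of the time, otherwise exploit the best guess.
    a = random.choice(["A", "B"]) if random.random() < 0.1 else max(value, key=value.get)
    reward = 1.0 if random.random() < payoff[a] else 0.0
    count[a] += 1
    value[a] += (reward - value[a]) / count[a]  # incremental average of rewards

print(value)  # estimates drift toward the true payoffs; "A" wins out
```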

"The way these systems are trained, they will start focusing on one task and start forgetting about others," said Laura Perez-Beltrachini, a researcher at the University of Edinburgh who is among a team closely examining the hallucination problem.

Another issue is that reasoning models are designed to spend time thinking through complex problems before settling on an answer. As they try to tackle a problem step by step, they run the risk of hallucinating at each step. The errors can compound as they spend more time thinking.
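A back-of-the-envelope calculation shows how quickly that compounding bites (the per-step accuracies below are illustrative assumptions, not measured model rates): if each step is independently correct with probability p, an n-step chain is right everywhere with probability p raised to the n.

```python
# Illustrative compounding of per-step errors in a reasoning chain.
# The per-step accuracies are assumptions, not measured model rates.
for p in (0.99, 0.95, 0.90):
    for n in (5, 10, 20):
        print(f"per-step accuracy {p:.0%}, {n} steps: whole chain right {p ** n:.0%}")
# Even 95%-accurate steps leave a 20-step chain fully correct only ~36% of the time.
```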

https://www.nytimes.com/2025/05/05/technology/ai-hallucinations-chatgpt-google.html

---
Please don't be weird in my topics
DrizztLink
05/05/25 6:42:52 PM
#2:


Yes, but the line go up

---
He/Him http://guidesmedia.ign.com/guides/9846/images/slowpoke.gif https://i.imgur.com/M8h2ATe.png
https://i.imgur.com/6ezFwG1.png
BewmHedshot
05/05/25 6:45:00 PM
#3:


Social media and LLMs have done a great job of conclusively proving that being rich doesn't mean you're smart.
Cartoon_Quoter
05/05/25 6:46:14 PM
#4:


Can we skip ahead to the part where this nonsense disappears up its own asshole?

---
I apologize for nothing!
vycebrand2
05/05/25 6:49:02 PM
#5:


There's an old saying: garbage in, garbage out. Another is: always check your source.
And another... if someone can't do it right, do it yourself.

---
All the iron turn to rust. All the proud men turn to dust. All things time will mend
Goldice
05/05/25 6:50:05 PM
#6:


Cartoon_Quoter posted...
Can we skip ahead to the part where this nonsense disappears up its own asshole?

It's never going to disappear. It's really helpful in some tasks.

It's just... not the end-all, be-all that tech savants are making it out to be. Not yet, at least.

---
New England Patriots: Super Bowl XXXVI, XXXVIII, XXXIX, XLIX, LI, LIII Champions
thronedfire2
05/05/25 6:51:43 PM
#7:


yeah, AI is getting really annoying even when just trying to google something

there have been times I got a result that seemed completely ridiculous, and when I searched the same prompt again I got a different result.

the "AI overview" results are even worse than having to scroll past 4-5 sponsored ads, because people will see that AI result as the first thing and think it's true

---
I could see you, but I couldn't hear you You were holding your hat in the breeze Turning away from me In this moment you were stolen...
TheGoldenEel
05/05/25 7:05:57 PM
#8:


Okahu, a company that helps businesses navigate the hallucination problem
Lmfao anything but stopping using this shit and giving people their jobs back huh

---
BLACK LIVES MATTER
Games: http://backloggery.com/wrldindstries302 \\ Music: http://www.last.fm/user/DrMorberg/
Enclave
05/05/25 7:06:37 PM
#9:


I'm sure part of the issue is that they're inadvertently training AI on AI data. When you scrape something like Reddit, you're going to get a big mix of humans and AI. On top of that, you're also scraping memes and trolling, and the LLM isn't going to know the difference between somebody trolling and somebody being accurate.

These are problems that are not easily solved considering just how much data they're trying to use to train.

---
The commercial says that Church isn't for perfect people, I guess that's why I'm an atheist.
Antifar
05/05/25 9:54:34 PM
#10:


Bump

---
Please don't be weird in my topics
BewmHedshot
05/05/25 10:07:20 PM
#11:


TheGoldenEel posted...
Lmfao anything but stopping using this shit and giving people their jobs back huh
LLMs haven't stolen anybody's job
Sariana21
05/05/25 10:09:25 PM
#12:


It's recursive.

---
___
Sari, Mom to DS (07/04) and DD (01/08); Pronouns: she/her/hers
TheGoldenEel
05/05/25 10:10:52 PM
#13:


BewmHedshot posted...
LLMs haven't stolen anybody's job
Who do you think used to do the chatting that the chat bots in the article are now doing?

---
BLACK LIVES MATTER
Games: http://backloggery.com/wrldindstries302 \\ Music: http://www.last.fm/user/DrMorberg/
Agent_Stroud
05/05/25 10:26:55 PM
#14:


Guys! This is actually not a laughing matter if it's true. A while back I watched somebody explain why LLM hallucinations are a serious problem, and one of the recent examples (at the time of the video, anyway) was an AI-assisted dictation program designed to help medical personnel document issues with their patients more efficiently. It was making up stuff out of whole cloth that was never said by anyone in the room.

That's the kind of screwup that will have devastating consequences on people's lives if a medical decision is made off completely erroneous information and nobody bothers to double-check it until it's too late.

---
"We're going to shake things up, Morgan. Like old times." -- Alex Yu, Prey (2017)
Agent_Stroud
05/05/25 10:31:14 PM
#15:


Found the video in question. Yeah, this is definitely not good, not good at all.

https://youtu.be/28Q4SeTGmT4?si=3fobdz44A_zJLmhl

---
"We're going to shake things up, Morgan. Like old times." -- Alex Yu, Prey (2017)
Solar_Crimson
05/08/25 6:42:34 AM
#16:


BewmHedshot posted...
LLMs haven't stolen anybody's job
Companies are absolutely replacing their personnel with AI.

---
"Be good to yourself, because everyone else in the world is probably out to get you." - Dr. Harleen Quinzel
2Pacavelli
05/08/25 6:44:12 AM
#17:


The more bad data and redundant data it gets fed, the worse it will become
LuigiChalmersJr
05/08/25 7:14:07 AM
#18:


Yeah, I often have to correct ChatGPT on basic things. Then it responds "Oh, you're absolutely right!" So if I didn't correct it, it would've continued to spew out wrong information.

It's troublesome for sure. I can't really trust it with topics I'm not familiar with.

---
Los Angeles Clippers
http://i.imgur.com/3dExqUh.jpg
reincarnator07
05/08/25 7:16:36 AM
#19:


TheGoldenEel posted...
Who do you think used to do the chatting that the chat bots in the article are now doing
Why do you believe companies are using these chat bots?

---
Fan of metal? Don't mind covers? Check out my youtube and give me some feedback
http://www.youtube.com/sircaballero
Ivynn
05/08/25 7:19:22 AM
#20:


AI was supposed to be exciting and cool, why is it so lame

---
http://i.imgur.com/vDci4hD.gif
ssjevot
05/08/25 7:20:58 AM
#21:


LuigiChalmersJr posted...
Yeah, I often have to correct ChatGPT on basic things. Then it responds "Oh, you're absolutely right!" So if I didn't correct it, it would've continued to spew out wrong information.

It's troublesome for sure. I can't really trust it with topics I'm not familiar with.

It doesn't actually know when it is or isn't making stuff up. I would prefer hallucinations be called confabulations, but there is the issue of people attributing intentionality to it. It's just answering queries based on its training data. And the most important part to remember is it doesn't actually have a repository of that data or know what it was trained on. It has learned general concepts that it uses to produce responses, but is not actually able to verify the accuracy of those responses.

---
Favorite Games: BlazBlue: Central Fiction, Street Fighter III: Third Strike, Bayonetta, Bloodborne
thats a username you habe - chuckyhacksss
R_Jackal
05/08/25 7:52:02 AM
#22:


BewmHedshot posted...
LLMs haven't stolen anybody's job
In modern business, if one person can do the job of five with newer tools, four people are typically getting pushed out of their jobs in some way, and if needed, cheaper new hires will replace them at a later date.

As for the topic, I started using AI a bit last year despite having a distaste for it... and stopped this year. It's typically more work to police its idiotic behavior than to just find out or do what I need on my own. It's been relegated to note-taking on things I really don't care about.
Board_hunter567
05/08/25 8:00:27 AM
#23:


The problem is it's been marketed and used as a fact relayer when its function is to predict text one word at a time based on probability. Even hooking it up to a RAG with super-low temperature settings wouldn't prevent it from making stuff up eventually.
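For the curious, a small sketch of that temperature knob (the scores are invented for illustration): dividing the scores by a low temperature sharpens the distribution so the model almost always emits its top guess, but nothing about that makes the top guess true.

```python
# Temperature scaling on invented scores for three candidate words.
import math

def with_temperature(logits, temp):
    scaled = [l / temp for l in logits]
    m = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.5]
for temp in (1.0, 0.2):
    print(temp, [round(p, 3) for p in with_temperature(logits, temp)])
# 1.0 -> [0.547, 0.332, 0.122]; 0.2 -> [0.924, 0.076, 0.001]
# Low temperature buys consistency, not correctness.
```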

---
http://i.imgur.com/szMsu.png
Validate your purchases and discredit the purchases of others whenever possible. Numbers objectively define quality and enjoyment.
Turbam
05/08/25 8:05:31 AM
#24:


Public LLMs will always get dumber as time goes on without someone to filter what they learn.

---
~snip (V)_(;,;)_(V) snip~
I'm just one man! Whoa! Well, I'm a one man band! https://imgur.com/p9Xvjvs