AI search engines give incorrect answers more than 60% of the time

A recent study by Columbia Journalism Review’s Tow Center for Digital Journalism reveals serious accuracy issues with generative AI models used for news searches. The research tested eight AI-powered search tools with live search capabilities and found that over 60% of responses about news content were incorrect.

The report, written by researchers Klaudia Jaźwińska and Aisvarya Chandrasekar, highlights that about one in four Americans now use AI models instead of traditional search engines. Given the high error rate uncovered, this raises serious concerns about the reliability of AI-driven searches.

The error rates varied widely among the platforms. Perplexity returned incorrect information 37% of the time, while ChatGPT Search was wrong 67% of the time (134 out of 200 queries). Grok 3 had the highest error rate at 94%.

For the study, the researchers provided AI models with direct excerpts from actual news articles and asked them to identify the headline, publisher, publication date, and URL. They tested 1,600 queries across the eight AI search tools.

A major issue identified was that the AI models often did not refuse to answer when they lacked reliable information. Instead, they frequently offered plausible but incorrect or speculative answers, a behavior consistent across all models tested.

Interestingly, premium versions of some AI tools performed worse in certain respects. Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) were more likely to provide incorrect answers than their free versions. Although the premium models answered more prompts correctly, their tendency to answer rather than decline when uncertain pushed their overall error rates higher.

The study also uncovered problems related to citations and publisher control. Some AI tools appeared to ignore the Robots Exclusion Protocol (robots.txt), which publishers use to tell crawlers which content they should not access. For instance, Perplexity’s free version correctly identified all 10 excerpts from paywalled National Geographic content, even though the publisher had blocked Perplexity’s crawlers.
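To illustrate how a compliant crawler would honor such a block, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules, the `ExampleAIBot` user agent, and the example.com URL are hypothetical and chosen only for illustration; they are not taken from the study or from any publisher's actual file. Note that the protocol is advisory: nothing in it technically stops a crawler that chooses not to perform this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative rules only).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/some-article"  # hypothetical URL
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

A well-behaved crawler would run a check like this before each fetch and skip any URL for which it returns `False`.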

Moreover, when these AI tools cited sources, they often directed users to syndicated versions of articles on platforms like Yahoo News rather than linking back to the original publisher’s site, even when formal licensing agreements were in place.

Another significant problem was URL fabrication. More than half of citations from Google’s Gemini and Grok 3 led to fabricated or broken URLs. Of 200 citations from Grok 3, 154 resulted in error pages.
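The percentages behind these counts can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the 1,600 queries were split evenly across the eight tools (which matches the 200-query figures quoted in the article); the function and variable names are illustrative.

```python
def rate(count: int, total: int) -> float:
    """Return count / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


TOTAL_QUERIES = 1600
TOOLS = 8

# Assumption: queries were divided evenly among the tools.
per_tool = TOTAL_QUERIES // TOOLS

# ChatGPT Search: 134 incorrect answers out of 200 queries.
chatgpt_error_rate = rate(134, per_tool)

# Grok 3: 154 of 200 citations led to error pages.
grok_broken_url_rate = rate(154, 200)

if __name__ == "__main__":
    print(f"Queries per tool: {per_tool}")
    print(f"ChatGPT Search error rate: {chatgpt_error_rate:.0f}%")
    print(f"Grok 3 broken-citation rate: {grok_broken_url_rate:.0f}%")
```

Under that assumption, each tool received 200 queries, ChatGPT Search's 134 wrong answers correspond to 67%, and Grok 3's 154 error pages correspond to 77% of its citations.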

These issues create major challenges for publishers, who face difficult decisions: blocking AI crawlers could result in no attribution, while allowing access could lead to widespread reuse without driving traffic to their own websites.

Mark Howard, COO of Time magazine, expressed concern about ensuring transparency and control over how Time’s content appears in AI-generated searches. Despite the challenges, he sees potential for improvement, stating, “Today is the worst that the product will ever be,” pointing to ongoing investments and engineering aimed at refining these tools.

However, Howard also criticized users for not being skeptical of free AI tools, saying, “If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them.”

Both OpenAI and Microsoft acknowledged the study’s findings but did not specifically address the issues raised. OpenAI emphasized its commitment to supporting publishers by driving traffic through summaries, quotes, and clear attribution. Microsoft stated that it follows the Robots Exclusion Protocol and publisher directives.

This report builds on findings from a previous Tow Center study in November 2024, which identified similar accuracy issues with ChatGPT’s handling of news content. For more details, check out Columbia Journalism Review’s website.
