AI & Machine Learning
2 min
27 November 2023

Author

Lisanne Groot

marketing consultant

AI Chatbots and Hallucinations: ChatGPT at the Top, Google's Palm-Chat Needs to Adjust

The ability of AI chatbots to accurately present factual information is crucial, especially in sectors such as health, industry, and defense. Vectara has launched a project to assess the quality of various AI chatbots regarding their tendency to 'hallucinate,' or fabricate facts. This assessment is of great importance for the reliable use of these technologies.

Vectara tested eleven public chatbots, including GPT-4 and Google's Palm-Chat, on more than 800 documents. Each chatbot had to summarize the documents without adding any information that was not in the source text. GPT-4 performed best, with the lowest hallucination rate and the highest accuracy. Google's Palm-Chat, by contrast, had a hallucination rate of over 27%, which suggests its summaries are considerably less reliable.
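To make the headline figure concrete, here is a minimal sketch of how a hallucination rate could be computed once each summary has been judged. It is not Vectara's actual code: the `SummaryJudgment` type, the function names, and the sample data are all hypothetical, and treating accuracy as the complement of the hallucination rate is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SummaryJudgment:
    """One summarized document and the verdict on its summary (hypothetical type)."""
    document_id: str
    hallucinated: bool  # True if the summary adds facts not found in the source


def hallucination_rate(judgments: list[SummaryJudgment]) -> float:
    """Return the fraction of summaries flagged as containing fabricated content."""
    if not judgments:
        raise ValueError("need at least one judged summary")
    flagged = sum(1 for j in judgments if j.hallucinated)
    return flagged / len(judgments)


def accuracy(judgments: list[SummaryJudgment]) -> float:
    """Assumption: accuracy is simply the share of summaries that were not flagged."""
    return 1.0 - hallucination_rate(judgments)


# Made-up data: 10 summaries, of which the first 3 are flagged.
judgments = [SummaryJudgment(f"doc-{i}", hallucinated=(i < 3)) for i in range(10)]
rate = hallucination_rate(judgments)
print(f"hallucination rate: {rate:.1%}")
print(f"accuracy: {accuracy(judgments):.1%}")
```

Under this reading, a hallucination rate of over 27% would mean that more than a quarter of a chatbot's summaries were flagged.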

These findings are not only relevant to the technical community but also to businesses considering the use of AI technologies for non-creative purposes. The results can serve as a useful benchmark for anyone seeking reliable AI solutions.

Because of the scale of the tests and the need for consistent assessment, a model for detecting hallucinations has been developed. Building such a detector is simpler than creating a model that does not hallucinate at all. The current ranking is already a topic of discussion on social media.

The ranking will be periodically updated to track the development of existing LLMs and the introduction of new ones. In the meantime, there is anticipation for the evaluation of Grok, Elon Musk's recently announced chatbot, which is described as humorous and sarcastic. This could be of interest to companies looking for AI solutions with a creative twist.