Cleanlab Announces Billion-Dollar Breakthrough in Detecting AI Hallucinations
25 April 2024 - 8:00AM
A startup born in a quantum computing lab has unveiled the solution
to one of generative AI’s biggest problems.
Cleanlab today launched the Trustworthy Language Model (TLM), a
fundamental advance in generative AI that detects when large
language models (LLMs) are hallucinating.
Steven Gawthorpe, PhD, Associate Director and Senior Data
Scientist at Berkeley Research Group, called the Trustworthy
Language Model “the first viable answer to LLM hallucinations that
I've seen.”
Generative AI is poised to transform every industry and
profession, but it faces a major challenge in “hallucinations,”
when LLMs generate incorrect or misleading results. A given LLM
response might sound convincing. But is it correct? Is it based in
reality? LLMs offer no way to be sure. This makes automating
sensitive tasks with generative AI all but impossible.
The lack of trust is the major obstacle to business adoption of
LLMs. Billions of dollars of productivity gains are locked up
behind this dilemma. Cleanlab is the first to crack it.
Cleanlab’s TLM combines world-class uncertainty estimation,
auto-ML ensembling and quantum information algorithms repurposed
for general computing to add trust to generative AI. Its API wraps
around any LLM, producing a reliable trustworthiness score for
every response.
In industry-standard benchmarks for LLM reliability, the TLM
beats other methods across the board. It delivers performance
that’s not just superior, but consistently superior, giving
businesses the confidence to rely on generative AI for important
jobs.
For example, businesses can use the TLM to automate customer
refunds, bringing a human reviewer into the loop whenever an LLM’s
response falls below a predetermined level of trustworthiness.
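The refund workflow above can be sketched as a simple threshold check. This is a minimal illustration and not Cleanlab's API: the names `route_response`, `Decision` and `fake_scored_llm`, the 0-to-1 score range and the 0.8 threshold are all assumptions made for the example. The stand-in scorer returns hard-coded values where a real deployment would call a trust-scoring service.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    response: str
    trust_score: float
    needs_human_review: bool


def route_response(
    prompt: str,
    scored_llm: Callable[[str], Tuple[str, float]],
    threshold: float = 0.8,  # hypothetical cutoff; tune per use case
) -> Decision:
    """Flag a response for human review when its trust score is below threshold."""
    response, score = scored_llm(prompt)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"trust score out of range: {score}")
    return Decision(response, score, needs_human_review=score < threshold)


def fake_scored_llm(prompt: str) -> Tuple[str, float]:
    """Stand-in for an LLM plus trust scorer (hypothetical, hard-coded scores)."""
    if "receipt" in prompt.lower():
        return "Refund approved.", 0.95
    return "Refund approved.", 0.40


if __name__ == "__main__":
    for text in ("Customer has a receipt for order 123", "Customer wants money back"):
        decision = route_response(text, fake_scored_llm)
        action = "send to human reviewer" if decision.needs_human_review else "automate"
        print(f"{decision.trust_score:.2f} -> {action}")
```

The threshold is the business decision here: raising it sends more cases to reviewers, while lowering it automates more at higher risk.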
“Cleanlab’s TLM gives us the power of thousands of data
scientists to enrich data and strengthen LLM outputs, providing 10x
to 100x ROI for many of our clients. Compared to what Cleanlab is
doing, other tools aren't even on the same playing field,”
Gawthorpe said.
“Cleanlab’s TLM is a truly pioneering solution for effectively
addressing hallucinations,” added Akshay Pachaar, AI Engineer at
Lightning.ai. “The integration of Cleanlab’s trustworthiness scores
transforms human-in-the-loop workflows, enabling up to 90%
automation. It not only conserves hundreds of manpower hours weekly
but augments our efficiency in processing substantial datasets for
data enrichment, document and chat-log analysis and other
large-scale tasks. It has the potential to revolutionize how we
manage and derive value from data.”
In addition to making LLMs more trustworthy, the TLM makes them
more accurate as well. It functions as a sort of super-LLM,
checking LLMs’ output to deliver better results than LLMs on their
own. In benchmarks comparing the accuracy of GPT-4 with GPT-4 + TLM,
the combination of GPT-4 and the TLM outperforms GPT-4 by itself
every time. This makes the TLM ideal for scenarios such as:
- RAG (Retrieval Augmented Generation): Providing LLMs with
more-reliable context
- Business chatbots: Accurately answering questions from
customers and employees
- Data extraction: Extracting complex information from PDFs
- Securities analysis: Scanning stock reviews to find the
strongest buy signal
Like other Cleanlab products, the TLM has its roots in the
founders’ groundbreaking research on uncertainty in AI datasets.
Its CEO, Curtis Northcutt, spent eight years working with the
inventor of the quantum computer to understand how to extract
reliable computation from arbitrary data. Its Chief Scientist,
Jonas Mueller, led the development of AutoGluon, the open-source
and industry-standard Auto-ML platform for AWS. Its CTO, Anish
Athalye, is one of the world’s most renowned ML developers, with
more than 30,000 GitHub stars for his personal projects.
AWS, Google, JPMorgan Chase, Tesla and Walmart are a few of the
Fortune 500 companies using Cleanlab’s technology to improve their
data inputs. Now Cleanlab is applying that same expertise to the
output of LLMs — with economic implications that are, if anything,
even greater.
“This is a pivot point for generative AI in the enterprise,”
said Cleanlab CEO Curtis Northcutt. “Adding trust to LLMs will
change the calculus around their use. We’ll always have some
version of hallucinations. The difference is that now we have a
powerful solution to detect and manage them. That means businesses
can deploy generative AI for use cases that were previously
undreamt of, and unlock a significant new source of
productivity—and revenue.”
To learn more about the Cleanlab TLM for businesses, visit
https://cleanlab.ai/tlm.
About Cleanlab
Founded in 2021 by three MIT
Computer Science PhDs and trusted by hundreds of top organizations
including AWS, Chase, Google and Tesla, Cleanlab adds trust to
every input and output of data-driven processes by turning
unreliable data into reliable models and insights. Cleanlab’s AI
data platform, Cleanlab Studio, automatically finds and fixes
errors in both structured and unstructured datasets, such as
visual, text, and tabular data, and adds over 30 dimensions of
quality/trust scores for data points. Its Trustworthy Language
Model (TLM) offers the first reliable way to assess the
trustworthiness of LLM outputs. Recognized as a Forbes AI 50
company, Cleanlab is based in San Francisco and backed by leading
investors including Menlo Ventures, Bain Capital Ventures,
Databricks Ventures, TQ Ventures, Samsung Ventures and angels
including the CEOs and founders of Yahoo, GitHub, Mosaic and
Okta.
Media Contact
Chris Ulbrich
cleanlab@firebrand.marketing
415-848-9175
A photo accompanying this announcement is available at
https://www.globenewswire.com/NewsRoom/AttachmentNg/26d32da4-f3ec-4f6c-a603-e792881b0376