The open-source model improves on Whisper by using a multi-head attention architecture to achieve a speedup and reduced latency while retaining full speech recognition accuracy.
TEL AVIV, Israel, Aug. 2, 2024 /PRNewswire/ -- aiOla, a leader in speech recognition technology, today announced the release of its new open-source AI model, Whisper-Medusa. The new model, based on a multi-head attention architecture, outperforms OpenAI's Whisper, the most popular and best available AI speech recognition model, by running 50% faster with no loss in accuracy.
The automatic speech recognition market size is projected to
grow to $7.14 billion this year. As
voice becomes an integrated feature in most connected devices and
AI chatbots, speech recognition has emerged as a vital technology
field. Amid this rapid expansion, OpenAI disrupted the automatic
speech recognition landscape by releasing Whisper, an open-source
model considered superior to any other commercial or open-source
speech recognition model available today. Whisper, with more than 5
million downloads per month, has become the gold standard for
automatic speech recognition systems and is powering tens of
thousands of applications.
aiOla's new open-source model, Whisper-Medusa, is substantially faster than Whisper because it changes how the model predicts tokens. While Whisper predicts one token at a time, Whisper-Medusa can predict ten at a time, resulting in a 50% speedup in speech prediction and generation runtime. On the strength of this advancement, aiOla is releasing the model's weights and code today on GitHub and Hugging Face for the community to access.
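The arithmetic behind this speedup can be illustrated with a minimal sketch. The function names and cost figures below are purely illustrative assumptions, not aiOla's implementation: committing up to ten tokens per decoder pass cuts the number of sequential passes roughly tenfold, but if each multi-head pass is assumed several times more expensive (extra heads plus verification overhead), the end-to-end gain lands near the reported 50%.

```python
import math

def decoding_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Sequential decoder passes needed to emit num_tokens tokens."""
    return math.ceil(num_tokens / tokens_per_step)

def decode_runtime_ms(num_tokens: int, tokens_per_step: int,
                      step_cost_ms: int) -> int:
    """Total decode time if each sequential pass costs step_cost_ms."""
    return decoding_steps(num_tokens, tokens_per_step) * step_cost_ms

# Baseline Whisper-style decoding: one token per pass (illustrative cost).
baseline = decode_runtime_ms(300, 1, step_cost_ms=10)   # 300 passes
# Medusa-style decoding: up to 10 tokens per pass; each pass is assumed
# costlier here (extra heads plus verification overhead).
medusa = decode_runtime_ms(300, 10, step_cost_ms=50)    # 30 passes

print(f"baseline: {baseline} ms, multi-head: {medusa} ms")  # 3000 vs 1500
```

Under these assumed per-pass costs, the multi-head decoder finishes in half the time, matching the 50% figure above; real-world gains depend on sequence length and hardware.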
"Creating Whisper-Medusa was not an easy task, but its
significance to the community is profound," said Gill Hetz, VP of Research at aiOla. "Improving
the speed and latency of LLMs is much easier to do than with
automatic speech recognition systems. The encoder and decoder
architectures present unique challenges due to the complexity of
processing continuous audio signals and handling noise or accents.
We addressed these challenges by employing our novel multi-head
attention approach, which resulted in a model with nearly double
the prediction speed while maintaining Whisper's high levels of
accuracy. It's a major feat, and we are very proud to be the first
in the industry to successfully leverage multi-head attention
architecture for automatic speech recognition systems and bring it
to the public."
Whisper-Medusa, based on multi-head attention, is trained using
weak supervision. In this process, the main components of Whisper
are initially frozen while additional parameters are trained. This
training process involves using Whisper to transcribe audio
datasets and employing these transcriptions as labels for training
Medusa's additional token prediction modules. aiOla currently
offers Whisper-Medusa as a 10-head model, with future plans to
release a 20-head version with equivalent accuracy.
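The weak-supervision scheme described above amounts to pseudo-labeling: the frozen base model transcribes audio, and those transcriptions become the training targets for the extra token-prediction heads. A minimal framework-free sketch of the data flow (all class and method names are hypothetical stand-ins, not aiOla's code):

```python
from dataclasses import dataclass, field

@dataclass
class FrozenWhisper:
    """Stand-in for the frozen base model: its parameters never change."""
    def transcribe(self, audio_clip: str) -> list[str]:
        # Placeholder: a real model would decode audio into tokens.
        return audio_clip.split()

@dataclass
class MedusaHeads:
    """Stand-in for the trainable extra token-prediction heads."""
    num_heads: int = 10
    training_pairs: list = field(default_factory=list)

    def fit_step(self, audio_clip: str, target_tokens: list[str]) -> None:
        # A real trainer would backpropagate a loss here; this just
        # records the (input, pseudo-label) pair to show the data flow.
        self.training_pairs.append((audio_clip, target_tokens))

base = FrozenWhisper()
heads = MedusaHeads(num_heads=10)

# Weak supervision: the base model's own transcriptions act as labels.
dataset = ["hello world", "speech recognition is useful"]
for clip in dataset:
    pseudo_label = base.transcribe(clip)  # frozen model produces labels
    heads.fit_step(clip, pseudo_label)    # only the heads are trained
```

The key design point this sketch captures is that gradient updates touch only the added heads, so the base model's accuracy is preserved while the heads learn to predict several future tokens per pass.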
About aiOla:
aiOla's patented technology comprehends over 100 languages and discerns jargon, abbreviations, and acronyms, maintaining a low
error rate even in noisy environments. aiOla's technology converts
manual processes in critical industries into data-driven,
paperless, AI-powered workflows through cutting-edge speech
recognition.
Contact:
Ali Goldberg
Concrete Media for aiOla
aiOla@concrete.media
View original content: https://www.prnewswire.com/news-releases/aiola-releases-breakthrough-ai-model-thats-50-faster-than-openais-whisper-302213057.html
SOURCE aiOla