You know the situation: you’re talking to a digital assistant, but it doesn’t quite understand you because you mumbled or your surroundings are noisy. Or the response takes just a bit too long to arrive. Amazon is addressing this with Nova Sonic, a new AI voice model that promises natural interaction, minimal delay, and lower costs. According to Amazon, Nova Sonic performs better than earlier voice assistants such as Alexa or Siri, and even better than OpenAI’s GPT-4o.
Fast and affordable AI voice recognition now available
Nova Sonic is available through Amazon Bedrock, the platform where businesses can develop AI applications. Developers reach the model through a bi-directional streaming API, so audio is sent to the model and responses come back in real time over the same session. This gives developers direct access to the speed and accuracy of the speech model.
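To make the idea of bi-directional streaming concrete, here is a minimal sketch in Python. It does not call Amazon Bedrock or any Nova Sonic API; the remote model is replaced by a local stand-in, and every name in it (`send_audio`, `fake_model`, `receive_events`, `run_session`) is hypothetical. It only illustrates the pattern of sending audio chunks while responses arrive on the same session.

```python
import asyncio


async def send_audio(chunks, outbound):
    """Push audio chunks onto the outbound queue, then signal end of input."""
    for chunk in chunks:
        await outbound.put(chunk)
    await outbound.put(None)  # sentinel: no more audio


async def fake_model(outbound, inbound):
    """Hypothetical stand-in for the remote model: answers each chunk as it arrives."""
    while True:
        chunk = await outbound.get()
        if chunk is None:
            await inbound.put(None)  # close the response stream
            return
        await inbound.put(f"received {len(chunk)} bytes")


async def receive_events(inbound):
    """Collect response events until the stream is closed."""
    events = []
    while True:
        event = await inbound.get()
        if event is None:
            return events
        events.append(event)


async def run_session(chunks):
    """Run sender, model stand-in, and receiver concurrently."""
    outbound, inbound = asyncio.Queue(), asyncio.Queue()
    _, _, events = await asyncio.gather(
        send_audio(chunks, outbound),
        fake_model(outbound, inbound),
        receive_events(inbound),
    )
    return events


if __name__ == "__main__":
    # Three short chunks of silent audio, purely illustrative.
    print(asyncio.run(run_session([b"\x00" * 320, b"\x00" * 320, b"\x00" * 160])))
```

In a real integration the two queues would be replaced by the network stream that the provider's SDK exposes; the concurrent send and receive loops are the part that carries over.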
What distinguishes Nova Sonic from other models
Nova Sonic processes speech in a way that closely resembles human interaction. It waits to respond until there is a pause or the end of a sentence, and it takes interruptions into account. Additionally, the spoken input is automatically converted into text. This allows developers to use speech data directly in their own systems or applications.
The word error rate in speech recognition is only 4.2%, averaged across five European languages, which, according to Amazon, is significantly better than that of competitors. In tests with multiple speakers in loud environments, Nova Sonic was nearly 47% more accurate than OpenAI’s GPT-4o-transcribe. In terms of speed, Nova Sonic also leads with an average delay of just 1.09 seconds. In comparison, GPT-4o has a delay of 1.18 seconds.
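For readers who want to see what such figures measure, the sketch below shows how a word error rate is commonly computed, assuming that is the metric behind the 4.2% figure: the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference, divided by the number of reference words. The function names and the example sentences are illustrative and do not come from Amazon's benchmark. The second helper simply works out the relative difference between the two reported delays.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, value: float) -> float:
    """Fraction by which `value` is lower than `baseline`."""
    return (baseline - value) / baseline


if __name__ == "__main__":
    # One wrong word out of five reference words.
    print(word_error_rate("turn on the kitchen lights", "turn on the kitchen light"))
    # Delays reported in the article: 1.18 s versus 1.09 s.
    print(f"{relative_reduction(1.18, 1.09):.1%}")
```

On the reported delays, the difference of 0.09 seconds works out to a reduction of roughly 7.6% relative to the slower figure.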
Possible impact for companies and developers
For companies, Nova Sonic means fewer errors in speech recognition, faster interactions with customers, and lower costs. According to Amazon, the model is up to 80% cheaper than GPT-4o. This can provide direct benefits for organizations that handle a high volume of speech traffic, such as customer service departments, digital assistants, or voice-controlled apps.
This also presents opportunities for developers. Nova Sonic recognizes when it needs to retrieve information from the internet, query a data source, or perform an external action. The technology can therefore be integrated into existing systems and applications with little effort, which helps with scalability and efficiency.
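How an application might respond when the model signals that it needs an external action can be sketched as follows. This is an illustration only: the event format, the tool name `lookup_order`, and the sample data are hypothetical and are not taken from Amazon's API. The point is the routing pattern, in which the application receives a structured request, runs the matching function, and hands the result back.

```python
from typing import Callable, Dict

# Hypothetical in-memory data standing in for a real order system.
ORDERS = {"A123": "shipped"}


def lookup_order(order_id: str) -> str:
    """Hypothetical tool: return the status of an order."""
    status = ORDERS.get(order_id, "not found")
    return f"Order {order_id}: {status}"


# Registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[..., str]] = {"lookup_order": lookup_order}


def handle_model_event(event: dict) -> str:
    """Route a model event (assumed shape) to a tool, or pass plain text through."""
    if event.get("type") != "tool_request":
        return event.get("text", "")
    name = event.get("name", "")
    handler = TOOLS.get(name)
    if handler is None:
        return f"error: unknown tool '{name}'"
    return handler(**event.get("arguments", {}))


if __name__ == "__main__":
    request = {
        "type": "tool_request",
        "name": "lookup_order",
        "arguments": {"order_id": "A123"},
    }
    print(handle_model_event(request))
```

Keeping the tools in a registry means new actions can be added without touching the routing logic, which is one way the integration into existing systems could stay manageable.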
Part of a broader AGI strategy
Amazon views Nova Sonic as a key building block within its broader AGI strategy: systems that can perform any task a human can do on a computer. Within that strategy, the company recently launched Nova Act, an AI model that can navigate websites on its own and powers elements of Alexa+. Nova Sonic is the first in a series of internal AI models that Amazon is making available to developers.

