
AI-driven voice technology has changed the way we interact with digital systems. Models use advanced speech recognition and natural language processing so that AI-powered voice assistants such as Siri, Alexa and Google Assistant can understand and respond to user commands. This enables hands-free control of devices, easy information retrieval and personalized user experiences.
Businesses use AI voice bots to enhance customer engagement, automate routine queries and improve response times. However, traditional models sometimes struggle with poor audio quality or specialized terminology.
Deepgram aims to address those weaknesses with Nova-3, its recently revealed and most advanced speech-to-text (STT) model yet, which sets a new benchmark for transcription accuracy in challenging audio environments.
Deepgram is a voice AI platform that offers speech-to-text, text-to-speech and full speech-to-speech capabilities. The company was recently a Gold sponsor of, and a speaker at, Generative AI Expo 2025.
The new model builds on Deepgram’s established reputation for high-performance AI transcription. Nova-3 aims to address the nuanced needs of industries that rely heavily on voice data, from customer service and finance to healthcare and legal services.
Nova-3 enables self-serve customization, which allows users to fine-tune the model for specialized domains without requiring deep expertise in machine learning. Many conventional models require expensive and time-consuming expert-led customization, delaying deployment and increasing costs.
With the addition of Keyterm Prompting, developers can instantly improve transcription accuracy by optimizing for up to 100 key terms, without waiting for extensive model retraining or customization cycles. This flexibility accelerates deployment, enhances accuracy and reduces costs.
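As a rough illustration of what supplying key terms at request time could look like, the Python sketch below builds a transcription request URL that carries a list of domain terms and enforces the 100-term limit mentioned above. The endpoint URL, the `model` and `keyterm` parameter names and the helper `build_transcription_url` are assumptions made for illustration, not details from this article. Check them against Deepgram's API documentation before relying on them. No network request is made.

```python
from urllib.parse import urlencode

# Limit taken from the article: up to 100 key terms.
MAX_KEYTERMS = 100


def build_transcription_url(
    keyterms,
    base_url="https://api.deepgram.com/v1/listen",  # assumed endpoint
    model="nova-3",                                 # assumed model identifier
):
    """Return a request URL carrying one `keyterm` parameter per term.

    The parameter names are assumptions for illustration only.
    """
    # Drop empty or whitespace-only entries.
    terms = [t.strip() for t in keyterms if t.strip()]
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(terms)} key terms supplied; the limit is {MAX_KEYTERMS}"
        )
    # A list of pairs lets the same parameter name repeat once per term.
    params = [("model", model)] + [("keyterm", t) for t in terms]
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical domain vocabulary for a healthcare deployment.
    print(build_transcription_url(["tretinoin", "metoprolol"]))
```

The point of the sketch is that the vocabulary travels with each request, so changing it requires no retraining cycle.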
The platform also offers seamless integration with other AI-powered voice tools, including Deepgram’s text-to-speech and speech-to-speech capabilities. The company’s APIs allow developers to create and scale voice-enabled applications with minimal friction, which makes Deepgram an attractive option for enterprises looking to streamline their voice AI infrastructure.
Let’s look at some of the results from a few benchmarks:
Nova-3 achieves a word error rate (WER) of 5.26%, which is 47.4% lower than the next-best competitor’s 10% WER. This reduced error rate translates to more accurate transcriptions for industries that require high precision.
In streaming, Nova-3 leads with a WER of 6.84%, which is 54.2% lower than the next-best competitor’s 14.92% WER. This improved accuracy ensures real-time, reliable transcription for applications such as call centers and virtual assistants.
In multilingual testing, Nova-3 outperforms OpenAI’s Whisper across seven languages, delivering up to 8:1 preference ratios in some languages. Nova-3’s advanced real-time multilingual conversation transcription empowers enterprises to scale globally and deliver reliable, accurate results across multiple languages.
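To make the WER figures above concrete, here is a minimal Python sketch of how the metric is commonly computed and how the quoted percentage leads relate to the raw numbers. It assumes the standard definition of WER (word-level substitutions, deletions and insertions divided by the number of reference words) and reads each lead as a relative reduction against the competitor's WER. The article does not describe Deepgram's exact evaluation method, and the function names are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100


if __name__ == "__main__":
    # Figures quoted in the article, in percent WER.
    print(round(relative_reduction(10.0, 5.26), 1))
    print(round(relative_reduction(14.92, 6.84), 1))
```

Under this reading, the two calls at the bottom reproduce the 47.4% and 54.2% figures quoted for the two benchmarks.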
"Nova-3 represents a significant leap forward, extending the frontier of real-time accuracy while once again bending the cost curve — two critical components for enterprise speech-to-speech use cases," said Scott Stephenson, CEO of Deepgram. "By integrating advanced architectural enhancements and extensive training across diverse datasets, we've developed a model that not only meets but exceeds the evolving needs of our clients across various industries."
The launch of Nova-3 comes as voice AI adoption continues to accelerate. With more than 450 enterprise customers, including major players like Twilio, Jack in the Box and Kore.ai, Deepgram cements its role in a fast-growing sector where businesses are turning to AI-driven automation to enhance customer experiences and operational efficiency.
While many STT models promise precision, real-world conditions often introduce hurdles that lead to misinterpretations, especially in noisy environments or industry-specific contexts. Nova-3’s ability to overcome these challenges makes it a game-changer for businesses that depend on reliable voice AI.
Edited by
Greg Tavarez