Krisp unveils VIVA 2.0 for voice AI agents

The updated voice AI infrastructure platform introduces predictive and multilingual capabilities designed for real-world conversational environments.

Krisp has launched Krisp VIVA 2.0, the latest version of its voice AI infrastructure platform built for voice agents, IVRs and conversational AI systems. The release introduces a new suite of real-time voice models designed to improve conversational accuracy and production-level audio handling.

The company said VIVA 2.0 focuses on solving challenges faced by voice AI systems operating outside controlled environments, where factors such as background noise, interruptions and telephony feedback often affect speech recognition and conversational flow.

According to Krisp, voice agent usage grew 9x in 2025, but many systems continue to struggle with issues including increased speech-to-text word error rates, inaccurate voice activity detection and self-interruptions caused by audio loopbacks.
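
For context, word error rate is conventionally calculated as the word-level edit distance between a reference transcript and the recogniser's output, divided by the number of words in the reference. The Python sketch below follows that standard definition; the function name and the sample sentences are illustrative and do not come from Krisp.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is dropped, so the rate is 1/4.
print(word_error_rate("please cancel my order", "please cancel order"))
```

Because insertions count as errors, the rate can exceed 1.0 when the recogniser outputs many more words than were spoken.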

Krisp said VIVA functions as an infrastructure layer operating before speech-to-text, large language models and text-to-speech systems, enabling AI agents to process real-world audio dynamics more effectively. The VIVA SDK operates server-side within customer audio pipelines to improve overall voice system reliability.
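
The ordering described here, with audio processing ahead of speech-to-text, the language model and text-to-speech, can be sketched as a simple chain of stages. The Python below is only an illustration of that ordering: every function is a hypothetical placeholder, and none of them is a call from the VIVA SDK or any other real library.

```python
from typing import Callable


def run_turn(
    caller_audio: bytes,
    preprocess: Callable[[bytes], bytes],   # audio layer, runs first
    transcribe: Callable[[bytes], str],     # speech-to-text
    respond: Callable[[str], str],          # language model
    synthesize: Callable[[str], bytes],     # text-to-speech
) -> bytes:
    """Run one conversational turn through the four stages in order."""
    cleaned = preprocess(caller_audio)
    text = transcribe(cleaned)
    reply = respond(text)
    return synthesize(reply)


# Stub stages so the sketch runs end to end (all hypothetical).
def demo_preprocess(audio: bytes) -> bytes:
    return audio.strip(b"\x00")  # pretend the "noise" is zero padding


def demo_transcribe(audio: bytes) -> str:
    return audio.decode("ascii")


def demo_respond(text: str) -> str:
    return f"You said: {text}"


def demo_synthesize(reply: str) -> bytes:
    return reply.encode("ascii")


print(run_turn(b"\x00hello\x00", demo_preprocess, demo_transcribe,
               demo_respond, demo_synthesize))
```

Keeping the audio stage as a separate callable reflects the article's point that it sits in front of the recogniser rather than inside it, so the downstream stages only ever see processed audio.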

The latest update introduces Turn Prediction v3, a multilingual model that predicts end-of-turn moments using audio cues without transcription. The feature is designed to reduce latency while avoiding interruptions during pauses in speech.
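
One way an agent might act on such a prediction is sketched below in Python. The sketch assumes the model emits a probability that the speaker has finished, which the article does not specify; the function name, the threshold and the fallback timeout are all hypothetical values chosen for illustration.

```python
def should_respond(
    end_of_turn_prob: float,
    silence_ms: int,
    threshold: float = 0.7,       # assumed confidence needed to reply early
    max_silence_ms: int = 1500,   # assumed fallback if the model stays unsure
) -> bool:
    """Decide whether the agent should start speaking."""
    if not 0.0 <= end_of_turn_prob <= 1.0:
        raise ValueError("end_of_turn_prob must be between 0 and 1")
    if silence_ms < 0:
        raise ValueError("silence_ms must not be negative")
    # Long silence: reply even if the model is not confident.
    if silence_ms >= max_silence_ms:
        return True
    # Otherwise reply only when the model is confident the turn has ended,
    # so a mid-sentence pause does not trigger a response.
    return end_of_turn_prob >= threshold
```

With these example settings, a confident prediction lets the agent reply after a short pause, while a hesitant mid-sentence pause is left alone until the fallback timeout is reached.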

The platform also includes Interrupt Prediction v1, described by the company as an audio-only classifier capable of identifying when users intend to interrupt a voice agent. The model differentiates between actual interruptions and conversational backchannel responses such as ‘yes’ or ‘mhm’.
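
The distinction matters for what the agent does next, as the hypothetical Python sketch below shows. It assumes the classifier returns a probability that the user intends to interrupt; that output format, the threshold and all names are assumptions made for illustration rather than details confirmed by Krisp.

```python
from enum import Enum


class Action(Enum):
    LISTEN = "listen"                    # agent was silent: a normal user turn
    KEEP_SPEAKING = "keep_speaking"      # backchannel such as 'yes' or 'mhm'
    STOP_AND_LISTEN = "stop_and_listen"  # genuine interruption


def on_user_speech(
    agent_is_speaking: bool,
    interrupt_prob: float,
    threshold: float = 0.6,  # assumed cut-off, would be tuned per deployment
) -> Action:
    """Choose how the agent reacts when the user starts talking."""
    if not 0.0 <= interrupt_prob <= 1.0:
        raise ValueError("interrupt_prob must be between 0 and 1")
    if not agent_is_speaking:
        return Action.LISTEN
    if interrupt_prob >= threshold:
        return Action.STOP_AND_LISTEN
    return Action.KEEP_SPEAKING
```

A lower threshold makes the agent quicker to yield but more likely to stop for a simple acknowledgement; a higher one does the reverse.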

Krisp has also launched a new category of Signal Detectors with VIVA 2.0, including TTS Detector, Accent Detector and Gender Detector. These tools are designed to help voice systems identify synthetic speech, optimise speech-to-text routing based on accents and enable personalised interactions.
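
Accent-based routing of this kind could, for example, amount to a lookup with a safe default. In the Python sketch below, the accent labels, model names and confidence cut-off are invented for illustration, and the detector's actual output format is not described in the announcement.

```python
# Hypothetical model names and accent labels, for illustration only.
DEFAULT_MODEL = "stt-general"
ACCENT_MODELS = {
    "en-IN": "stt-en-in",
    "en-GB": "stt-en-gb",
}


def pick_stt_model(
    accent: str,
    confidence: float,
    min_confidence: float = 0.8,  # assumed cut-off for trusting the label
) -> str:
    """Return the speech-to-text model to use for a detected accent."""
    if confidence < min_confidence:
        return DEFAULT_MODEL  # detector unsure: use the general model
    # Unknown accents also fall back to the general model.
    return ACCENT_MODELS.get(accent, DEFAULT_MODEL)
```

Falling back to a general model whenever the label is uncertain or unmapped keeps a wrong detection from sending a caller to a poorly matched recogniser.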

The release also includes an updated Voice Isolation v3 model, intended to reduce downstream word error rates in noisy audio environments.

Krisp said all models operate on standard server CPUs and use audio-only inputs without requiring transcription. The company added that the new features are included within existing VIVA pricing.

The VIVA SDK currently processes more than 12 billion minutes of voice AI traffic annually and is integrated into over 130 voice AI products, including Daily, Vapi, LiveKit, Ultravox and Telnyx.

According to Krisp, platforms using VIVA have reported a 3.5x improvement in turn-taking accuracy, 50% fewer dropped calls and 30% higher customer satisfaction.

David Casem, CEO of Telnyx, said, “At scale, the biggest challenge in voice AI isn’t the model. It’s the quality of the signal going into it. Krisp addresses that at the source, which improves everything downstream from transcription to response.”

Robert Schoenfield, EVP of licensing and partnerships at Krisp, added, “Voice is becoming the primary interface between humans and AI. Those conversations don’t happen in clean environments. They happen in the real world, shaped by noise and subtle human cues. VIVA brings that layer into the system, so voice agents can operate the way people actually speak.”

Krisp will showcase VIVA 2.0 at Twilio Signal 2026 in San Francisco on May 6-7.