Stream Open-Sources AI Agent That Reads Your Face, Adapts Its Speech
Stream has launched Crashout Buddy, an open-source AI agent that reads users' facial expressions, gaze, and engagement in real time and uses those signals to shape both what it says and how it says it. Built on Stream's Vision Agents framework with integrations from Anam and Inworld, the agent runs across Stream's global edge network, marking a shift from text-only AI to multimodal, emotionally aware interaction.
Powered by MediaPipe, the system tracks 52 facial blendshapes to classify emotion, gaze direction, and engagement level. Inworld's TTS-2 voice model is steered via natural-language prompts, while Anam renders photorealistic, lip-synced avatar animations in real time. When a user drifts off-camera or falls silent, the agent proactively re-engages them with context-aware prompts.
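To make the pipeline above concrete, here is a minimal sketch of how blendshape scores might be turned into a coarse emotion label and then into a natural-language style prompt for a steerable voice model. It is an illustration under stated assumptions, not Crashout Buddy's actual code: the function names, thresholds, and prompt wording are hypothetical, and the blendshape keys are assumed to follow the ARKit-style names (such as mouthSmileLeft) that MediaPipe's face landmarker reports.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical prompt used when the user is off-camera or has gone quiet.
REENGAGE = "Speak warmly and gently; check whether the user is still there."


@dataclass
class FaceState:
    emotion: str        # "happy", "frustrated", or "neutral"
    looking_away: bool  # True when gaze is strongly off to one side


def classify(blendshapes: Optional[Mapping[str, float]]) -> Optional[FaceState]:
    """Map blendshape scores (0.0 to 1.0) to a coarse state.

    Returns None when no face was detected. Thresholds are illustrative guesses.
    """
    if blendshapes is None:
        return None

    def avg(a: str, b: str) -> float:
        return (blendshapes.get(a, 0.0) + blendshapes.get(b, 0.0)) / 2

    smile = avg("mouthSmileLeft", "mouthSmileRight")
    brow_down = avg("browDownLeft", "browDownRight")
    # Both eyes turn the same way: one looks outward, the other inward.
    gaze_one_side = avg("eyeLookOutLeft", "eyeLookInRight")
    gaze_other_side = avg("eyeLookInLeft", "eyeLookOutRight")

    if smile > 0.5:
        emotion = "happy"
    elif brow_down > 0.5:
        emotion = "frustrated"
    else:
        emotion = "neutral"
    return FaceState(emotion, max(gaze_one_side, gaze_other_side) > 0.6)


def style_prompt(state: Optional[FaceState], silent_seconds: float) -> str:
    """Pick a delivery instruction for the voice model (assumed 8 s silence limit)."""
    if state is None or silent_seconds > 8.0:
        return REENGAGE
    if state.looking_away:
        return "Sound a little more energetic and ask a short, direct question."
    if state.emotion == "frustrated":
        return "Slow down and use a calm, reassuring tone."
    if state.emotion == "happy":
        return "Match the user's upbeat mood with a bright, playful tone."
    return "Use a relaxed, conversational tone."


if __name__ == "__main__":
    frame = {"browDownLeft": 0.9, "browDownRight": 0.8}
    print(style_prompt(classify(frame), silent_seconds=1.5))
```

In a real agent the scores would arrive per video frame and would likely be smoothed over time before classification, so a single blink or glance does not flip the tone of the reply.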
Fully open source, Crashout Buddy illustrates a shift from static voice assistants to adaptive conversational agents. Potential applications span dating, coaching, recruitment, tutoring, and customer support, where the technology could replace the flat, one-size-fits-all delivery of existing voice-based systems.