Adding Voice AI to a Mobile Game with Eleven Labs
How I integrated Eleven Labs' voice synthesis into Number Strike Baseball for dynamic commentary and opponent taunts.

Number Strike Baseball is fundamentally a numbers game — guess the opponent's secret number, get feedback in strikes and balls. It's engaging but visually quiet. Adding voice commentary transformed the feel of the game from 'solving a puzzle' to 'being in a stadium.'
Before committing to Eleven Labs, I evaluated four voice synthesis options: Google Cloud Text-to-Speech, Amazon Polly, Microsoft Azure Speech, and Eleven Labs. In my tests, Google and Amazon produced functional but robotic-sounding output that's fine for navigation apps but wrong for game commentary. Azure's neural voices were better but required significant configuration to achieve natural-sounding prosody. Eleven Labs won on quality: its voices have natural cadence, emphasis, and emotional variation that made the game commentary sound like a human announcer.
Eleven Labs' API generates natural-sounding voice audio from text. We use it for three features: game commentary ('Two strikes! One more and you're out!'), opponent taunts in multiplayer, and tutorial narration for new players. Each serves a different purpose and has different latency requirements.
Voice selection was a deliberate design process. We auditioned twelve different Eleven Labs voices for the commentator role, testing each with the same set of game events. The winning voice needed three qualities: clarity at low volume (many users play with the phone speaker, not headphones), distinctiveness from a typical phone notification voice, and tonal range that could convey excitement for a home run and tension for a close match. We settled on a warm baritone voice that tested well across age demographics in our beta group.
Commentary is pre-generated. We identified 40+ game events (first strike, ball, home run, game over, etc.) and generated audio for each using Eleven Labs' API during the build process. The audio files ship with the app — no runtime API calls, no latency, no cost per play. This was a deliberate architectural decision: voice that enhances gameplay can't depend on network availability.
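A minimal sketch of what such a build step might look like, written in Python for brevity (the app itself is Flutter). The event names, the `COMMENTARY_LINES` table, and the `synthesize` callable are all hypothetical; in a real build, `synthesize` would wrap the text-to-speech request, and it is injected here so the script can be exercised without network access.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict

# Hypothetical event catalogue; the real game has 40+ events.
COMMENTARY_LINES: Dict[str, str] = {
    "first_strike": "Strike one!",
    "two_strikes": "Two strikes! One more and you're out!",
    "home_run": "Home run!",
}


def asset_name(event_id: str, text: str) -> str:
    """Return a file name that changes whenever the line's text changes."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]
    return f"{event_id}_{digest}.mp3"


def generate_commentary(
    lines: Dict[str, str],
    out_dir: Path,
    synthesize: Callable[[str], bytes],
) -> Dict[str, str]:
    """Write one audio file per event and return an event -> file manifest.

    Files that already exist are skipped, so re-running the build only
    pays for lines that are new or whose text was edited.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest: Dict[str, str] = {}
    for event_id, text in lines.items():
        name = asset_name(event_id, text)
        path = out_dir / name
        if not path.exists():
            path.write_bytes(synthesize(text))
        manifest[event_id] = name
    return manifest
```

Hashing the text into the file name is one way to keep the build incremental: unchanged lines are never re-synthesized, which helps keep the one-time cost truly one-time.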
Localization presented an unexpected challenge. Number Strike Baseball's primary markets are the U.S. and South Korea, so commentary needed to work in both English and Korean. Eleven Labs supports both languages, but the emotional cadence of sports commentary differs culturally. American baseball commentary is loud and exclamatory; Korean commentary uses a different rhythm with elongated vowels for dramatic moments. We generated separate voice models for each locale rather than translating the English commentary directly, which produced much more natural-sounding results.
Opponent taunts are the creative feature. Before a ranked match, players can type a custom taunt that gets synthesized into voice audio. The opponent hears it during gameplay. To prevent abuse, taunts go through a content moderation pipeline before synthesis: text filtering first, then the generated audio is checked for duration limits.
The content moderation pipeline for taunts is more sophisticated than a simple word filter. Text-based filtering catches explicit profanity, but players quickly learn to use homoglyphs, spacing tricks, and euphemisms. We implemented a three-layer approach: a regex-based fast filter for obvious violations, a semantic analysis layer that evaluates the intent of the message using a lightweight classifier, and a length restriction that caps taunts at 100 characters to prevent paragraph-length abuse. Messages that fail any layer are rejected with a generic 'taunt not available' message — we deliberately don't explain why to avoid teaching players how to circumvent the filters.
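The three layers could be sketched roughly as follows. This is an illustration under stated assumptions, not our production filter: the blocklist, the homoglyph map, the `classify_intent` callable (standing in for the lightweight classifier), and the 0.5 threshold are all hypothetical placeholders. The sketch also runs the length check first simply because it is the cheapest test.

```python
import re
import unicodedata
from typing import Callable

MAX_TAUNT_CHARS = 100  # the cap described above

# Hypothetical placeholder blocklist; a real one would be far larger.
BLOCKED_WORDS = ("loser", "idiot")
BLOCKED_RE = re.compile("|".join(re.escape(word) for word in BLOCKED_WORDS))

# Hypothetical map of common look-alike characters to plain letters.
HOMOGLYPHS = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"}
)


def normalize(text: str) -> str:
    """Undo common evasion tricks before the regex runs."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    folded = folded.translate(HOMOGLYPHS)
    # Drop spaces and punctuation so "l o s e r" collapses to "loser".
    return re.sub(r"[\W_]+", "", folded)


def is_taunt_allowed(
    text: str,
    classify_intent: Callable[[str], float],
    threshold: float = 0.5,
) -> bool:
    """Return True only if the taunt passes every layer."""
    # Length restriction.
    if not text.strip() or len(text) > MAX_TAUNT_CHARS:
        return False
    # Fast regex filter on the normalized text.
    if BLOCKED_RE.search(normalize(text)):
        return False
    # Semantic layer: assumed to return an abuse score between 0 and 1.
    return classify_intent(text) < threshold
```

Matching on the collapsed text catches spacing tricks but can also flag innocent words that happen to contain a blocked substring, so a real filter would need an allowlist or word-boundary handling on top of this. The function returns only a boolean, which fits the generic rejection message: the caller never learns which layer failed.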
The technical challenge was audio mixing. Game background music, sound effects, and voice commentary all play simultaneously. Flutter's audio stack doesn't handle multi-track mixing natively, so we wrote a platform channel to the native audio engines — AVAudioEngine on iOS, AudioTrack on Android — to manage mixing, ducking, and priority.
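The ducking policy itself is simple to express, independent of the native engine that applies it. Here is a rough sketch in Python; the channel names, priority order, and the 0.3 ducked gain are assumptions chosen for illustration, and the actual gain changes would be applied on the native side through the platform channel.

```python
from typing import Dict, Iterable

# Hypothetical priorities: higher numbers win.
PRIORITY: Dict[str, int] = {"music": 0, "sfx": 1, "voice": 2}

# Assumed gain applied to any channel that is being ducked.
DUCKED_GAIN = 0.3


def channel_gains(active: Iterable[str]) -> Dict[str, float]:
    """Compute a gain per active channel.

    The highest-priority active channel plays at full volume and
    everything below it is ducked.
    """
    active_set = set(active)
    if not active_set:
        return {}
    top = max(PRIORITY[name] for name in active_set)
    return {
        name: 1.0 if PRIORITY[name] == top else DUCKED_GAIN
        for name in active_set
    }
```

For example, `channel_gains(["music", "voice"])` ducks the music while the commentator speaks, and calling it again with only `"music"` active restores the music to full volume.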
The taunt pipeline's main constraint is latency. Eleven Labs' API takes 1-3 seconds to synthesize a short sentence, which is too slow to run during match loading. Instead, we trigger synthesis at taunt submission time, when the player writes and submits their taunt before queuing for a match. The synthesized audio is cached in Cloud Storage, and the opponent's client downloads it during the matchmaking wait period. By the time the match starts, the audio is already on-device and plays without any network delay.
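The submit-time flow can be sketched as a small cache keyed by the taunt's content. Everything named here is hypothetical: `TauntAudioCache`, the key layout, and the in-memory dictionary that stands in for the storage bucket. The `synthesize` callable is injected so the slow text-to-speech call happens exactly once per distinct taunt.

```python
import hashlib
from typing import Callable, Dict, Optional


class TauntAudioCache:
    """Synthesize taunts when they are submitted, not when a match loads."""

    def __init__(
        self,
        synthesize: Callable[[str], bytes],
        store: Optional[Dict[str, bytes]] = None,
    ) -> None:
        self._synthesize = synthesize
        # A plain dict stands in for the storage bucket in this sketch.
        self._store: Dict[str, bytes] = store if store is not None else {}

    @staticmethod
    def key_for(player_id: str, text: str) -> str:
        """Build a storage key derived from the player and the taunt text."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        return f"taunts/{player_id}/{digest}.mp3"

    def submit(self, player_id: str, text: str) -> str:
        """Called at taunt submission: pay the synthesis delay here."""
        key = self.key_for(player_id, text)
        if key not in self._store:
            self._store[key] = self._synthesize(text)
        return key

    def fetch(self, key: str) -> bytes:
        """Called by the opponent's client during the matchmaking wait."""
        return self._store[key]
```

Because the key depends on the text, a player who resubmits the same taunt reuses the cached audio instead of triggering another synthesis call.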
The cost model works because commentary is pre-generated (one-time cost) and taunts are rate-limited (one per match). At current usage levels, Eleven Labs costs less than $20 per month — significantly less than licensing pre-recorded commentary from voice actors.
The impact on engagement metrics validated the entire voice integration project. Average session duration increased by 34% after voice commentary was added. More tellingly, ranked match completion rates improved by 18% — players were less likely to abandon a losing game when the commentator was narrating the comeback potential. The taunt feature drove a 22% increase in rematch requests, suggesting it created a social connection between opponents. Voice transformed Number Strike Baseball from a logic puzzle into a competitive experience with personality.
Tags: Eleven Labs, AI, Voice, Game Dev
Key Facts
- Category: Dev
- Reading time: 12 min read
- Technology: Eleven Labs
- Technology: AI
- Technology: Voice