Voice Configuration

A reference for choosing and configuring voices, languages, and speech behavior for your AI agents. Covers both the dashboard UX and the API.

For browsing and previewing voices visually, see Voice Library. This page is for understanding the underlying configuration model.

The configuration split (read this first)

Akol’s data model has two pieces:

| Lives on | What |
| --- | --- |
| Agent | Voice ID, agent name, default language, additional languages, avatar |
| Business | System prompt, greeting, fallback message, voice settings (speed, etc.), interruption sensitivity, silence threshold, transfer number |

When a call starts, the engine merges the two: the Business is the source of truth for behavior, and the Agent is the source of truth for which voice and languages are used. If a value is missing on the Business, the engine falls back to the Agent. This fallback is being deprecated, so write to the Business when you can.
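The merge rule can be sketched in a few lines of Python. This is only an illustration of the precedence described above, not the engine's actual code: the function name `resolve_call_config` and the exact key names (for example `fallbackMessage`, `transferNumber`) are assumptions made for the example.

```python
from typing import Any, Dict, Mapping

# Field ownership follows the table above; the key spellings are assumed.
AGENT_FIELDS = ("voiceId", "name", "language", "additionalLanguages", "avatar")
BUSINESS_FIELDS = (
    "systemPrompt", "greeting", "fallbackMessage", "voiceSettings",
    "interruptionSensitivity", "silenceThresholdMs", "transferNumber",
)


def resolve_call_config(agent: Mapping[str, Any],
                        business: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge Agent and Business records into one per-call configuration."""
    resolved: Dict[str, Any] = {}
    # Voice and language identity always comes from the Agent.
    for key in AGENT_FIELDS:
        if agent.get(key) is not None:
            resolved[key] = agent[key]
    # Behavior comes from the Business; a missing value falls back to the
    # Agent (the deprecated path).
    for key in BUSINESS_FIELDS:
        value = business.get(key)
        if value is None:
            value = agent.get(key)
        if value is not None:
            resolved[key] = value
    return resolved
```

With an Agent that still carries a legacy `greeting` and a Business that defines its own, the Business value wins; the Agent value is used only when the Business leaves the field unset.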

Voices

Listing voices

    GET /api/v1/voices
    Authorization: Bearer <token>

Returns the voices from the ElevenLabs catalog that Akol supports:

    {
      "success": true,
      "data": [
        {
          "id": "a0e99841-438c-4a64-b679-ae501e7d6091",
          "name": "Sarah",
          "gender": "female",
          "category": "curated",
          "languages": ["en"],
          "previewUrl": "https://cdn.akol.ai/voice-previews/sarah.mp3",
          "description": "Warm female voice, professional and approachable",
          "isFavorite": false,
          "isHidden": false
        }
      ]
    }
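As a rough sketch of consuming this response on the client, the snippet below parses a body shaped like the sample and optionally filters by category. The helper name `voices_from_response` is hypothetical, and the sample string is a trimmed copy of the response shown above.

```python
import json
from typing import Any, Dict, List, Optional

# Trimmed copy of the sample response above, used only for illustration.
SAMPLE_RESPONSE = """
{"success": true, "data": [
  {"id": "a0e99841-438c-4a64-b679-ae501e7d6091", "name": "Sarah",
   "category": "curated", "languages": ["en"],
   "isFavorite": false, "isHidden": false}
]}
"""


def voices_from_response(body: str,
                         category: Optional[str] = None) -> List[Dict[str, Any]]:
    """Parse a voice listing and optionally keep a single category."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("voice listing did not report success")
    voices = payload.get("data", [])
    if category is not None:
        voices = [v for v in voices if v.get("category") == category]
    return voices
```

Calling `voices_from_response(SAMPLE_RESPONSE, category="curated")` yields the single Sarah entry, while `category="stable"` yields an empty list for this sample.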

Voice categories

| Category | When to use |
| --- | --- |
| curated | Hand-picked, work well across most use cases |
| stable | Most consistent across long calls / unusual phrasings |
| emotive | More expressive; good for hospitality, healthcare, sales |
| support | Tuned for customer support cadence (shorter pauses, calmer) |

Setting an agent’s voice

    PATCH /api/v1/agents/:id
    Content-Type: application/json

    {
      "voiceId": "a0e99841-438c-4a64-b679-ae501e7d6091",
      "language": "en-US",
      "primaryLanguage": "en"
    }

The voice ID must come from /api/v1/voices. ElevenLabs voice IDs that are not in the list returned by that endpoint (for example, IDs copied from another platform) are rejected with a 422 response.
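To avoid a round trip that ends in a 422, a client can check the voice ID against the listing before building the request. The sketch below only constructs the request object and does not send it. The base URL, the bearer-token header on this endpoint, and the helper name `build_set_voice_request` are assumptions for the example.

```python
import json
import urllib.request
from typing import Iterable


def build_set_voice_request(base_url: str, token: str, agent_id: str,
                            voice_id: str, known_voice_ids: Iterable[str],
                            language: str = "en-US",
                            primary_language: str = "en") -> urllib.request.Request:
    """Build (but do not send) the PATCH that sets an agent's voice."""
    # Reject IDs that were not returned by /api/v1/voices before calling the API.
    if voice_id not in set(known_voice_ids):
        raise ValueError(f"voice {voice_id!r} is not in the supported list")
    body = json.dumps({
        "voiceId": voice_id,
        "language": language,
        "primaryLanguage": primary_language,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/agents/{agent_id}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed to match the GET example
        },
    )
```

The returned object can be passed to `urllib.request.urlopen` when you are ready to send it.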

Voice favorites and hidden

Per-user preferences:

    POST   /api/v1/voices/:voiceId/favorite
    DELETE /api/v1/voices/:voiceId/favorite
    POST   /api/v1/voices/:voiceId/hide
    DELETE /api/v1/voices/:voiceId/hide

Hidden voices don’t appear in the picker but are still valid — existing agents using them continue to work.

Languages

Akol supports the following BCP-47 language codes for STT, LLM, and TTS:

| Language | Code | STT | LLM | TTS (voice support varies) |
| --- | --- | --- | --- | --- |
| English (US) | en-US | ✓ Nova-3 / Flux | | All voices |
| English (UK) | en-GB | ✓ Nova-3 | | Voices with en in languages |
| German | de-DE | ✓ Nova-3 / Flux-de | | Voices with de in languages |
| Spanish | es-ES | ✓ Nova-3 | | Subset |
| French | fr-FR | ✓ Nova-3 | | Subset |
| Portuguese | pt-BR | ✓ Nova-3 | | Subset |

primaryLanguage is the ISO 639-1 code (e.g. en, de) used for language-specific prompt rules. language is the full BCP-47 code passed to Deepgram.
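Because `primaryLanguage` is the two-letter prefix of `language` for every code in the table, a client can derive one from the other instead of storing both by hand. A minimal sketch, with the hypothetical helper name `primary_language`:

```python
def primary_language(bcp47_code: str) -> str:
    """Return the two-letter language subtag of a BCP-47 code such as 'en-US'."""
    subtag = bcp47_code.split("-")[0].lower()
    # The page says primaryLanguage is ISO 639-1, so expect exactly two ASCII letters.
    if len(subtag) != 2 or not (subtag.isascii() and subtag.isalpha()):
        raise ValueError(f"cannot derive a two-letter language from {bcp47_code!r}")
    return subtag
```

For example, `primary_language("pt-BR")` gives `"pt"`, the value to send as `primaryLanguage` alongside `"language": "pt-BR"`.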

Multilingual agents

Set additionalLanguages to allow the agent to switch mid-call:

    {
      "language": "en-US",
      "primaryLanguage": "en",
      "additionalLanguages": ["es", "de"]
    }

The voice engine detects the caller’s language from STT confidence and switches the LLM context. The voice itself doesn’t change — pick a voice whose languages array covers all your target languages.
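Since the voice stays fixed while the language switches, it is worth checking coverage before saving a multilingual configuration. The sketch below compares a voice object (shaped like an entry from /api/v1/voices) against the languages an agent is meant to speak; the helper name `missing_languages` is an assumption.

```python
from typing import Any, List, Mapping, Sequence


def missing_languages(voice: Mapping[str, Any], primary: str,
                      additional: Sequence[str] = ()) -> List[str]:
    """List the target languages that the voice's languages array lacks."""
    supported = {code.lower() for code in voice.get("languages", [])}
    wanted = [primary, *additional]
    return [code for code in wanted if code.lower() not in supported]
```

For the sample Sarah voice (`"languages": ["en"]`) with `additionalLanguages` of `["es", "de"]`, the result is `["es", "de"]`, which suggests picking a different voice for that agent.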

Voice behavior tuning (Business-level)

These settings live on the Business. Tune them per business, not per agent.

    PATCH /api/v1/businesses/:id
    Content-Type: application/json

    {
      "voiceSettings": { "speed": 1.0 },
      "interruptionSensitivity": 0.7,
      "silenceThresholdMs": 800,
      "maxCallDurationMinutes": 15
    }

| Field | Range | Default | What it does |
| --- | --- | --- | --- |
| voiceSettings.speed | 0.5 – 2.0 | 1.0 | Playback rate. 1.1 is barely noticeable; 1.3+ sounds rushed. |
| interruptionSensitivity | 0.0 – 1.0 | 0.7 | How quickly the agent stops speaking when the caller talks. Higher = more interruptible. Lower = agent finishes the sentence. |
| silenceThresholdMs | 400 – 2000 | 800 | How long the caller must be silent before the agent assumes they’re done speaking. Shorter = faster turn-taking but more interruptions of slow speakers. |
| maxCallDurationMinutes | 1 – 60 | 15 | Hard cap. The agent ends the call cleanly at this point. |
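A client-side range check can catch out-of-range values before the PATCH is sent. The sketch below encodes the ranges from the table; treating the bounds as inclusive is an assumption, as is the helper name `validate_tuning`.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Ranges copied from the table above; inclusive bounds are assumed.
RANGES: Dict[str, Tuple[float, float]] = {
    "voiceSettings.speed": (0.5, 2.0),
    "interruptionSensitivity": (0.0, 1.0),
    "silenceThresholdMs": (400, 2000),
    "maxCallDurationMinutes": (1, 60),
}


def validate_tuning(patch: Mapping[str, Any]) -> List[str]:
    """Return one message per out-of-range field; an empty list means OK."""
    flat = dict(patch)
    # Flatten the nested voiceSettings.speed so it can be looked up by name.
    voice_settings = flat.pop("voiceSettings", None) or {}
    if "speed" in voice_settings:
        flat["voiceSettings.speed"] = voice_settings["speed"]
    errors = []
    for field, (low, high) in RANGES.items():
        if field in flat and not (low <= flat[field] <= high):
            errors.append(f"{field}={flat[field]} is outside {low} to {high}")
    return errors
```

Fields that are absent from the patch are skipped, so the check works for partial updates as well as full ones.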

Picking sensitivity values

| Use case | interruptionSensitivity | silenceThresholdMs |
| --- | --- | --- |
| Customer service | 0.7 | 800 |
| Healthcare intake | 0.5 (let people finish) | 1200 |
| Outbound sales | 0.8 (quick, responsive) | 600 |
| Elderly users / accessibility | 0.4 | 1500 |
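If you manage several businesses, the starting points in the table can be kept as named presets and turned into a PATCH body. This is a convenience sketch only: the preset keys (such as `healthcare_intake`) and the helper name `tuning_patch` are invented for the example and are not part of the API.

```python
from typing import Dict, Tuple, Union

# (interruptionSensitivity, silenceThresholdMs) pairs from the table above.
PRESETS: Dict[str, Tuple[float, int]] = {
    "customer_service": (0.7, 800),
    "healthcare_intake": (0.5, 1200),
    "outbound_sales": (0.8, 600),
    "accessibility": (0.4, 1500),
}


def tuning_patch(use_case: str) -> Dict[str, Union[float, int]]:
    """Return the PATCH body fields for a named starting point."""
    if use_case not in PRESETS:
        raise KeyError(f"unknown use case {use_case!r}; choose from {sorted(PRESETS)}")
    sensitivity, silence_ms = PRESETS[use_case]
    return {
        "interruptionSensitivity": sensitivity,
        "silenceThresholdMs": silence_ms,
    }
```

The returned dictionary can be sent as the JSON body of `PATCH /api/v1/businesses/:id` and then adjusted per business after listening to real calls.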

Pronunciation overrides

For brand names, place names, or jargon that TTS mispronounces, add SSML <phoneme> or <sub> tags directly in your systemPrompt or greetingTemplate:

    You work for <sub alias="Akol">Akol</sub>, a voice AI platform.
    Always pronounce <phoneme alphabet="ipa" ph="ˈɑːkoʊl">Akol</phoneme> correctly.

The TTS provider strips the SSML and applies the pronunciation. This works in any field passed to TTS (greeting, fallback, system prompt context).
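When prompts are assembled programmatically, generating the tags with proper escaping avoids malformed SSML if a brand name contains characters such as `&`. The sketch below uses only Python's standard library; the helper names `phoneme` and `sub` are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text: str, ipa: str) -> str:
    """Wrap text in an SSML <phoneme> tag with an IPA pronunciation."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


def sub(text: str, alias: str) -> str:
    """Wrap text in an SSML <sub> tag that is read aloud as the alias."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"
```

For instance, `phoneme("Akol", "ˈɑːkoʊl")` reproduces the tag from the example above, and the result can be interpolated into `systemPrompt` or `greetingTemplate`.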

Voice quality troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Agent talks over the caller | interruptionSensitivity too low | Raise to 0.7+ |
| Agent waits too long before speaking | silenceThresholdMs too high, or LLM provider slow | Lower threshold; check /api/v1/health for provider status |
| Agent voice sounds “off” in German | Voice ID doesn’t include de in its languages array | Pick a voice from /api/v1/voices?language=de |
| Voice cuts off mid-word | Network buffering on caller side | Usually carrier-side. Check call’s outcome for hints. |
| Voice prosody resets between sentences | ElevenLabs request continuation context not preserved | Known limitation; have the agent speak in shorter sentences for now |

Latency budget

End-to-end first-audio-out latency targets:

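    Caller speech
      → STT (Deepgram Flux, ~150ms)
      → LLM (Groq, ~120ms first token)
      → TTS (ElevenLabs Flash, ~75ms first chunk)
      → Caller hears agent
    ────────────
    ~470ms typical

The three listed stages sum to about 345ms; the remaining ~125ms of the typical total is not itemized here.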

If your calls feel slow, check /api/v1/calls/:id and look at the metadata.latencies object. The biggest knobs you control:

  • System prompt length: prompts over ~3000 tokens significantly increase time to first token (TTFT)
  • Function tools: every tool definition adds context, so trim unused ones
  • Voice category: emotive voices have ~50ms more latency than stable
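To decide which knob to turn, it can help to compare a call's measured stage latencies with the targets above. The shape of `metadata.latencies` is not documented on this page, so the keys `stt`, `llm`, and `tts` below, the 25% tolerance, and the helper name `over_budget` are all assumptions for illustration.

```python
from typing import Dict, Mapping, Optional

# Per-stage targets from the latency budget above, in milliseconds.
TARGETS_MS: Dict[str, int] = {"stt": 150, "llm": 120, "tts": 75}


def over_budget(latencies: Mapping[str, Optional[float]],
                tolerance: float = 1.25) -> Dict[str, float]:
    """Return the overshoot in ms for each stage that exceeds target * tolerance."""
    report: Dict[str, float] = {}
    for stage, target in TARGETS_MS.items():
        measured = latencies.get(stage)
        if measured is not None and measured > target * tolerance:
            report[stage] = measured - target
    return report
```

A call with `{"stt": 160, "llm": 400, "tts": 70}` would flag only the LLM stage, which points at prompt length or tool definitions rather than the voice category.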