Modalities
Modalities connect providers to transports; they define how a named capability is executed by the runtime.
A modality is a simple binding:
name = { provider = "...", transport = "..." }
The runtime supports multiple modality groups:
- [realtime]
- [llm]
- [stt]
- [tts]
All modalities follow the same structure and resolution rules; the differences come from how they are used at runtime.
Realtime modalities
Realtime modalities are used for full realtime voice sessions.
[realtime]
deepgram_rt = { provider = "deepgram", transport = "deepgram_rt" }
openai_realtime_voice = { provider = "openai", transport = "openai_realtime_ws" }
Each entry defines:
- a modality name
- a provider reference
- a transport reference
For example:
openai_realtime_voice = { provider = "openai", transport = "openai_realtime_ws" }
This means the runtime will:
- use [providers.openai]
- connect through [transports.openai_realtime_ws]
- expose the combined capability as openai_realtime_voice
Profiles reference this modality:
[profiles.openai_realtime_profile]
realtime = "openai_realtime_voice"
LLM modalities
LLM modalities define reasoning or text generation bindings.
[llm]
openai_reasoner = { provider = "openai", transport = "openai_responses_http" }
This creates a named reasoning capability.
It is typically used internally by the runtime when:
- generating responses
- executing tool calls
- handling structured reasoning tasks
As with realtime, the modality simply binds:
- a provider
- a transport
Other modality types
The [stt] and [tts] modality groups follow exactly the same structure and behavior.
They are used for:
- [stt]: speech-to-text capabilities
- [tts]: text-to-speech capabilities
Example:
[stt]
deepgram_listen = { provider = "deepgram", transport = "deepgram_listen_ws" }
[tts]
deepgram_speak = { provider = "deepgram", transport = "deepgram_speak_ws" }
There is no additional logic in these sections; they act exactly like realtime and llm, but are scoped to specific runtime capabilities.