Modalities

Modalities connect providers to transports; they define how a named capability is executed by the runtime.

A modality is a simple binding:

name = { provider = "...", transport = "..." }

The runtime supports multiple modality groups:

  • [realtime]
  • [llm]
  • [stt]
  • [tts]

All modalities follow the same structure and resolution rules; they differ only in how the runtime uses them.
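As a sketch of that shared structure: in standard TOML, an inline table under a group header and a sub-table named `group.name` parse to the same data. The fragment below writes one entry each way, reusing names from the examples on this page. Whether the runtime's own loader accepts the sub-table form is an assumption; this page only shows the inline form.

```toml
# Inline-table form, as used throughout this page.
[stt]
deepgram_listen = { provider = "deepgram", transport = "deepgram_listen_ws" }

# Sub-table form: equivalent in standard TOML.
# Assumption: the runtime's loader treats it the same way.
[tts.deepgram_speak]
provider = "deepgram"
transport = "deepgram_speak_ws"
```

The inline form keeps each binding on one line, which makes a group easy to scan when it holds several entries.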

Realtime modalities

Realtime modalities are used for full realtime voice sessions.

[realtime]
deepgram_rt = { provider = "deepgram", transport = "deepgram_rt" }
openai_realtime_voice = { provider = "openai", transport = "openai_realtime_ws" }

Each entry defines:

  • a modality name
  • a provider reference
  • a transport reference

For example:

openai_realtime_voice = { provider = "openai", transport = "openai_realtime_ws" }

This means the runtime will:

  • use [providers.openai]
  • connect through [transports.openai_realtime_ws]
  • expose the combined capability as openai_realtime_voice

A profile then selects this modality by name:

[profiles.openai_realtime_profile]
realtime = "openai_realtime_voice"
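Put together, the chain from profile to provider could look like the fragment below. The table names come from the references above. The contents of `[providers.openai]` and `[transports.openai_realtime_ws]` are not described on this page, so they are left empty here as placeholders rather than guessed.

```toml
# Provider definition referenced by provider = "openai".
# Its settings are not covered on this page; left empty as a placeholder.
[providers.openai]

# Transport definition referenced by transport = "openai_realtime_ws".
# Also a placeholder.
[transports.openai_realtime_ws]

# The modality binds the two under a single name.
[realtime]
openai_realtime_voice = { provider = "openai", transport = "openai_realtime_ws" }

# The profile selects the modality by that name.
[profiles.openai_realtime_profile]
realtime = "openai_realtime_voice"
```

Read bottom to top, each string value is a lookup key into the table above it: profile to modality, then modality to provider and transport.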

LLM modalities

LLM modalities define reasoning or text generation bindings.

[llm]
openai_reasoner = { provider = "openai", transport = "openai_responses_http" }

This creates a named reasoning capability.

It is typically used internally by the runtime when:

  • generating responses
  • executing tool calls
  • handling structured reasoning tasks

As with realtime, the modality simply binds a provider to a transport.
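Because a modality only names a provider and a transport, one provider can back entries in more than one group. The fragment below combines the two `openai` examples from this page in one file. It assumes both references resolve to the same `[providers.openai]` table, which the shared resolution rules suggest but this page does not state outright.

```toml
# One provider, two transports, two modality groups.
# Assumption: both entries resolve to the same [providers.openai] table.
[realtime]
openai_realtime_voice = { provider = "openai", transport = "openai_realtime_ws" }

[llm]
openai_reasoner = { provider = "openai", transport = "openai_responses_http" }
```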

Other modality types

The [stt] and [tts] modality groups follow the exact same structure and behavior.

They are used for:

  • [stt]: speech-to-text capabilities
  • [tts]: text-to-speech capabilities

Example:

[stt]
deepgram_listen = { provider = "deepgram", transport = "deepgram_listen_ws" }

[tts]
deepgram_speak = { provider = "deepgram", transport = "deepgram_speak_ws" }

There is no additional logic in these sections; their entries resolve exactly like those in [realtime] and [llm], but are scoped to speech recognition and speech synthesis respectively.
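For illustration, a profile built from separate speech and reasoning modalities might look like the sketch below. This is an assumption by analogy with the `realtime` key shown earlier: the profile name `pipeline_profile` and the profile keys `stt`, `llm`, and `tts` are hypothetical and not confirmed by this page.

```toml
[stt]
deepgram_listen = { provider = "deepgram", transport = "deepgram_listen_ws" }

[llm]
openai_reasoner = { provider = "openai", transport = "openai_responses_http" }

[tts]
deepgram_speak = { provider = "deepgram", transport = "deepgram_speak_ws" }

# Hypothetical profile: the name and the stt/llm/tts keys are assumed
# by analogy with realtime = "..." and may not match the actual schema.
[profiles.pipeline_profile]
stt = "deepgram_listen"
llm = "openai_reasoner"
tts = "deepgram_speak"
```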