Voice

Configure speech-to-text (STT) and text-to-speech (TTS) endpoints. Voice mode uses the same agent and tool surface as text chat: the same MCP servers, the same memory, and the same canvas-control verbs.

Speech-to-text

Streaming endpoints are preferred. Use a cloud provider (Deepgram, AssemblyAI) or a local OpenAI-compatible server, for example Qwen3-ASR served via vLLM, or Faster-Whisper behind a thin HTTP shim.
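For the local OpenAI-compatible option, here is a minimal client sketch. It assumes the server implements the OpenAI `POST /v1/audio/transcriptions` shape: a multipart form with `file` and `model` fields and a JSON reply carrying a `text` field. The function names, model name, and base URL are hypothetical. This is a one-shot request, not a streaming session. Streaming cloud APIs such as Deepgram's use their own protocols, which are not shown here.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_transcription_request(
    base_url: str,
    audio: bytes,
    model: str,
    filename: str = "audio.wav",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST for an OpenAI-compatible /audio/transcriptions endpoint.

    base_url follows the OpenAI SDK convention and already ends in /v1,
    e.g. "http://localhost:8000/v1" (a hypothetical local server).
    """
    boundary = uuid.uuid4().hex
    # Text field carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    # File field header; the raw audio bytes follow it unchanged.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    # CRLF ends the file part, then the closing boundary ends the form.
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + file_header + audio + closing

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/transcriptions",
        data=body,
        headers=headers,
        method="POST",
    )


def transcribe(req: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the "text" field of the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["text"]
```

Building the request separately from sending it keeps the wire format easy to inspect. It also lets the same helper target vLLM or a Faster-Whisper shim, provided both accept this request shape.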

Configured endpoints

Text-to-speech

Use a cloud provider (ElevenLabs) or a local OpenAI-compatible TTS server such as Kokoro, Qwen3-TTS, VibeVoice, or F5-TTS. Wrap each local engine behind a thin /v1/audio/speech adapter.
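Below is a minimal sketch of such an adapter, using only the Python standard library. It assumes the OpenAI speech request shape: a JSON body with `model`, `input`, and `voice`, plus an optional `response_format` (default `mp3`) and an optional `speed` (default `1.0`). `SpeechRequest`, `parse_speech_request`, `make_handler`, and the `synthesize` callback are hypothetical names. The callback is where engine-specific code for Kokoro, F5-TTS, or another engine would go, and that code is not shown.

```python
import json
from dataclasses import dataclass
from http.server import BaseHTTPRequestHandler
from typing import Callable


@dataclass
class SpeechRequest:
    model: str
    input: str
    voice: str
    response_format: str = "mp3"
    speed: float = 1.0


def parse_speech_request(body: bytes) -> SpeechRequest:
    """Validate an OpenAI-style /v1/audio/speech JSON body."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    for field in ("model", "input", "voice"):
        if not isinstance(data.get(field), str) or not data[field]:
            raise ValueError(f"missing or invalid field: {field}")
    return SpeechRequest(
        model=data["model"],
        input=data["input"],
        voice=data["voice"],
        response_format=data.get("response_format", "mp3"),
        speed=float(data.get("speed", 1.0)),
    )


# Hypothetical engine hook: returns (audio bytes, Content-Type).
Synthesize = Callable[[SpeechRequest], tuple[bytes, str]]


def make_handler(synthesize: Synthesize) -> type[BaseHTTPRequestHandler]:
    """Create a request handler that routes /v1/audio/speech to `synthesize`."""

    class SpeechHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            if self.path != "/v1/audio/speech":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                req = parse_speech_request(self.rfile.read(length))
            except (ValueError, TypeError) as exc:  # includes JSONDecodeError
                self.send_error(400, str(exc))
                return
            audio, content_type = synthesize(req)
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(len(audio)))
            self.end_headers()
            self.wfile.write(audio)

    return SpeechHandler
```

To serve it, pass `make_handler(your_engine)` to `http.server.HTTPServer(("127.0.0.1", port), handler)` and call `serve_forever()`. The engine returns its own Content-Type, so each wrapped engine decides which `response_format` values it can honor.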