Voice
Configure speech-to-text (STT) and text-to-speech (TTS) endpoints. Voice mode shares the same agent and tool surface as text chat: the same MCP servers, the same memory, and the same canvas-control verbs.
Speech-to-text
Streaming endpoints are preferred. Use a cloud service (Deepgram, AssemblyAI) or a local OpenAI-compatible server (e.g. Qwen3-ASR served via vLLM, or Faster-Whisper behind a thin HTTP shim).
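A minimal sketch of a non-streaming request to a local OpenAI-compatible transcription server, using only the Python standard library. It assumes the server exposes OpenAI's `POST /v1/audio/transcriptions` route (multipart form fields `file` and `model`, JSON response carrying a `text` field). The function names, base URL, and model name are illustrative placeholders, not part of this app's configuration. Streaming protocols differ by provider and are not shown here.

```python
import json
import urllib.request
import uuid


def build_transcription_request(
    base_url: str, model: str, audio: bytes, filename: str = "clip.wav"
) -> urllib.request.Request:
    """Build a multipart POST for an OpenAI-style /audio/transcriptions route."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field naming the ASR model the server should run.
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="model"\r\n\r\n'
            f"{model}\r\n"
        ).encode(),
        # File field carrying the raw audio bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + b"\r\n",
        # Closing boundary marks the end of the multipart body.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(base_url: str, model: str, audio: bytes) -> str:
    """Send the request and return the "text" field of the JSON response."""
    req = build_transcription_request(base_url, model, audio)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["text"]
```

Against a local server this might be called as `transcribe("http://localhost:8000/v1", "your-asr-model", wav_bytes)`, where the port and model ID are assumptions to replace with whatever the server actually reports.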
Text-to-speech
Use a cloud service (ElevenLabs) or a local OpenAI-compatible TTS server (Kokoro, Qwen3-TTS, VibeVoice, F5-TTS). Wrap each local engine behind a thin /v1/audio/speech adapter.
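A minimal sketch of such an adapter using only the Python standard library. It assumes the client sends an OpenAI-style JSON body (`model`, `input`, `voice`, `response_format`) and expects raw audio bytes back. `synthesize` is a hypothetical stand-in that returns silence so the adapter runs end to end; replace it with a call into the real engine. The port in `serve` is also an assumption.

```python
import io
import json
import wave
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def synthesize(text: str, voice: str) -> bytes:
    """HYPOTHETICAL engine hook (Kokoro, F5-TTS, ...).

    Returns 0.1 s of 16-bit mono silence at 24 kHz as WAV; swap in the real model.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(24000)
        w.writeframes(b"\x00\x00" * 2400)
    return buf.getvalue()


def handle_speech(body: bytes) -> tuple[int, str, bytes]:
    """Map an OpenAI-style speech request body to (status, content type, payload)."""
    try:
        req = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return 400, "application/json", b'{"error": "invalid JSON"}'
    if not isinstance(req, dict):
        return 400, "application/json", b'{"error": "expected a JSON object"}'
    text = req.get("input")
    if not isinstance(text, str) or not text:
        return 400, "application/json", b'{"error": "missing input"}'
    # OpenAI's API defaults to mp3; this sketch only emits WAV and treats a
    # missing response_format as wav. A fuller adapter would transcode.
    if req.get("response_format", "wav") != "wav":
        return 400, "application/json", b'{"error": "only wav is supported"}'
    return 200, "audio/wav", synthesize(text, req.get("voice", "default"))


class SpeechHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/v1/audio/speech":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, ctype, payload = handle_speech(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8880) -> None:
    """Block forever, serving the adapter on localhost (port is an assumption)."""
    ThreadingHTTPServer(("127.0.0.1", port), SpeechHandler).serve_forever()
```

Running `serve()` and pointing the TTS endpoint at `http://localhost:8880/v1` would exercise the adapter. Keeping the request mapping in `handle_speech`, separate from the HTTP plumbing, makes it straightforward to swap engines or add formats without touching the server code.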