Local models
Point at a local OpenAI-compatible LLM endpoint (Ollama, vLLM, LM Studio, llama.cpp) and we discover which models it serves. For cloud chat keys, use Providers instead.
Local endpoints serve their own model lists; there are no curated downloads here. To add Whisper for transcription (which Ollama does not serve), use the section below.
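Discovery relies on the OpenAI-compatible `GET /v1/models` route that Ollama, vLLM, LM Studio, and llama.cpp all expose. A minimal sketch of that query (the helper names and the localhost URL are illustrative, not the app's actual code):

```python
import json
import urllib.request

def models_url(base_url: str) -> str:
    """Derive the OpenAI-compatible model-list route from an endpoint base URL."""
    return base_url.rstrip("/") + "/v1/models"

def list_models(base_url: str) -> list[str]:
    """Fetch the model IDs a local endpoint serves, e.g. Ollama at http://localhost:11434."""
    with urllib.request.urlopen(models_url(base_url), timeout=5) as resp:
        payload = json.load(resp)
    # OpenAI-compatible servers return {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", [])]
```

Cloud providers expose the same route behind an API key, which is why chat keys live under Providers instead.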
Configured endpoints
Add endpoint
Whisper weights
Faster-Whisper weights for offline transcription. Stored in $MODELS_DIR. Pyannote diarization is installed separately (requires an HF token + EULA acceptance).
Faster-Whisper large-v3 (recommended)
OpenAI Whisper large-v3 converted to CTranslate2 format. The default Whisper for production. Pair with pyannote to get WhisperX-style diarization. · 3.0 GB · 6.0 GB VRAM · MIT · Systran/faster-whisper-large-v3
Faster-Whisper medium
Balanced quality/size. Good on mid-range hardware. · 1.5 GB · 4.0 GB VRAM · MIT · Systran/faster-whisper-medium
Faster-Whisper small
Fastest, lowest-memory option. Decent quality for everyday notes. · 500 MB · 2.0 GB VRAM · MIT · Systran/faster-whisper-small