Local models
Point at a local OpenAI-compatible LLM endpoint (Ollama, vLLM, LM Studio, llama.cpp) and we discover which models it serves. For cloud chat keys, use Providers instead.
Local endpoints serve their own model lists; there are no curated downloads here. To add Whisper for transcription (which Ollama does not serve), use the section below.
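Discovery relies on the OpenAI-compatible `GET /v1/models` route that Ollama, vLLM, LM Studio, and llama.cpp all expose. A minimal sketch of that query (the helper names and the localhost URL are illustrative, not the app's actual code):

```python
import json
import urllib.request

def models_url(base_url: str) -> str:
    """Derive the OpenAI-compatible model-list route from an endpoint base URL."""
    return base_url.rstrip("/") + "/v1/models"

def list_models(base_url: str) -> list[str]:
    """Fetch the model IDs a local endpoint serves, e.g. Ollama at http://localhost:11434."""
    with urllib.request.urlopen(models_url(base_url), timeout=5) as resp:
        payload = json.load(resp)
    # OpenAI-compatible servers return {"object": "list", "data": [{"id": ...}, ...]}
    return [m["id"] for m in payload.get("data", [])]
```

Cloud providers expose the same route behind an API key, which is why chat keys live under Providers instead.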
Configured endpoints
Add endpoint
Whisper weights
Faster-Whisper weights for offline transcription. Stored in $MODELS_DIR. Pyannote diarization is installed separately (requires an HF token + EULA acceptance).
Faster-Whisper large-v3 (recommended)
OpenAI Whisper large-v3 converted to CTranslate2 format. The default Whisper for production. Pair with pyannote to get WhisperX-style diarization. · 3.0 GB · 6.0 GB VRAM · MIT · Systran/faster-whisper-large-v3
Faster-Whisper medium
Balanced quality/size. Good on mid-range hardware. · 1.5 GB · 4.0 GB VRAM · MIT · Systran/faster-whisper-medium
Faster-Whisper small
Fastest, lowest-memory option. Decent quality for everyday notes. · 500 MB · 2.0 GB VRAM · MIT · Systran/faster-whisper-small