Voice dictation: speak prompts with local Whisper
CodexUse 2.5 adds voice-to-text dictation powered by Faster Whisper running entirely on your machine. Download a model, click the mic, and speak your prompt. The transcription flows into the composer and nothing leaves your device.
How it works
- Dictation uses Faster Whisper, an optimized implementation of OpenAI's Whisper speech recognition model.
- A Python backend (scripts/dictation_faster_whisper.py) runs locally and handles the transcription.
- Transcribed text is inserted into the active composer with smart word-boundary detection: spacing, cursor position, and prefix/suffix joining are handled automatically.
- Available in both the workspace home composer and the active chat composer.
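The flow above can be sketched in Python. This is a minimal illustration of how a backend script might call Faster Whisper, not the actual contents of scripts/dictation_faster_whisper.py; the model size, device settings, and helper names are assumptions. The `faster_whisper` import is deferred so the text-joining helper works even without the package installed.

```python
def format_transcript(segments) -> str:
    """Join Whisper segments into a single prompt string,
    dropping empty segments and per-segment edge whitespace."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_local(audio_path: str, model_size: str = "base") -> str:
    """Run speech-to-text entirely on this machine with Faster Whisper."""
    # Deferred import: requires `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory use modest; a sketch, not a tuned config.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return format_transcript(segments)
```

Nothing here touches the network after the model is downloaded; `transcribe` reads the local audio file and returns segments in memory.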
Supported models
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | ~39 MB | Fastest | Basic |
| base | ~74 MB | Fast | Good |
| small | ~244 MB | Moderate | Better |
| medium | ~769 MB | Slower | Strong |
| large-v3 | ~1.5 GB | Slowest | Best |
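As a rough planning aid, the sizes in the table can be folded into a small helper that picks the most accurate model fitting a disk budget. The figures are the approximate download sizes from the table above, not exact on-disk sizes, and the function name is made up for illustration.

```python
from typing import Optional

# Approximate download sizes (MB) from the table above.
MODEL_SIZES_MB = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v3": 1536,  # ~1.5 GB
}


def largest_model_that_fits(free_mb: int) -> Optional[str]:
    """Return the most accurate model whose download fits in free_mb.

    Relies on the dict being ordered from fastest/basic to slowest/best,
    matching the table; returns None if even tiny does not fit."""
    best = None
    for name, size_mb in MODEL_SIZES_MB.items():
        if size_mb <= free_mb:
            best = name
    return best
```

For example, a 300 MB budget lands on small, while anything above roughly 1.5 GB allows large-v3.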
Managing models
- Go to Settings → Chat → Dictation.
- You will see the list of available models with download/remove buttons.
- Click Download to fetch a model. Progress is shown inline.
- Click Cancel to abort an in-progress download.
- Click Remove to delete a downloaded model and free disk space.
You can have multiple models downloaded and switch between them as needed.
Using dictation
- Make sure at least one model is downloaded.
- Focus the chat composer (workspace home or active chat).
- Click the microphone button or use the keyboard shortcut.
- Speak your prompt. The transcribed text appears in the composer.
- Edit the text if needed, then send as usual.
Smart text insertion
Dictation handles text insertion thoughtfully:
- Detects word boundaries at the cursor position and adds appropriate spacing.
- Handles prefix and suffix joining so you can dictate into the middle of existing text.
- Manages cursor position after insertion so you can keep typing or dictating.
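The joining rules above can be sketched as a pure function. This is an illustration of the described behavior, not the editor's actual implementation; the function and parameter names are hypothetical.

```python
def insert_dictation(text: str, cursor: int, dictated: str) -> tuple[str, int]:
    """Insert dictated text at cursor, adding spaces only where the
    neighboring characters are not already whitespace. Returns the new
    text and the cursor position just after the inserted piece."""
    prefix, suffix = text[:cursor], text[cursor:]
    piece = dictated.strip()
    # Prefix joining: separate from a preceding word with one space.
    if piece and prefix and not prefix[-1].isspace():
        piece = " " + piece
    # Suffix joining: separate from a following word with one space.
    if piece and suffix and not suffix[0].isspace():
        piece = piece + " "
    return prefix + piece + suffix, len(prefix) + len(piece)
```

Dictating "new text" into "hello world" with the cursor after "hello" yields "hello new text world", with the cursor left at the end of the inserted words so you can keep typing or dictating.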
Privacy
- All speech-to-text processing happens on your machine.
- No audio is sent to any server, including OpenAI.
- Model files are stored locally in the CodexUse data directory.
- You can remove models at any time from settings.
Troubleshooting
| Symptom | Likely cause | Action |
|---|---|---|
| Mic button disabled or missing | No model downloaded | Go to Settings → Chat → Dictation and download at least one model. |
| Transcription is inaccurate | Using a small model or noisy environment | Try a larger model (small, medium, or large-v3). Reduce background noise. |
| Dictation is slow | Large model on limited hardware | Use a smaller model. The tiny or base models are fast on most machines. |
| Python error on startup | Missing Python or dependencies | Ensure Python 3 is installed. The dictation script requires the faster-whisper package. |
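For the last row, a quick environment check along these lines can confirm the basics before digging further. The 3.8 version floor is an assumption for illustration, not a documented requirement, and the function name is made up.

```python
import importlib.util
import sys


def check_dictation_env() -> list[str]:
    """Return a list of problems that would likely prevent the dictation
    backend from starting; an empty list means the basics look fine."""
    issues = []
    # Assumed version floor; the docs only state Python 3 is required.
    if sys.version_info < (3, 8):
        issues.append(f"Python 3.8+ expected, found {sys.version.split()[0]}")
    # The dictation script depends on the faster-whisper package.
    if importlib.util.find_spec("faster_whisper") is None:
        issues.append("faster-whisper not installed (try: pip install faster-whisper)")
    return issues
```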
FAQ
Does dictation send audio to the cloud?
No. All speech-to-text processing happens locally on your machine using Faster Whisper. No audio data is transmitted anywhere.
Which Whisper model should I use?
For fast results with reasonable accuracy, the base or small model works well. For best accuracy on longer dictation, use medium or large-v3. The tiny model is fastest but less accurate.
Can I use dictation in the terminal?
Dictation is available in the chat composers (workspace home and active chat). It does not inject text directly into the integrated terminal.