Voice dictation: speak prompts with local Whisper
CodexUse 2.5 adds voice-to-text dictation powered by Faster Whisper running entirely on your machine. Download a model, click the mic, and speak your prompt. The transcription flows into the composer and nothing leaves your device.
How it works
- Dictation uses Faster Whisper, an optimized implementation of OpenAI's Whisper speech recognition model.
- A Python backend (scripts/dictation_faster_whisper.py) runs locally and handles the transcription.
- Transcribed text is inserted into the active composer with smart word-boundary detection: spacing, cursor position, and prefix/suffix joining are handled automatically.
- Available in both the workspace home composer and the active chat composer.
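The flow above can be sketched in Python. This is a minimal illustration of how a backend script might call Faster Whisper, not the actual contents of scripts/dictation_faster_whisper.py; the model size, device settings, and helper names are assumptions. The `faster_whisper` import is deferred so the text-joining helper works even without the package installed.

```python
def format_transcript(segments) -> str:
    """Join Whisper segments into a single prompt string,
    dropping empty segments and per-segment edge whitespace."""
    return " ".join(seg.text.strip() for seg in segments if seg.text.strip())


def transcribe_local(audio_path: str, model_size: str = "base") -> str:
    """Run speech-to-text entirely on this machine with Faster Whisper."""
    # Deferred import: requires `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    # int8 on CPU keeps memory use modest; a sketch, not a tuned config.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return format_transcript(segments)
```

Nothing here touches the network after the model is downloaded; `transcribe` reads the local audio file and returns segments in memory.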
Supported models
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny | ~39 MB | Fastest | Basic |
| base | ~74 MB | Fast | Good |
| small | ~244 MB | Moderate | Better |
| medium | ~769 MB | Slower | Strong |
| large-v3 | ~1.5 GB | Slowest | Best |
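As a rough planning aid, the sizes in the table can be folded into a small helper that picks the most accurate model fitting a disk budget. The figures are the approximate download sizes from the table above, not exact on-disk sizes, and the function name is made up for illustration.

```python
from typing import Optional

# Approximate download sizes (MB) from the table above.
MODEL_SIZES_MB = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v3": 1536,  # ~1.5 GB
}


def largest_model_that_fits(free_mb: int) -> Optional[str]:
    """Return the most accurate model whose download fits in free_mb.

    Relies on the dict being ordered from fastest/basic to slowest/best,
    matching the table; returns None if even tiny does not fit."""
    best = None
    for name, size_mb in MODEL_SIZES_MB.items():
        if size_mb <= free_mb:
            best = name
    return best
```

For example, a 300 MB budget lands on small, while anything above roughly 1.5 GB allows large-v3.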
Managing models
- Go to Settings → Chat → Dictation.
- You will see the list of available models with download/remove buttons.
- Click Download to fetch a model. Progress is shown inline.
- Click Cancel to abort an in-progress download.
- Click Remove to delete a downloaded model and free disk space.
You can have multiple models downloaded and switch between them as needed.
Using dictation
- Make sure at least one model is downloaded.
- Focus the chat composer (workspace home or active chat).
- Click the microphone button or use the keyboard shortcut.
- Speak your prompt. The transcribed text appears in the composer.
- Edit the text if needed, then send as usual.
Smart text insertion
Dictation handles text insertion thoughtfully:
- Detects word boundaries at the cursor position and adds appropriate spacing.
- Handles prefix and suffix joining so you can dictate into the middle of existing text.
- Manages cursor position after insertion so you can keep typing or dictating.
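The joining rules above can be sketched as a pure function. This is an illustration of the described behavior, not the editor's actual implementation; the function and parameter names are hypothetical.

```python
def insert_dictation(text: str, cursor: int, dictated: str) -> tuple[str, int]:
    """Insert dictated text at cursor, adding spaces only where the
    neighboring characters are not already whitespace. Returns the new
    text and the cursor position just after the inserted piece."""
    prefix, suffix = text[:cursor], text[cursor:]
    piece = dictated.strip()
    # Prefix joining: separate from a preceding word with one space.
    if piece and prefix and not prefix[-1].isspace():
        piece = " " + piece
    # Suffix joining: separate from a following word with one space.
    if piece and suffix and not suffix[0].isspace():
        piece = piece + " "
    return prefix + piece + suffix, len(prefix) + len(piece)
```

Dictating "new text" into "hello world" with the cursor after "hello" yields "hello new text world", with the cursor left at the end of the inserted words so you can keep typing or dictating.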
Privacy
- All speech-to-text processing happens on your machine.
- No audio is sent to any server, including OpenAI.
- Model files are stored locally in the CodexUse data directory.
- You can remove models at any time from settings.
Troubleshooting
| Symptom | Likely cause | Action |
|---|---|---|
| Mic button disabled or missing | No model downloaded | Go to Settings → Chat → Dictation and download at least one model. |
| Transcription is inaccurate | Using a small model or noisy environment | Try a larger model (small, medium, or large-v3). Reduce background noise. |
| Dictation is slow | Large model on limited hardware | Use a smaller model. The tiny or base models are fast on most machines. |
| Python error on startup | Missing Python or dependencies | Ensure Python 3 is installed. The dictation script requires the faster-whisper package. |
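For the last row, a quick environment check along these lines can confirm the basics before digging further. The 3.8 version floor is an assumption for illustration, not a documented requirement, and the function name is made up.

```python
import importlib.util
import sys


def check_dictation_env() -> list[str]:
    """Return a list of problems that would likely prevent the dictation
    backend from starting; an empty list means the basics look fine."""
    issues = []
    # Assumed version floor; the docs only state Python 3 is required.
    if sys.version_info < (3, 8):
        issues.append(f"Python 3.8+ expected, found {sys.version.split()[0]}")
    # The dictation script depends on the faster-whisper package.
    if importlib.util.find_spec("faster_whisper") is None:
        issues.append("faster-whisper not installed (try: pip install faster-whisper)")
    return issues
```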
FAQ
Does dictation send audio to the cloud?
No. All speech-to-text processing happens locally on your machine using Faster Whisper. No audio data is transmitted anywhere.
Which Whisper model should I use?
For fast results with reasonable accuracy, the base or small model works well. For best accuracy on longer dictation, use medium or large-v3. The tiny model is fastest but less accurate.
Can I use dictation in the terminal?
Dictation is available in the chat composers (workspace home and active chat). It does not inject text directly into the integrated terminal.