Speech to Text (STT)#
Debian packages several speech-to-text (STT) engines from the pre-AI era:
- CMU Sphinx / PocketSphinx. Accuracy is vintage.
- Julius. Difficult to configure on modern desktops.
The AI-era solutions:
Speech-to-text on a Debian desktop in 2026 takes one of two paths:
- local processing (privacy-focused, runs on your own hardware), or
- cloud-based processing (requires an internet connection, usually faster and more accurate).
Engines:#
| Tool | Processing | Privacy | Accuracy | Best For |
|---|---|---|---|---|
| Whisper | Local | High | Excellent | Transcribing recordings/meetings. |
| VOSK | Local | High | Good | Real-time typing/dictation. |
| DeepSpeech | Local | High | Moderate | Older systems or specific use cases. |
- Real-Time Dictation: "Nerd Dictation"
If you want to talk and see the text appear in your text editor (like LibreOffice or Gedit), Nerd Dictation is the best lightweight tool for Linux. It uses the VOSK engine.
Why it's great: It doesn't need a GPU and is very snappy.
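Installation: It is a Python script that you clone from GitHub (ideasman42/nerd-dictation). It depends on the vosk Python module (python3-vosk), a downloaded VOSK model, and an input-simulation tool such as xdotool.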
Workflow: You assign a keyboard shortcut to start/stop the "listening" mode.
- GNOME Integrated Solution: "Dictation" (Extension/App)
If you prefer a GUI that integrates with your desktop:
Dictation (by ElioQoshi): Check the GNOME Software center or Flatpak.
It provides a simple "Record" button that sends text directly to your clipboard or focused window.
Amberol / NewsFlash / Decibels: Some of these newer GTK4 apps are beginning to integrate transcription features using Whisper in the background.
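
To make the VOSK-based dictation workflow above more concrete, here is a minimal sketch of how the VOSK Python bindings can turn audio into text. It is not Nerd Dictation's own code. The function names `texts_from_results` and `transcribe_wav`, the `model_dir` argument (a VOSK model you have already downloaded and unpacked), and the chunk size are assumptions for illustration. The input is assumed to be a mono 16-bit PCM WAV file.

```python
import json
import wave


def texts_from_results(result_jsons):
    """Join the "text" field of each VOSK JSON result, skipping empty ones."""
    parts = (json.loads(r).get("text", "") for r in result_jsons)
    return " ".join(p for p in parts if p)


def transcribe_wav(wav_path, model_dir, chunk_frames=4000):
    """Transcribe a mono 16-bit PCM WAV file with VOSK.

    model_dir is a hypothetical path to an unpacked VOSK model.
    """
    # Imported lazily so the helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    results = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV input")
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns True once an utterance is complete;
            # Result() then holds that utterance as a JSON string.
            if rec.AcceptWaveform(data):
                results.append(rec.Result())
        # FinalResult() flushes whatever audio is still buffered.
        results.append(rec.FinalResult())
    return texts_from_results(results)
```

Feeding audio in small chunks is what makes this approach suitable for live dictation: a tool can replace the WAV reader with a microphone stream and type each completed utterance as it arrives.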
Models:#
| Model Family | License Type | Commercial Use? | Key Restriction |
|---|---|---|---|
| Mistral | Apache 2.0 | Yes | None (Very permissive). |
| Falcon (TII) | Apache 2.0 | Yes | None (Very permissive). |
| Llama (Meta) | Custom (Open Weights) | Yes (limited) | Services with over 700M monthly active users need a separate license from Meta. |
| Gemma (Google) | Custom | Yes | Usage restrictions apply. |
| GPT-4/Gemini | Closed Source | Via API only | Weights are not released; you don't own the model. |
See also:#
- https://www.youtube.com/watch?v=Cw1SESc8sdA&t=53s
- https://www.youtube.com/watch?v=VDMbWUfHsbk
Noise Cancellation#
STT engines struggle with background hum or fan noise. For better accuracy, install NoiseTorch or the PipeWire Noise Suppression plugin so that the engine receives a cleaned-up microphone signal.
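
NoiseTorch and the PipeWire plugin clean up a live microphone. For recordings that already exist, a similar clean-up can be done offline before transcription. The sketch below uses a different technique from those tools: ffmpeg's `highpass` and `afftdn` (FFT denoiser) filters. The function names, the 100 Hz cutoff, and the 16 kHz mono output are assumptions chosen for illustration, not tuned values.

```python
import subprocess


def build_denoise_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command: high-pass, FFT denoise, mono, resample."""
    return [
        "ffmpeg", "-y",                   # overwrite dst if it exists
        "-i", src,
        "-af", "highpass=f=100,afftdn",   # cut low hum, then reduce broadband noise
        "-ac", "1",                       # downmix to mono
        "-ar", str(sample_rate),          # resample for the STT engine
        dst,
    ]


def denoise(src, dst):
    """Run ffmpeg; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_denoise_cmd(src, dst), check=True)
```

Keeping the command construction separate from the call makes it easy to inspect or adjust the filter chain before running it on a batch of files.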
Related Case: Batch processing:#
The Best All-Rounder: OpenAI Whisper (Local)#
Whisper is currently the gold standard for open-source STT. It runs entirely on your machine.
How to get it: The easiest way to run it on Debian is via pip (inside a virtual environment, since recent Debian releases block system-wide pip installs) or using a specialized client like whisper.cpp.
Requirements: A decent CPU or, ideally, an NVIDIA GPU.
Installation:
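sudo apt install ffmpeg python3-venv # Install ffmpeg (audio decoding) and venv support first
python3 -m venv ~/whisper-venv # Debian blocks system-wide pip installs, so create a virtual environment
source ~/whisper-venv/bin/activate # Activate it; pip and the whisper command are then on PATH in this shell
pip install -U openai-whisper # Install whisper into the virtual environment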
Usage: You provide an audio file, and it spits out text.
whisper recording.mp3 --model medium
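
For batch processing, the same package can also be driven from Python, so the model is loaded once and reused for every file. The following is a minimal sketch, assuming the `openai-whisper` package is installed. The function names, the extension list, and the choice to write each transcript next to its audio file are illustrative assumptions.

```python
from pathlib import Path

# Assumed set of extensions to treat as audio; adjust as needed.
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(folder):
    """Return audio files directly inside folder, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )


def transcribe_folder(folder, model_name="medium"):
    """Write a .txt transcript next to each audio file; return the paths."""
    # Imported lazily so find_audio_files works without whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # loaded once, reused per file
    written = []
    for audio in find_audio_files(folder):
        result = model.transcribe(str(audio))  # dict with a "text" key
        out = audio.with_suffix(".txt")
        out.write_text(result["text"].strip() + "\n", encoding="utf-8")
        written.append(out)
    return written
```

Calling `transcribe_folder("recordings")` on a hypothetical `recordings` directory would transcribe every matching file in it. Loading the model once avoids the startup cost that a shell loop over the `whisper` command pays for each file.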