Speech to Text (STT)

Debian packages a few speech-to-text (STT) engines from the pre-AI era:

- CMU Sphinx / PocketSphinx. Accuracy is well below that of current neural models.
- Julius. Difficult to configure on modern desktops.

The AI-era solutions:

Speech-to-text on a Debian desktop in 2026 has two paths:

- local processing (privacy-focused, uses your hardware), or
- cloud-based processing (requires internet, usually faster and more accurate).

Engines:

| Tool | Processing | Privacy | Accuracy | Best For |
|------|------------|---------|----------|----------|
| Whisper | Local | High | Excellent | Transcribing recordings/meetings. |
| VOSK | Local | High | Good | Real-time typing/dictation. |
| DeepSpeech | Local | High | Moderate | Older systems or specific use cases. |
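
To illustrate why VOSK suits real-time use, here is a minimal sketch of its streaming interface from Python. It assumes the `vosk` package is installed (for example with `pip install vosk` inside a virtual environment) and that a VOSK model has been downloaded and unpacked. The function names `result_text` and `transcribe_wav` and the paths in the usage comment are hypothetical. The recognizer accepts audio in small chunks and returns JSON, so text can be emitted while audio is still arriving.

```python
import json
import wave


def result_text(raw_json: str) -> str:
    """Extract the recognized text from a VOSK JSON result string."""
    return json.loads(raw_json).get("text", "")


def transcribe_wav(wav_path: str, model_dir: str, frames_per_chunk: int = 4000) -> str:
    """Transcribe a mono 16-bit PCM WAV file by feeding VOSK small chunks."""
    # Imported lazily so result_text() works even without vosk installed.
    from vosk import KaldiRecognizer, Model

    pieces = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM WAV audio")
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            # AcceptWaveform() reports when an utterance has been completed.
            if recognizer.AcceptWaveform(data):
                pieces.append(result_text(recognizer.Result()))
        # Flush whatever is still buffered at the end of the file.
        pieces.append(result_text(recognizer.FinalResult()))
    return " ".join(p for p in pieces if p)


# Example (both paths are placeholders):
# print(transcribe_wav("recording.wav", "/path/to/vosk-model"))
```

A dictation tool would read the chunks from the microphone instead of a file, but the feed-and-poll loop stays the same.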
  1. Real-Time Dictation: "Nerd Dictation"

If you want to talk and see the text appear in your text editor (like LibreOffice or Gedit), Nerd Dictation is the best lightweight tool for Linux. It uses the VOSK engine.

Why it's great: It doesn't need a GPU and is very snappy.

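Installation: It is a single Python script that you clone from GitHub. It depends on the VOSK Python bindings (python3-vosk where your release packages it, otherwise the vosk package from PyPI) and on a downloaded VOSK model.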

Workflow: You assign a keyboard shortcut to start/stop the "listening" mode.
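
The start/stop workflow can be bound to a single shortcut with a small toggle script. The sketch below assumes nerd-dictation was cloned to `~/nerd-dictation` and the VOSK model was unpacked to `~/.config/nerd-dictation/model`. Both paths, the state-file location, and the helper names are assumptions for illustration. It relies only on the `begin` and `end` subcommands and the `--vosk-model-dir` option described in the Nerd Dictation README.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumed locations; adjust to wherever you cloned the script and put the model.
NERD_DICTATION = Path.home() / "nerd-dictation" / "nerd-dictation"
MODEL_DIR = Path.home() / ".config" / "nerd-dictation" / "model"
# Marker file that records whether this script last started dictation.
STATE_FILE = Path(tempfile.gettempdir()) / "nerd-dictation.active"


def build_command(action, script=NERD_DICTATION, model_dir=MODEL_DIR):
    """Return the argument list for `nerd-dictation begin` or `end`."""
    if action not in ("begin", "end"):
        raise ValueError(f"unknown action: {action}")
    cmd = [str(script), action]
    if action == "begin":
        cmd.append(f"--vosk-model-dir={model_dir}")
    return cmd


def toggle(state_file=STATE_FILE):
    """Start dictation if it is idle, stop it if it is running."""
    if state_file.exists():
        subprocess.run(build_command("end"), check=False)
        state_file.unlink(missing_ok=True)
        return "end"
    # `begin` keeps listening until `end` is called, so do not wait for it.
    subprocess.Popen(build_command("begin"))
    state_file.touch()
    return "begin"


if __name__ == "__main__":
    if NERD_DICTATION.exists():
        print(toggle())
    else:
        print(f"nerd-dictation not found at {NERD_DICTATION}", file=sys.stderr)
```

Point a custom keyboard shortcut in your desktop settings at this script. If dictation stops on its own, the marker file goes stale and the next key press only clears it.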
  2. GNOME Integrated Solution: "Dictation" (Extension/App)

If you prefer a GUI that integrates with your desktop:

Dictation (by ElioQoshi): Check the GNOME Software center or Flatpak.
It provides a simple "Record" button that sends text directly to your clipboard or focused window.

Amberol / NewsFlash / Decibels: Some of these newer GTK4 apps are beginning to integrate transcription features using Whisper in the background.

Models:

| Model Family | License Type | Commercial Use? | Key Restriction |
|--------------|--------------|-----------------|-----------------|
| Mistral | Apache 2.0 | Yes | None (very permissive). |
| Falcon (TII) | Apache 2.0 | Yes | None (very permissive). |
| Llama (Meta) | Custom (open weights) | Yes (limited) | Services with 700M+ monthly active users need permission from Meta. |
| Gemma (Google) | Custom | Yes | Usage restrictions apply. |
| GPT-4/Gemini | Closed source | No (API only) | You don't own the model. |

See also:

  • https://www.youtube.com/watch?v=Cw1SESc8sdA&t=53s
  • https://www.youtube.com/watch?v=VDMbWUfHsbk

Noise Cancellation

STT engines struggle with background hum or fan noise. For better accuracy, it is recommended to install NoiseTorch or the PipeWire Noise Suppression plugin.

The Best All-Rounder: OpenAI Whisper (Local)

Whisper is currently the gold standard for open-source STT. It runs entirely on your machine.

How to get it: The easiest way to run it on Debian is to install it with pip, or to use a standalone C/C++ port such as whisper.cpp.

Requirements: A decent CPU or, ideally, an NVIDIA GPU.

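Installation (inside a virtual environment, because Debian 12 and later refuse system-wide pip installs):

sudo apt install ffmpeg python3-venv         # ffmpeg decodes the audio; python3-venv provides venv
python3 -m venv ~/whisper-env                # ~/whisper-env is an example path; any location works
source ~/whisper-env/bin/activate            # Put this environment's pip and whisper on PATH
pip install -U openai-whisper                # Install Whisper

Usage: You provide an audio file, and it prints the transcript and writes it to files in the current directory:

whisper recording.mp3 --model medium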
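
The same transcription can be driven from Python, which is handy when you want timestamps or post-processing. This is a minimal sketch, assuming `openai-whisper` and `ffmpeg` are installed; `transcribe_segments`, `format_timestamp`, and `format_segments` are hypothetical helper names, and `recording.mp3` is a placeholder.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> list:
    """Turn Whisper segments into '[HH:MM:SS] text' lines."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def transcribe_segments(audio_path: str, model_name: str = "medium"):
    """Run Whisper locally and return its list of timed segments."""
    # Imported lazily so the formatting helpers work without whisper installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(audio_path)
    return result["segments"]


# Example (the file name is a placeholder):
# for line in format_segments(transcribe_segments("recording.mp3")):
#     print(line)
```

Smaller models such as `base` or `small` trade accuracy for speed and are a reasonable starting point on CPU-only machines.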