Architecture
Whispefy is a small pipeline with one control loop:
trigger -> record -> silence stop -> transcribe -> clean -> insert
The cleanup path warms up its local embedding model at startup so the first real session does not pay the download cost.
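A minimal sketch of the warm-up idea, assuming the model is wrapped in a cached loader. `load_model`, `get_embedder`, and `warm_up` are hypothetical names. The stub loader stands in for the real download-and-load step so the sketch runs anywhere.

```python
from functools import lru_cache
from typing import Callable, Sequence

EMBED_MODEL_NAME = "BAAI/bge-small-en-v1.5"


def load_model(name: str) -> Callable[[str], Sequence[float]]:
    """Hypothetical stand-in for the real loader (download + init)."""
    # A real loader would fetch and initialise the model named `name`;
    # this stub returns a trivial embedding function instead.
    return lambda text: [float(len(text)), 1.0]


@lru_cache(maxsize=1)
def get_embedder() -> Callable[[str], Sequence[float]]:
    # lru_cache turns this into a lazy singleton: the first call loads,
    # every later call returns the same object.
    return load_model(EMBED_MODEL_NAME)


def warm_up() -> None:
    # Called once at startup so the first session finds the model ready.
    get_embedder()
```

With this shape, the startup call pays the load cost once, and the cleanup gate just calls `get_embedder()` during a session.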
Core Pieces
whispefy/audio.py reads mic input, tracks RMS levels, and stops the session when silence lasts long enough.
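The silence stop can be sketched as a small tracker fed one audio block at a time. This is an illustration, not the code in `whispefy/audio.py`. `SilenceTracker`, the RMS threshold, and the fixed frame duration are assumptions. It counts consecutive quiet time and signals a stop once that reaches the configured silence window.

```python
import math
from typing import Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one block of PCM samples (plain Python ints).

    With numpy int16 arrays, cast to float first to avoid overflow.
    """
    if len(frame) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class SilenceTracker:
    """Hypothetical helper: stop once RMS stays below a threshold for silence_ms."""

    def __init__(self, threshold: float, silence_ms: int, frame_ms: int) -> None:
        self.threshold = threshold
        self.silence_ms = silence_ms
        self.frame_ms = frame_ms
        self.quiet_ms = 0

    def feed(self, frame: Sequence[int]) -> bool:
        """Return True when the session should stop."""
        if rms(frame) >= self.threshold:
            self.quiet_ms = 0  # speech resets the silence window
            return False
        self.quiet_ms += self.frame_ms
        return self.quiet_ms >= self.silence_ms
```

A real recorder would call `feed` from its mic callback and could treat leading silence (before any speech) differently; that policy is not described here.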
The app checks the recorded clip before transcription. If it is too short or too quiet, it skips Whisper and ends the session early.
whispefy/groq_pipeline.py sends the WAV file to Groq Whisper and returns plain transcript text.
The pipeline cleans the text with ChatGroq.
Then it compares the cleaned text with the original transcript using local embeddings.
If the cleaned text looks too different, Whispefy keeps the original.
whispefy/insertion.py types text directly with wtype, then falls back to clipboard paste if direct typing fails.
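A minimal sketch of the type-then-paste fallback, assuming both tools are called as subprocesses. `insert_text` and `run_cmd` are hypothetical names. The fallback sends Ctrl+V with `wtype -M ctrl v -m ctrl` after `wl-copy`, which is an assumption about how the paste happens. The runner is injectable so the logic can be exercised without a Wayland session.

```python
import subprocess
from typing import Callable, List, Optional

# A runner takes argv plus optional stdin text and returns an exit code.
Runner = Callable[[List[str], Optional[str]], int]


def run_cmd(argv: List[str], stdin: Optional[str] = None) -> int:
    try:
        return subprocess.run(argv, input=stdin, text=True, check=False).returncode
    except FileNotFoundError:
        return 127  # tool not installed: treat like a failed command


def insert_text(text: str, run: Runner = run_cmd) -> str:
    """Type text with wtype; fall back to clipboard + paste. Returns the method used."""
    if run(["wtype", text], None) == 0:
        return "wtype"
    # Fallback (assumed): put text on the clipboard, then send Ctrl+V.
    if run(["wl-copy"], text) == 0 and run(["wtype", "-M", "ctrl", "v", "-m", "ctrl"], None) == 0:
        return "clipboard"
    return "failed"
```

Text that starts with `-` may be parsed by `wtype` as an option, so a real implementation would need to handle that case.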
whispefy/app.py owns the session state, while whispefy/server.py exposes the local FastAPI trigger endpoints. The app also preloads the embedding model during startup.
Guardrails
Whispefy has three simple checks:
- Before Whisper, the clip must be long enough and loud enough.
- Before cleanup, the text must be worth cleaning.
- After cleanup, the new text must stay close to the old text.
The pre-Whisper gate is plain:
- minimum duration: 1.0s
- minimum voiced content: 0.25s
- minimum peak level: max(40.0, noise_floor * 1.2)
If a clip fails any of those checks, Whispefy skips Whisper and reports the clip as too short or too quiet.
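The three thresholds above can be expressed as one predicate. This sketch assumes the clip has already been summarised into duration, voiced time, peak level, and noise floor. `ClipStats` and `clip_ok` are hypothetical names, and using `>=` for each "minimum" is an assumption.

```python
from dataclasses import dataclass

MIN_DURATION_S = 1.0
MIN_VOICED_S = 0.25


@dataclass
class ClipStats:
    duration_s: float   # total clip length
    voiced_s: float     # time judged to contain speech
    peak: float         # peak level, same scale as noise_floor
    noise_floor: float  # measured background level


def clip_ok(stats: ClipStats) -> bool:
    """True if the clip is long and loud enough to send to Whisper."""
    # The peak threshold scales with the room noise but never drops below 40.0.
    min_peak = max(40.0, stats.noise_floor * 1.2)
    return (
        stats.duration_s >= MIN_DURATION_S
        and stats.voiced_s >= MIN_VOICED_S
        and stats.peak >= min_peak
    )
```

For example, with a noise floor of 50.0 the required peak is 60.0, so a clip peaking at 50.0 is rejected even if it is long enough.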
The cleanup text filter is simple too. If the transcript is empty, too short, or looks like junk, Whispefy keeps it as-is and skips the cleanup model.
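The document does not say what counts as "too short" or "junk", so this sketch picks hypothetical rules: at least `MIN_CHARS` characters after trimming, and at least half of them letters. `worth_cleaning` and both thresholds are assumptions for illustration only.

```python
MIN_CHARS = 4           # hypothetical minimum length
MIN_LETTER_RATIO = 0.5  # hypothetical junk heuristic


def worth_cleaning(text: str) -> bool:
    """Return False for empty, very short, or mostly non-letter transcripts."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False
    # Junk heuristic (assumption): punctuation or symbol noise has few letters.
    letters = sum(ch.isalpha() for ch in stripped)
    return letters / len(stripped) >= MIN_LETTER_RATIO
```

When this returns False, the transcript goes through unchanged and no cleanup call is made.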
The cleanup gate uses local embeddings from BAAI/bge-small-en-v1.5.
Whispefy embeds the original transcript and the cleaned text and compares them by cosine similarity.
The cutoff is 0.8.
If the score is below 0.8, Whispefy keeps the original text.
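The gate itself is a cosine similarity followed by a threshold. In this sketch the embedding function is injected, so `choose_text` and the `embed` parameter are hypothetical. One real option for `embed` is `SentenceTransformer("BAAI/bge-small-en-v1.5").encode` from the sentence-transformers library, though the document does not say which library Whispefy uses.

```python
import math
from typing import Callable, Sequence

SIMILARITY_CUTOFF = 0.8


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # treat a zero vector as "not similar"
    return dot / (na * nb)


def choose_text(original: str, cleaned: str, embed: Callable[[str], Sequence[float]]) -> str:
    """Keep the cleaned text only if it stays close to the original."""
    score = cosine(embed(original), embed(cleaned))
    return cleaned if score >= SIMILARITY_CUTOFF else original
```

A cleanup that only fixes punctuation and casing should score near 1.0. A rewrite that changes the meaning should drop below the cutoff, so the original wins.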
Config
The app reads runtime settings from .env through whispefy/config.py.
Important values:
- HTTP_PORT controls the local server port
- SILENCE_MS controls how long silence must last before a session ends
- TRANSCRIPTION_BASE_URL should stay on Groq's OpenAI-compatible base
- TRANSCRIPTION_MODEL defaults to whisper-large-v3-turbo
- LLM_MODEL controls the cleanup model
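A minimal sketch of how these values might be read, assuming the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`). `Settings` and `load_settings` are hypothetical names. Only the documented defaults are filled in. The base URL default is Groq's public OpenAI-compatible endpoint, which is an assumption about what Whispefy ships with.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    http_port: int
    silence_ms: int
    transcription_base_url: str
    transcription_model: str
    llm_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    return Settings(
        http_port=int(env.get("HTTP_PORT", "8764")),
        # No default is documented for SILENCE_MS or LLM_MODEL, so both are required here.
        silence_ms=int(env["SILENCE_MS"]),
        # Assumed default: Groq's OpenAI-compatible base URL.
        transcription_base_url=env.get("TRANSCRIPTION_BASE_URL", "https://api.groq.com/openai/v1"),
        transcription_model=env.get("TRANSCRIPTION_MODEL", "whisper-large-v3-turbo"),
        llm_model=env["LLM_MODEL"],
    )
```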
API Endpoints
The FastAPI server lives in whispefy/server.py and binds to 127.0.0.1.
- GET /health checks that the local server is alive
- POST /toggle starts recording when idle, or stops the current session when active
- POST /stop forces the current session to stop
Use HTTP_PORT if you need to change the local port. The default is 8764.
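The toggle and stop semantics can be sketched as a tiny thread-safe state machine that the route handlers call. In FastAPI that would mean `@app.post("/toggle")` and `@app.post("/stop")` handlers delegating to it. `Session` and `State` are hypothetical names. The real app also runs transcription and insertion when a session stops, which this sketch leaves out.

```python
import threading
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"


class Session:
    """Hypothetical session state behind POST /toggle and POST /stop."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # requests may arrive concurrently
        self.state = State.IDLE

    def toggle(self) -> State:
        with self._lock:
            # Idle -> start recording; recording -> stop the session.
            self.state = State.RECORDING if self.state is State.IDLE else State.IDLE
            return self.state

    def stop(self) -> State:
        with self._lock:
            # Forced stop is idempotent: stopping an idle session is a no-op.
            self.state = State.IDLE
            return self.state
```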
Launch Paths
There are three practical ways to run it:
- uv run whispefy for terminal testing
- exec-once = sh -c '/path/to/whispefy/start.sh >> /tmp/whispefy.log 2>&1' for Hyprland autostart
- systemd --user for a background service
Wayland Notes
Whispefy is designed for Hyprland on Wayland.
The insertion path relies on wtype, and clipboard fallback relies on wl-copy.
If those tools are missing, text insertion will fail even if transcription succeeds.
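A quick preflight check can catch this before the first session. The sketch below uses only `shutil.which`. `missing_tools` is a hypothetical helper, and it only checks that the binaries are on `PATH`, not that they can reach the compositor.

```python
import shutil
from typing import Iterable, List

REQUIRED_TOOLS = ("wtype", "wl-copy")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the required insertion tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Logging the result at startup makes a missing tool obvious instead of showing up later as a silent insertion failure.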
Session Behavior
Whispefy does not keep the mic open all the time.
A session starts on POST /toggle or the Hyprland keybind.
Then it does this:
- capture audio frames
- watch for speech
- stop when silence comes
- skip Whisper if the clip is too short or too quiet
- otherwise transcribe the clip with Groq Whisper
- clean the text
- check if the new text looks too far from the old text
- only insert if the whole thing looks good
The pre-Whisper check saves Groq calls. The similarity check stops a bad cleanup from being inserted.
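The session steps above can be sketched as one function with each stage injected as a callable. `run_session` and all parameter names are hypothetical. Two choices here are assumptions: a failed similarity check falls back to the raw transcript, and empty text is never inserted.

```python
from typing import Callable, Optional


def run_session(
    record: Callable[[], object],               # capture frames until silence ends the session
    clip_ok: Callable[[object], bool],          # pre-Whisper gate
    transcribe: Callable[[object], str],        # Groq Whisper call
    worth_cleaning: Callable[[str], bool],      # cleanup text filter
    clean: Callable[[str], str],                # cleanup model call
    similar_enough: Callable[[str, str], bool], # embedding similarity gate
    insert: Callable[[str], None],              # wtype / clipboard insertion
) -> Optional[str]:
    clip = record()
    if not clip_ok(clip):
        return None  # too short or too quiet: no Groq call at all
    raw = transcribe(clip)
    text = raw
    if worth_cleaning(raw):
        cleaned = clean(raw)
        # Keep the original transcript if the cleanup drifted too far.
        text = cleaned if similar_enough(raw, cleaned) else raw
    if not text.strip():
        return None
    insert(text)
    return text
```

Injecting the stages keeps the control flow testable without a mic, network access, or a Wayland session.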
Failure Modes
These are the main failure points to know about when debugging:
- A too-short or too-quiet clip never reaches Whisper; the pre-Whisper gate drops it.
- A wrong TRANSCRIPTION_BASE_URL will break Groq chat calls if it does not resolve to the OpenAI-compatible base.
- Missing wtype or wl-copy will break insertion, even if transcription succeeds.
- systemd --user launches need the Wayland environment imported, or the service may start without access to the desktop session.
- The local FastAPI server binds to 127.0.0.1, so it is meant for local triggers only.
Treat these as operational guardrails, not app bugs, unless the app mishandles them.
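For the systemd case, a startup check can report whether the Wayland session variables made it into the service environment. This sketch assumes `WAYLAND_DISPLAY` and `XDG_RUNTIME_DIR` are the ones that matter, since Wayland clients locate the compositor socket through them. `missing_wayland_env` is a hypothetical helper. One common way to export them is `systemctl --user import-environment WAYLAND_DISPLAY XDG_RUNTIME_DIR` from the Hyprland session.

```python
import os
from typing import List, Mapping, Optional

WAYLAND_VARS = ("WAYLAND_DISPLAY", "XDG_RUNTIME_DIR")


def missing_wayland_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return Wayland session variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in WAYLAND_VARS if not env.get(name)]
```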