Architecture
Whispefy is a small pipeline with one control loop:
trigger -> record -> silence stop -> transcribe -> clean -> insert
Core Pieces
whispefy/audio.py reads mic input, tracks RMS levels, and stops the session when silence lasts long enough.
whispefy/groq_pipeline.py sends the WAV file to Groq Whisper and then runs a small LangChain cleanup pass with ChatGroq.
whispefy/insertion.py types text directly with wtype, then falls back to clipboard paste if direct typing fails.
whispefy/app.py owns the session state, while whispefy/server.py exposes the local FastAPI trigger endpoints.
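The silence stop described for whispefy/audio.py can be sketched roughly as below. This is a minimal illustration, not the module's actual code: the SilenceDetector class, its field names, and the threshold, silence_ms, and frame_ms values are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class SilenceDetector:
    # Hypothetical values for illustration; not Whispefy's real defaults.
    threshold: float = 500.0  # RMS level at or above which a frame counts as speech
    silence_ms: int = 1200    # how long silence must last before the session ends
    frame_ms: int = 30        # duration of a single frame

    heard_speech: bool = False
    quiet_ms: int = 0

    def feed(self, samples: Sequence[int]) -> bool:
        """Consume one frame and return True once the session should stop."""
        if rms(samples) >= self.threshold:
            self.heard_speech = True
            self.quiet_ms = 0  # any speech resets the silence timer
        else:
            self.quiet_ms += self.frame_ms
        return self.quiet_ms >= self.silence_ms
```

The caller can check heard_speech after the stop to decide whether transcription is worth running at all.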
Config
The app reads runtime settings from .env through whispefy/config.py.
Important values:
- HTTP_PORT controls the local server port
- SILENCE_MS controls how long silence must last before a session ends
- TRANSCRIPTION_BASE_URL should stay on Groq's OpenAI-compatible base
- TRANSCRIPTION_MODEL defaults to whisper-large-v3-turbo
- LLM_MODEL controls the cleanup model
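As a rough sketch of how these values could be read, the snippet below loads them from an environment mapping using only the standard library. The Settings class and load_settings function are hypothetical names, and the real whispefy/config.py may work differently. The SILENCE_MS default shown is an assumption, and the base URL default is Groq's public OpenAI-compatible endpoint, assumed here rather than taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    http_port: int
    silence_ms: int
    transcription_base_url: str
    transcription_model: str
    llm_model: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from an environment-like mapping (hypothetical helper)."""
    return Settings(
        http_port=int(env.get("HTTP_PORT", "8764")),
        # Assumed default; the project's real SILENCE_MS default is not documented here.
        silence_ms=int(env.get("SILENCE_MS", "1200")),
        # Assumed default: Groq's OpenAI-compatible base URL.
        transcription_base_url=env.get(
            "TRANSCRIPTION_BASE_URL", "https://api.groq.com/openai/v1"
        ),
        transcription_model=env.get("TRANSCRIPTION_MODEL", "whisper-large-v3-turbo"),
        # No default is assumed for the cleanup model.
        llm_model=env.get("LLM_MODEL"),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in tests without touching the real .env file.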
API Endpoints
The FastAPI server lives in whispefy/server.py and binds to 127.0.0.1.
- GET /health checks that the local server is alive
- POST /toggle starts recording when idle, or stops the current session when active
- POST /stop forces the current session to stop
Use HTTP_PORT if you need to change the local port. The default is 8764.
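For scripting against these endpoints, a small standard-library client might look like the sketch below. The ROUTES table, build_request, and call are illustrative names, not part of Whispefy. The response body format is not documented here, so the sketch returns only the HTTP status code.

```python
import urllib.request

# Method and path for each documented endpoint.
ROUTES = {
    "health": ("GET", "/health"),
    "toggle": ("POST", "/toggle"),
    "stop": ("POST", "/stop"),
}


def build_request(action: str, port: int = 8764) -> urllib.request.Request:
    """Build a request for one endpoint on the loopback interface."""
    method, path = ROUTES[action]
    # POST requests carry an empty body; GET requests carry none.
    data = b"" if method == "POST" else None
    return urllib.request.Request(
        f"http://127.0.0.1:{port}{path}", data=data, method=method
    )


def call(action: str, port: int = 8764, timeout: float = 2.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(action, port), timeout=timeout) as resp:
        return resp.status
```

With the server running, call("toggle") would start or stop a session, and call("health") can serve as a liveness probe.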
Session Behavior
The app does not keep a long-running audio stream open forever.
It starts recording on POST /toggle or the Hyprland bind, then:
- buffers frames locally
- watches for speech
- stops on silence
- skips transcription if no speech was detected
- logs and inserts only after the full pipeline succeeds
Expected no-speech cases are treated as a normal cancel path, not a crash.
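The session flow above can be approximated with the sketch below. It is a simplified model under stated assumptions: run_session, Outcome, and the injected is_speech, transcribe, clean, and insert callables are hypothetical stand-ins for the real audio, Groq, and insertion code, and silence is counted in frames rather than milliseconds.

```python
from enum import Enum
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


class Outcome(Enum):
    INSERTED = "inserted"
    NO_SPEECH = "no_speech"  # normal cancel path, not an error


def run_session(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],
    transcribe: Callable[[List[Frame]], str],
    clean: Callable[[str], str],
    insert: Callable[[str], None],
    max_silent_frames: int = 3,  # assumed stand-in for the silence timeout
) -> Outcome:
    buffered: List[Frame] = []
    heard_speech = False
    silent_run = 0
    for frame in frames:
        buffered.append(frame)  # buffer frames locally
        if is_speech(frame):
            heard_speech = True
            silent_run = 0
        else:
            silent_run += 1
            if silent_run >= max_silent_frames:
                break  # stop on sustained silence
    if not heard_speech:
        return Outcome.NO_SPEECH  # skip transcription entirely
    # Insert only after transcription and cleanup both succeed;
    # an exception in either step propagates before insert is called.
    text = clean(transcribe(buffered))
    insert(text)
    return Outcome.INSERTED
```

Returning a distinct NO_SPEECH outcome, rather than raising, keeps the expected cancel path separate from real failures.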
Launch Paths
There are three practical ways to run it:
- uv run whispefy for terminal testing
- exec-once = sh -c '/path/to/whispefy/start.sh >> /tmp/whispefy.log 2>&1' for Hyprland autostart
- systemd --user for a background service
Wayland Notes
Whispefy is designed for Hyprland on Wayland.
The insertion path relies on wtype, and clipboard fallback relies on wl-copy.
If those tools are missing, text insertion will fail even if transcription succeeds.
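The type-then-clipboard fallback might be structured along the lines below. This is a sketch, not the code in whispefy/insertion.py: run_tool and insert_text are hypothetical names, the exact wtype arguments are an assumption, and the sketch only copies the text with wl-copy, leaving out however the real app triggers the paste.

```python
import subprocess
from typing import Callable, Sequence


def run_tool(argv: Sequence[str], stdin_text: str = "") -> bool:
    """Run a command, returning True only if it exits with status 0."""
    try:
        result = subprocess.run(
            list(argv),
            input=stdin_text,
            text=True,
            # Discard output instead of capturing it, so a tool that leaves
            # a background process behind cannot hold a pipe open.
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # missing binary, permission problem, or hang
    return result.returncode == 0


def insert_text(text: str, run: Callable[[Sequence[str], str], bool] = run_tool) -> str:
    """Try direct typing first, then fall back to the clipboard."""
    # Assumed invocation: "--" keeps text that starts with "-" from
    # being parsed as an option.
    if run(["wtype", "--", text], ""):
        return "typed"
    # wl-copy reads the text to copy from stdin when given no arguments.
    if run(["wl-copy"], text):
        return "clipboard"
    raise RuntimeError("neither wtype nor wl-copy could deliver the text")
```

Injecting the runner lets the fallback order be tested on machines without a Wayland session.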
Failure Modes
These are the main failure points to know about when debugging:
- No speech detected is a normal cancel path from whispefy/audio.py, not a crash.
- A wrong TRANSCRIPTION_BASE_URL will break Groq chat calls if it does not resolve to the OpenAI-compatible base.
- Missing wtype or wl-copy will break insertion, even if transcription succeeds.
- systemd --user launches need the Wayland env imported, or the service may start without access to the desktop session.
- The local FastAPI server binds to 127.0.0.1, so it is meant for local triggers only.
Treat these as operational guardrails, not app bugs, unless the app mishandles them.
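Several of these conditions can be checked before a session starts. The sketch below is one possible preflight helper, not something Whispefy is known to ship: the preflight function is hypothetical, and it assumes that a missing WAYLAND_DISPLAY variable is a reasonable signal that the Wayland environment was not imported.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for tool in ("wtype", "wl-copy"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Assumed heuristic: a service started without the session environment
    # will typically lack WAYLAND_DISPLAY.
    if not env.get("WAYLAND_DISPLAY"):
        problems.append("WAYLAND_DISPLAY is unset; Wayland env may not be imported")
    return problems
```

Logging the returned list at startup would make a misconfigured systemd --user launch visible in the log rather than at the first failed insertion.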