Voice Interface Architecture
Pipeline Flow
User → Microphone → Wake Word Detection → Language Context → ASR → NLP / Intent Parser → Automation Layer → TTS / Feedback → User
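One way to read this flow is as an ordered chain of stages that hand a shared context object to each other. The sketch below is only an illustration of that shape: the `Context` fields, `make_stage`, and `run_pipeline` are hypothetical names, and each stage is a placeholder that records that it ran rather than doing real audio work.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Stage names copied from the flow above, in order.
STAGE_NAMES = [
    "Wake Word Detection",
    "Language Context",
    "ASR",
    "NLP / Intent Parser",
    "Automation Layer",
    "TTS / Feedback",
]


@dataclass
class Context:
    """Hypothetical per-utterance state handed from stage to stage."""
    audio: bytes = b""
    language_mode: str = "EN"
    transcript: str = ""
    trace: List[str] = field(default_factory=list)


Stage = Callable[[Context], Context]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(ctx: Context) -> Context:
        ctx.trace.append(name)
        return ctx
    return stage


PIPELINE: List[Stage] = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(ctx: Context, stages: List[Stage] = PIPELINE) -> Context:
    """Run every stage in order, passing the same context along."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Keeping the language mode on the shared context is what would let every downstream stage read it without asking the wake word stage directly.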
Step-by-Step
- Wake Word Detection: Listens for “Pepa, English” → mode = EN or “Pepa, Español” → mode = ES. Can auto-detect, but explicit phrases reduce errors.
- Language Context: Stores session variable language_mode = EN/ES, used by all downstream modules.
- ASR: Switches model based on language_mode: English Whisper model or Spanish/bilingual Whisper model.
- NLP / Intent Parsing: Converts speech to actionable intent. Maps multiple phrasings to the same entity, e.g. “Turn on the hall light” ↔ “Enciende la luz del pasillo”.
- Automation Layer: Triggers Home Assistant actions. Device IDs remain language-neutral (e.g. luz_pasillo).
- TTS / Feedback: Spoken response in current language mode via Piper. Optional visual display on NSPanel or Magic Mirror.
- Session / Timeout: Language mode persists until timeout (60 s), explicit switch, or reset command.
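The language handling in the steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the model labels in `ASR_MODELS`, the `turn_on` action name, the `LanguageSession` class, and the choice of English as the fallback after a timeout are all assumptions. The timer here restarts on each language switch or reset; a fuller version would probably also refresh it on every utterance.

```python
import time
from typing import Callable, Optional, Tuple

# Wake phrases from the steps above, lowercased for matching.
WAKE_PHRASES = {"pepa, english": "EN", "pepa, español": "ES"}

# Hypothetical model labels; the real Whisper model names depend on the setup.
ASR_MODELS = {"EN": "whisper-english", "ES": "whisper-spanish-bilingual"}

# Both phrasings map to one language-neutral (action, device ID) pair.
# The action name "turn_on" is illustrative.
INTENTS = {
    "turn on the hall light": ("turn_on", "luz_pasillo"),
    "enciende la luz del pasillo": ("turn_on", "luz_pasillo"),
}

SESSION_TIMEOUT_S = 60.0
DEFAULT_MODE = "EN"  # assumption: the source does not name a fallback mode


class LanguageSession:
    """Holds language_mode until a timeout, explicit switch, or reset."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._mode = DEFAULT_MODE
        self._last_activity = clock()

    def handle_wake_phrase(self, phrase: str) -> bool:
        """Switch mode if the phrase is a known wake phrase."""
        mode = WAKE_PHRASES.get(phrase.strip().lower())
        if mode is None:
            return False
        self._mode = mode
        self._last_activity = self._clock()
        return True

    def reset(self) -> None:
        """Reset command: return to the default mode."""
        self._mode = DEFAULT_MODE
        self._last_activity = self._clock()

    @property
    def language_mode(self) -> str:
        """Current mode, falling back to the default after the timeout."""
        if self._clock() - self._last_activity > SESSION_TIMEOUT_S:
            self._mode = DEFAULT_MODE
        return self._mode

    def asr_model(self) -> str:
        """Pick the ASR model label for the current mode."""
        return ASR_MODELS[self.language_mode]


def parse_intent(transcript: str) -> Optional[Tuple[str, str]]:
    """Map a transcript to (action, device_id), or None if unknown."""
    return INTENTS.get(transcript.strip().lower().rstrip("."))
```

Passing the clock in as a parameter makes the 60-second timeout easy to exercise in tests without waiting. Because both phrasings resolve to the same pair, the automation layer never needs to know which language was spoken.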
Optional Extensions
- Guest mode: “Pepa, Guest” → English-only, simplified commands, confirmation prompts.
- Care mode: “Pepa, Care” → slower speech, extra confirmations.
- Speaker-aware: ASR combined with speaker identification remembers each person's preferred language.
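These extensions could be modeled as small behavior profiles selected by wake phrase, plus a per-speaker language lookup. The sketch below is hypothetical throughout: `ModeProfile`, its fields, the 0.8 speech rate for Care mode, the `speaker_1` entry, and the rule that a stored speaker preference overrides the session language are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ModeProfile:
    """Hypothetical behavior switches for one interaction mode."""
    force_language: Optional[str]  # None means "do not force a language"
    speech_rate: float             # 1.0 = normal; the values are illustrative
    confirm_actions: bool
    simplified_commands: bool


PROFILES: Dict[str, ModeProfile] = {
    "default": ModeProfile(None, 1.0, False, False),
    "guest": ModeProfile("EN", 1.0, True, True),
    "care": ModeProfile(None, 0.8, True, False),
}

# Wake phrases from the list above, lowercased for matching.
MODE_PHRASES = {"pepa, guest": "guest", "pepa, care": "care"}

# Speaker-aware extension: speaker ID -> preferred language (example data).
SPEAKER_LANGUAGE: Dict[str, str] = {"speaker_1": "ES"}


def profile_for_phrase(phrase: str) -> ModeProfile:
    """Return the profile for a mode phrase, or the default profile."""
    mode = MODE_PHRASES.get(phrase.strip().lower(), "default")
    return PROFILES[mode]


def resolve_language(
    profile: ModeProfile,
    speaker_id: Optional[str],
    session_language: str,
) -> str:
    """Pick the language: forced by mode, else speaker preference, else session."""
    if profile.force_language is not None:
        return profile.force_language
    if speaker_id is not None and speaker_id in SPEAKER_LANGUAGE:
        return SPEAKER_LANGUAGE[speaker_id]
    return session_language
```

How `speech_rate` would translate into a TTS setting is left out here, since that depends on the engine configuration.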