Local Voice Assistant

Fully local voice assistant with GPU acceleration. No cloud services required.

[Microphone] → Whisper STT (GPU) → Ollama LLM → Piper TTS → [Speakers]
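
The data flow above can be sketched as three stages chained in order. This is a minimal illustration only, not the project's actual code: `run_turn` and the stub stages are hypothetical names, and real Whisper, Ollama, and Piper clients would replace the stubs.

```python
from typing import Callable

# Each stage is a plain callable so real clients can be swapped in later.
Transcribe = Callable[[bytes], str]   # audio -> text (Whisper)
Generate = Callable[[str], str]       # prompt -> reply (Ollama)
Synthesize = Callable[[str], bytes]   # reply -> audio (Piper)


def run_turn(audio: bytes, stt: Transcribe, llm: Generate, tts: Synthesize) -> bytes:
    """Run one microphone-to-speaker turn through the three stages in order."""
    text = stt(audio).strip()
    if not text:
        # Nothing recognised: skip the LLM and TTS stages entirely.
        return b""
    reply = llm(text)
    return tts(reply)


if __name__ == "__main__":
    # Stub stages for illustration only; they do no real speech processing.
    fake_stt = lambda audio: "what time is it"
    fake_llm = lambda prompt: f"You asked: {prompt}"
    fake_tts = lambda reply: reply.encode("utf-8")
    print(run_turn(b"\x00\x01", fake_stt, fake_llm, fake_tts))
```

Keeping each stage behind a simple callable means one stage can be tested or replaced without touching the others.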

  • Whisper (faster-whisper) — Speech-to-text on RTX 2070 Super
  • Ollama — Local LLM inference
  • Piper — Text-to-speech synthesis
  • Wyoming Protocol — Service communication
  • Tested hardware: Dell G7 7700, Intel i7-10750H, 32 GB RAM, Nvidia RTX 2070 Super (8 GB VRAM), Windows 11
  1. Double-click start-services.bat
  2. Wait for all three windows to show “Ready” (~10 seconds)
  3. Press and hold spacebar, speak your question
  4. Release spacebar when done
  5. Wait for AI response (plays automatically)

To stop: double-click stop-services.bat or close all three service windows.

| Service | Endpoint |
|---|---|
| Whisper STT | tcp://127.0.0.1:10300 |
| Piper TTS | tcp://127.0.0.1:10200 |
| Ollama | http://127.0.0.1:11434 |
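
A quick way to confirm the three services are up is to test whether each port accepts a TCP connection. The sketch below uses only the Python standard library; `host_port` and `is_listening` are hypothetical helper names, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket
from urllib.parse import urlsplit

# Endpoints as listed in the table above.
ENDPOINTS = {
    "Whisper STT": "tcp://127.0.0.1:10300",
    "Piper TTS": "tcp://127.0.0.1:10200",
    "Ollama": "http://127.0.0.1:11434",
}


def host_port(url: str) -> tuple[str, int]:
    """Split an endpoint URL into (host, port)."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"endpoint needs an explicit host and port: {url}")
    return parts.hostname, parts.port


def is_listening(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        status = "up" if is_listening(url) else "DOWN"
        print(f"{name:12} {url:28} {status}")
```
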
| Model | Category | Recommended Quantization | Notes |
|---|---|---|---|
| Llama 3 8B Instruct | General / Chat | Q4_K_M | Best default brain, excellent EN/ES |
| Mistral 7B Instruct v0.3 | Reasoning | Q4_K_M | Coherent, strong context retention |
| Gemma 7B Instruct | Conversational | Q4_K_M | Warm tone, good multilingual |
| Qwen 2 7B Instruct | Bilingual EN+ES | Q4_K_M | Best for Pepa-style personality |
| Yi 1.5 7B Chat | Creative / Narrative | Q4_K_M | Good for informal dialogue |

| Model | VRAM | Notes |
|---|---|---|
| Phi-3 Mini (3.8B) | ~4 GB | Fast + smart, ideal for quick assistants |
| Gemma 2B | ~3 GB | Lightweight CPU+GPU mix |
| TinyLlama 1.1B | < 2 GB | Experiments only |

  • Default: Llama 3 8B Instruct (Q4_K_M)
  • Bilingual EN↔ES: Qwen 2 7B Instruct
  • Tight VRAM: Phi-3 Mini (3.8B)
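
Once a model is pulled, it can be queried through Ollama's HTTP API at the endpoint listed earlier. The following is a minimal sketch using the standard library and the `/api/generate` route with streaming disabled. The model tag `llama3` is an assumption, so substitute whatever tag `ollama list` reports on your machine; `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
MODEL_TAG = "llama3"  # assumption: replace with a tag shown by `ollama list`


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = MODEL_TAG, timeout: float = 60.0) -> str:
    """Send the prompt and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires the Ollama service to be running locally.
    print(ask("Answer in one sentence: what is a voice assistant?"))
```

Switching models is then a one-line change to the tag, which makes it easy to compare the default, bilingual, and tight-VRAM options.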
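  • Services won’t start — Verify that D:\Pepa\venv exists; run ollama list to confirm Ollama is installed and a model is available; check that the GPU appears in Device Manager.
  • No audio — Run on bare metal in a local session (not over RDP); check the Windows audio input and output settings.
  • Slow responses — Check GPU usage in Task Manager; verify that Whisper is running on CUDA rather than the CPU.
  • “No speech detected” — Speak closer to the mic; hold the spacebar for the full duration of your question.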
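
Some of the checks above can be scripted. The sketch below is a rough preflight under stated assumptions: the venv path is taken from the first bullet, and looking for `nvidia-smi` on PATH is my stand-in for the Device Manager check (it indicates the Nvidia driver tools are installed, nothing more). `preflight` and `ollama_models` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path

VENV_DIR = Path(r"D:\Pepa\venv")  # path taken from the troubleshooting note


def preflight(venv_dir: Path = VENV_DIR) -> dict[str, bool]:
    """Return a pass/fail result for each basic prerequisite."""
    return {
        "venv directory exists": venv_dir.is_dir(),
        "ollama on PATH": shutil.which("ollama") is not None,
        # Assumption: nvidia-smi presence is used as a proxy for a working driver.
        "nvidia-smi on PATH": shutil.which("nvidia-smi") is not None,
    }


def ollama_models() -> str:
    """Return the output of `ollama list`, or an empty string if it cannot run."""
    try:
        result = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, timeout=15
        )
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'OK  ' if ok else 'FAIL'} {check}")
    print(ollama_models() or "ollama list produced no output")
```
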
| Component | Latency |
|---|---|
| Whisper (base, GPU) | ~0.5–1 s |
| Ollama (3B model) | ~1–2 s |
| Piper TTS | ~0.2–0.5 s |
| Total end-to-end | ~1.7–3.5 s (sum of the stages above) |
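
As a worked check of the latency budget, the per-stage ranges can be summed directly. This sketch assumes the three stages run strictly one after another with no overlap or streaming, and it uses integer milliseconds to avoid floating-point rounding; `total_budget` is a hypothetical helper name.

```python
# Per-stage latency ranges in milliseconds (low, high), from the table above.
STAGES_MS = {
    "Whisper (base, GPU)": (500, 1000),
    "Ollama (3B model)": (1000, 2000),
    "Piper TTS": (200, 500),
}


def total_budget(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the best-case and worst-case latency across sequential stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


if __name__ == "__main__":
    low, high = total_budget(STAGES_MS)
    print(f"End-to-end: {low / 1000:.1f} to {high / 1000:.1f} s")
```

The LLM stage accounts for the largest share of the range, so model size is the first thing to revisit if responses feel slow.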