The AI listens. The AI replies. By voice.
Hold a key, speak, release. The sentence goes straight from thought to the AI's session without touching the keyboard. The reply comes back out loud, in a neutral voice or in your own cloned voice: your choice. JarVoice turns any dialogue with a Claude instance into a spoken conversation, both ways.
No wake word, no permanent listening. Strict push-to-talk: the mic sleeps by default and captures only while the key is held. The rest of the time, absolute silence: no audio data ever leaves the machine without explicit intent.
Under the hood, for the curious: Whisper large-v3 on GPU for transcription (~1 s per sentence, hotwords tuned to the ecosystem's jargon), Edge Neural TTS for the synthetic voice (~300 ms latency) or F5-TTS for personal cloning. A Python daemon runs in the background, with a control socket to switch voices, a continuous-dictation mode, and injection into a named window. Transcription and voice cloning run locally, GPU + CPU.
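For the curious, a minimal sketch of how a daemon might parse control-socket commands such as tts_engine:f5 or inject_bg:<title>. Only those two command strings come from JarVoice. The name:arg framing, the Command type, the parse_command helper, and the idea that inject_bg travels over the same socket are assumptions made for illustration, not the actual implementation.

```python
from typing import NamedTuple


class Command(NamedTuple):
    """Hypothetical container for one control-socket command."""
    name: str
    arg: str


def parse_command(line: str) -> Command:
    """Split 'name:arg' on the first colon only.

    Splitting once keeps colons inside the argument intact, which matters
    for window titles such as 'Terminal: main'.
    """
    name, sep, arg = line.strip().partition(":")
    if not sep or not name:
        raise ValueError(f"expected 'name:arg', got {line!r}")
    return Command(name, arg)
```

With this sketch, parse_command("tts_engine:f5") yields the name tts_engine and the argument f5, while a title containing a colon stays whole in the argument.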
Dedicated key — hold to speak, release to send. No voice activation command.
CUDA float16 transcription, Aion hotwords, and a filter for the phrases Whisper is known to hallucinate on silence.
Microsoft Neural voice fr-FR-HenriNeural by default: ~300 ms time to first audio (TTFA), a natural voice with no local GPU required.
Jean's cloned voice via F5-TTS, opt-in through the control socket with tts_engine:f5.
Active window, named window (inject_bg:<title>), or direct pipe to a CC terminal.
Optional voice-activated mode, off by default: continuous listening gated by an RMS threshold, with automatic transcription and no key to hold.
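A rough sketch of what gating by RMS threshold can look like, assuming 16-bit little-endian mono PCM frames. The function names and the default threshold of 500 are hypothetical. A real threshold would be tuned to the microphone and the room, and JarVoice's own gating may differ.

```python
import math
import struct


def rms_int16(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian 16-bit mono PCM frame."""
    count = len(frame) // 2  # two bytes per sample
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat a frame as speech when its RMS reaches the (assumed) threshold."""
    return rms_int16(frame) >= threshold
```

A capture loop would forward frames to transcription only while is_speech returns True, usually with a short hang-over so that brief pauses between words do not cut the sentence.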