Internal

JARVOICE

The AI listens. The AI replies. By voice.

Voice as the interface

Hold a key, speak, release. Your sentence goes straight from thought into the AI's session without touching the keyboard. The reply comes back out loud, in a neutral voice or in your own cloned voice, as you prefer. JarVoice turns any dialogue with a Claude instance into a two-way spoken conversation.

No wake word, no permanent listening. Strict push-to-talk: the mic sleeps by default and captures only while the key is held. The rest of the time, absolute silence: no audio data ever leaves the machine without explicit intent.

Under the hood, for the curious: Whisper large-v3 on GPU for transcription (~1 s per sentence, hotwords tuned to the ecosystem's jargon), Edge Neural TTS for the synthetic voice (~300 ms latency) or F5-TTS for personal voice cloning. A Python daemon runs in the background, with a control socket to switch voices, a continuous-dictation mode, and injection into a named window. All local, GPU + CPU.

STT + TTS + injection

F13 Push-to-Talk

Dedicated key: hold to speak, release to send. No wake word or spoken activation command.

Whisper large-v3

CUDA float16 transcription, Aion hotwords, and a filter for the phrases Whisper is known to hallucinate on silence.
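
The hallucination filter can be pictured as a simple blocklist check on the transcript before it is injected. This is a minimal sketch, not JarVoice's actual code: the phrase list, `normalize`, and `is_silent_hallucination` are hypothetical names chosen for illustration, and the real list is not shown here.

```python
import re
import unicodedata

# Hypothetical blocklist: phrases Whisper is commonly reported to emit on
# silent or near-silent audio. The real JarVoice list is an unknown.
KNOWN_SILENT_HALLUCINATIONS = {
    "thank you for watching",
    "thanks for watching",
    "merci d'avoir regarde cette video",
    "sous-titres realises par la communaute d'amara.org",
}


def normalize(text: str) -> str:
    """Lowercase, strip accents, collapse whitespace, trim edge punctuation."""
    text = text.replace("\u2019", "'")  # typographic apostrophe -> ASCII
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    collapsed = re.sub(r"\s+", " ", no_accents.lower())
    return collapsed.strip(" .!?\"'")


def is_silent_hallucination(transcript: str) -> bool:
    """Return True when the transcript is empty or matches the blocklist."""
    cleaned = normalize(transcript)
    return cleaned == "" or cleaned in KNOWN_SILENT_HALLUCINATIONS
```

A transcript that passes `is_silent_hallucination` would simply be dropped instead of being sent to the session. Exact matching after normalization keeps real sentences that merely contain one of these phrases.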

Edge TTS

Microsoft neural voice fr-FR-HenriNeural by default: ~300 ms time to first audio (TTFA), natural voice with no local GPU needed.

F5-TTS — Jean's voice

Jean's cloned voice via F5-TTS, opt-in through the control-socket command tts_engine:f5.

Multi-target injection

Active window, named window (inject_bg:<title>), or direct pipe to a CC terminal.
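
The two control strings quoted on this page, `tts_engine:f5` and `inject_bg:<title>`, suggest a `name:argument` line format. The sketch below parses that shape under that assumption. `Command`, `DaemonState`, `parse_command`, `apply_command`, and the default engine name `"edge"` are all hypothetical; the daemon's real protocol may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    name: str
    argument: str


@dataclass
class DaemonState:
    tts_engine: str = "edge"              # assumed name for the default engine
    inject_target: Optional[str] = None   # None stands for the active window


def parse_command(line: str) -> Command:
    """Split a control line of the form 'name:argument' on the first colon."""
    name, sep, argument = line.strip().partition(":")
    name, argument = name.strip(), argument.strip()
    if not sep or not name or not argument:
        raise ValueError(f"expected 'name:argument', got {line!r}")
    return Command(name=name, argument=argument)


def apply_command(state: DaemonState, cmd: Command) -> None:
    """Update the daemon state for the two commands named in the text."""
    if cmd.name == "tts_engine":
        state.tts_engine = cmd.argument
    elif cmd.name == "inject_bg":
        state.inject_target = cmd.argument
    else:
        raise ValueError(f"unknown command {cmd.name!r}")
```

Splitting on the first colon only means a window title that itself contains a colon still arrives intact as the argument.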

Continuous VOX

Optional voice-activated mode: the mic stays open, speech is detected by an RMS threshold, and transcription starts automatically without holding a key.
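
An RMS gate of this kind can be sketched in a few lines. The version below assumes signed 16-bit audio frames. The threshold of 500, the three-frame hangover, and the names `rms` and `segment_speech` are illustrative assumptions, not JarVoice's actual settings.

```python
import math
from array import array
from typing import Iterable, Iterator, List


def rms(frame: array) -> float:
    """Root-mean-square amplitude of a frame of signed 16-bit samples."""
    if len(frame) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(
    frames: Iterable[array],
    threshold: float = 500.0,   # assumed value, would be tuned per microphone
    hangover: int = 3,          # quiet frames tolerated before closing a segment
) -> Iterator[List[array]]:
    """Group frames into utterances using an RMS threshold.

    A segment opens on the first frame at or above the threshold and closes
    once `hangover` consecutive quiet frames have been seen. Shorter pauses
    stay inside the segment.
    """
    current: List[array] = []
    quiet = 0
    for frame in frames:
        if rms(frame) >= threshold:
            current.append(frame)
            quiet = 0
        elif current:
            quiet += 1
            if quiet >= hangover:
                yield current
                current, quiet = [], 0
            else:
                current.append(frame)
    if current:
        yield current
```

Each yielded segment would then be handed to transcription as one utterance. The hangover keeps a brief pause between words from splitting a sentence in two.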

Interface

JarVoice voice mode terminal
Screenshot soon