MP3 to word-level transcripts — diarization, search, chat, export — one app, N projects.
AionScribe handles audio corpora over the long haul — interviews, narrative voices, family archives. Whisper large-v3 produces word-level transcripts with precise timestamps; pyannote 3.1 assigns each word to a speaker. An AgglomerativeClustering layer links speaker identities across chapters, even when audio files were recorded across multiple sessions.
The architecture follows the multi-project paradigm of creative suites — a single app (FastAPI port 8120, generic HTML frontend), many projects loaded on the fly via a ?project= slug. Each project lives in its Aion folder with its config.json, its audio files, and its output data. The backend only knows paths resolved at runtime — no hard coupling.
The interface exposes synchronized transcription-audio playback, full-text search, manual word and speaker-name correction, ZIP export (text + audio trimmed to the segment), and a chatbot grounded on the transcripts via Claude Sonnet with prompt caching.
CUDA float16 transcription — every word timestamped, aligned via wav2vec2 FR for maximum precision.
Automatic speaker identification (pyannote 3.1), linked across files via cosine clustering.
One backend, N projects — loaded on demand via ?project=slug. Zero restart between projects.
Claude Sonnet chatbot anchored on the project's transcripts — ephemeral prompt caching for each session.
ffmpeg trims audio to the selected segment + text transcription + Haiku summary in a single ZIP.
Unanswered question in the project? Direct escalation to a Hermes CC or Notion, reply via SSE.