Aionverse — AionScribe

Industrial-grade transcription, project architecture

AionScribe handles audio corpora over the long haul — interviews, narrative voices, family archives. Whisper large-v3 produces word-level transcripts with precise timestamps; pyannote 3.1 assigns each word to a speaker. An AgglomerativeClustering layer links speaker identities across chapters, even when audio files were recorded across multiple sessions.

The architecture follows the multi-project paradigm of creative suites — a single app (FastAPI port 8120, generic HTML frontend), many projects loaded on the fly via a ?project= slug. Each project lives in its Aion folder with its config.json, its audio files, and its output data. The backend only knows paths resolved at runtime — no hard coupling.

The interface exposes synchronized transcription-audio playback, full-text search, manual word and speaker-name correction, ZIP export (text + audio trimmed to the segment), and a chatbot grounded on the transcripts via Claude Sonnet with prompt caching.

Full pipeline

Whisper large-v3 ASR

CUDA float16 transcription — every word timestamped, aligned via wav2vec2 FR for maximum precision.

pyannote diarization

Automatic speaker identification (pyannote 3.1), linked across files via cosine clustering.

Multi-project lazy-load

One backend, N projects — loaded on demand via ?project=slug. Zero restart between projects.

Grounded chat

Claude Sonnet chatbot anchored on the project's transcripts — ephemeral prompt caching for each session.

ZIP export

ffmpeg trims audio to the selected segment + text transcription + Haiku summary in a single ZIP.

Hermes escalation

Unanswered question in the project? Direct escalation to a Hermes CC or Notion, reply via SSE.

AIONSCRIBE