Airakeet — Open-Source Local-First Dictation

Why it matters

Designed around the 8GB MacBook Air constraint.

Most dictation apps idle at 2–3GB. Airakeet uses aggressive unloads, streaming buffers, and CoreML tuning to stay invisible until you need it.

Zero-overhead idle

The 800MB Parakeet model is evicted after five minutes of inactivity, returning memory to the OS automatically.
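
As a rough illustration of the idle eviction described above, here is a minimal sketch. The `IdleEvictingCache` type and its method names are hypothetical, not Airakeet's actual code, and the model is generic so the example does not depend on CoreML. Time is passed in explicitly, which makes the five-minute rule easy to check.

```swift
import Foundation

/// Holds a lazily loaded model and drops it after a period of inactivity.
/// `Model` is generic so this sketch stays independent of CoreML.
final class IdleEvictingCache<Model> {
    private let idleLimit: TimeInterval
    private let load: () -> Model
    private var model: Model?
    private var lastUsed: Date?

    init(idleLimit: TimeInterval = 5 * 60, load: @escaping () -> Model) {
        self.idleLimit = idleLimit
        self.load = load
    }

    var isLoaded: Bool { model != nil }

    /// Returns the model, loading it on first use, and records the access time.
    func use(now: Date = Date()) -> Model {
        let current = model ?? load()
        model = current
        lastUsed = now
        return current
    }

    /// Releases the model if it has been idle for at least `idleLimit`.
    /// Returns true when an eviction happened.
    @discardableResult
    func evictIfIdle(now: Date = Date()) -> Bool {
        guard model != nil, let last = lastUsed,
              now.timeIntervalSince(last) >= idleLimit else { return false }
        model = nil
        return true
    }
}
```

In an app, something like a repeating `Timer` could call `evictIfIdle()` every few seconds. Dropping the last reference lets the weights be freed, and the next `use()` reloads them.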

ANE-first execution

Inference targets the Apple Neural Engine, keeping the CPU free and the fanless Air cool during long sessions.

Waveform overlay

A translucent HUD follows your cursor with a live waveform. It's fully color-customizable, echoing Superwhisper's UX.

Clipboard injection

Dictation drops straight into the active text field via the clipboard and a simulated Cmd+V: no extra permissions, no network.

Configurable hotkeys

Supports standard shortcuts, Fn combos, and a dedicated Shift+Fn gesture for quick starts even on compact keyboards.

Audio cache & debug

Replay exactly what the engine heard via a safety cache so you can validate inputs without uploading sensitive data.

Engineering story

Systems thinking for a pure native build.

From CoreML conversion to macOS UX polish, Airakeet is a full-stack native build that spotlights low-level craftsmanship.

🧠

Model conversion

Converted NVIDIA Parakeet TDT 0.6B to CoreML with quantization and ANE-friendly ops.

🛡

Security-first footprint

Menubar-only surface, no analytics, and scoped macOS permissions keep the threat model tiny.

🎛

Hotkey architecture

A custom event tap takes the place of broad global listeners, cutting CPU wakeups while keeping response instant.

🧊

Memory choreography

Extract-and-clear buffers plus timed auto-unload keep RAM usage flat during long recordings.
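
The extract-and-clear idea can be sketched in a few lines. This is an illustration under my own naming (`SampleBuffer` and `extractAndClear()` are hypothetical), not the app's actual buffer type: samples accumulate while you speak, then the whole array is handed to the caller and the buffer keeps nothing.

```swift
/// Accumulates audio samples and hands them off in a single step.
struct SampleBuffer {
    private var samples: [Float] = []

    var count: Int { samples.count }

    /// Adds one chunk of captured samples to the end of the buffer.
    mutating func append(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
    }

    /// Returns everything captured so far and leaves the buffer empty.
    /// After the reassignment, only the caller still references the
    /// old storage, so it is freed as soon as the caller is done with it.
    mutating func extractAndClear() -> [Float] {
        let extracted = samples
        samples = []
        return extracted
    }
}
```

Because the buffer resets on every extraction, memory held by the recorder stays bounded by the longest stretch between extractions rather than by total session length.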

Results

Feels like hardware, not software.

Transcribes five seconds of speech in 0.11s (roughly 45× real time), injects text instantly, and stays invisible until summoned. A natural companion for essays, code reviews, or meeting notes.

Swift + AppKit · CoreML · Metal Performance Shaders · AudioKit

Under the hood

Built on NVIDIA Parakeet, tuned for everyday workflows.

Parakeet is NVIDIA’s family of speech recognition models; Airakeet runs a CoreML conversion of it. Think of it as a musician trained on billions of sentences who performs directly on your Mac instead of on a cloud stage.

What Parakeet brings

Accent friendly. Trained on noisy, multi-accent corpora so code-switching and filler words aren’t dropped.
ANE-ready tensors. Converted into CoreML operators that map cleanly to the Apple Neural Engine.
Streaming aware. Supports chunk-by-chunk inference without waiting for full clips.

Future engines

The 1.1B Parakeet-EOU build will unlock live dictation with punctuation, multilingual translation, and smarter “keep listening” behavior without adding cloud latency.

Parakeet TDT 0.6B · Parakeet EOU 1.1B · NVIDIA Canary

Parakeet vs Local Whisper (both running on-device)

Attribute	Parakeet	Local Whisper
Latency target	Optimized for low-latency partials so you see text mid-sentence.	Batch-first decoding introduces a pause before the first characters appear.
Hardware sweet spot	Runs comfortably on the Apple Neural Engine with 8GB RAM.	Prefers discrete GPU or 16GB+ unified memory to stay smooth.
Streaming feel	Designed for incremental injection with ANE offload.	Often buffers a full sentence before emitting, so text arrives in bursts.

Forward-looking

Dual-engine roadmap for future Apple Silicon.

High-tier work resumes when I upgrade to a 32GB MacBook Air so I can validate the 1.1B model end-to-end.

Phase 1 · Engine abstraction

Refactor `ASREngine` to load either 0.6B or 1.1B models on demand and prevent RAM collisions.
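
One possible shape for this phase, as a sketch only: `EngineVariant`, `LoadedEngine`, and `EngineSlot` are hypothetical names I'm using for illustration, and the real `ASREngine` may look quite different. The point is that a single slot owns whichever model is resident and releases it before loading the other, so both sets of weights are never in RAM together.

```swift
/// Hypothetical model variants; the raw values mirror the sizes named above.
enum EngineVariant: String {
    case parakeet06B = "0.6B"
    case parakeet11B = "1.1B"
}

/// Minimal interface a loaded engine might expose (an assumption, not the app's API).
protocol LoadedEngine {
    var variant: EngineVariant { get }
    func transcribe(_ samples: [Float]) -> String
}

/// Keeps at most one engine resident at a time.
final class EngineSlot {
    private let loader: (EngineVariant) -> LoadedEngine
    private(set) var current: LoadedEngine?

    init(loader: @escaping (EngineVariant) -> LoadedEngine) {
        self.loader = loader
    }

    /// Returns the requested engine, releasing any other variant before loading.
    func engine(for variant: EngineVariant) -> LoadedEngine {
        if let loaded = current, loaded.variant == variant { return loaded }
        current = nil                  // drop the old weights first
        let fresh = loader(variant)    // then load the requested model
        current = fresh
        return fresh
    }
}
```

Repeated requests for the same variant reuse the resident engine, so switching cost is only paid when the variant actually changes.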

Phase 2 · Live injection

Streaming text, word-by-word insertion, and silence detection using Parakeet-EOU for instant feedback.

Phase 3 · Hardware validation

Benchmark M5 hardware, monitor thermals, and explore NVIDIA Canary for multilingual translation.

Future: Streaming engine

Queued for my next hardware upgrade.

Rolls out once I’m on a new 32GB MacBook Air so the 1.1B model fits comfortably. Nothing is required on your end.

With this update you’ll be able to choose between today’s ultra-efficient 0.6B engine and a higher-capacity Parakeet EOU 1.1B build. That bigger model will enable true word-by-word streaming and EOU (End of Utterance) timing so Airakeet feels like it’s reading your mind.

Words appear as you speak: Streaming injection means paragraphs grow in the active text field without waiting for the clip to finish.
EOU = natural pauses: The engine listens for the tiny silence after each thought, then stops recording and adds punctuation automatically so you never overshoot a sentence.
Instant re-entry: If you keep talking, the future engine jumps back into capture without reloading gigabytes of weights.
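
To make the "tiny silence after each thought" idea concrete, here is a deliberately simple stand-in. Parakeet EOU would predict the end of an utterance inside the model; this sketch swaps in a plain RMS-energy threshold instead. The `SilenceDetector` name, the threshold, and the frame count are all assumptions chosen for illustration.

```swift
/// Energy-based end-of-utterance detector (a stand-in for a model's EOU prediction).
struct SilenceDetector {
    let rmsThreshold: Float   // frames quieter than this count as silence
    let framesRequired: Int   // consecutive quiet frames that end an utterance
    private var quietRun = 0
    private var heardSpeech = false

    init(rmsThreshold: Float = 0.01, framesRequired: Int = 25) {
        self.rmsThreshold = rmsThreshold
        self.framesRequired = framesRequired
    }

    /// Root-mean-square level of one frame of samples.
    static func rms(_ frame: [Float]) -> Float {
        guard !frame.isEmpty else { return 0 }
        let sumOfSquares = frame.reduce(Float(0)) { $0 + $1 * $1 }
        return (sumOfSquares / Float(frame.count)).squareRoot()
    }

    /// Feeds one frame; returns true once speech has been followed by enough silence.
    mutating func process(_ frame: [Float]) -> Bool {
        if Self.rms(frame) >= rmsThreshold {
            heardSpeech = true
            quietRun = 0
            return false
        }
        guard heardSpeech else { return false }   // ignore leading silence
        quietRun += 1
        if quietRun >= framesRequired {
            heardSpeech = false                   // reset for the next utterance
            quietRun = 0
            return true
        }
        return false
    }
}
```

The detector resets itself after firing, so if you keep talking it simply starts tracking the next utterance, which is the same re-entry behavior the list above describes.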

The uncompromising transcription tool for base-model Apple Silicon.
