Product Roadmap

Last Updated: 2026-02-20

VaulType — Privacy-first, macOS-native speech-to-text with local LLM post-processing. This roadmap defines the phased development plan from MVP through stable release and beyond.

| Phase | Version | Status | Tasks |
| --- | --- | --- | --- |
| Phase 0 | Foundation | Complete | All tasks done |
| Phase 1 | v0.1.0 (MVP) | Complete | 35/35 tasks done |
| Phase 2 | v0.2.0 (LLM) | Complete | 33/33 tasks done |
| Phase 3 | v0.3.0 (Smart) | Complete | 29/29 tasks done |
| Phase 4 | v0.4.0 (Voice Commands) | Complete | 23/23 tasks done |
| Phase 5 | v0.5.0 (Power User & Polish) | Complete | 25/25 tasks done |
| Phase 6 | v1.0.0 (Stable Release) | In Progress | Several tasks remaining |

Completed:

  • Developer ID code signing configured
  • Notarization integrated (scripts/notarize.sh)
  • DMG packaging created (scripts/create-dmg.sh)
  • Sparkle 2.x auto-updates integrated
  • GitHub Actions: build workflow (build.yml), test workflow (test.yml), lint workflow (lint.yml)

Remaining:

  • GitHub Actions release workflow (sign + notarize + DMG + GitHub Release in CI)
  • Homebrew cask submission
  • Full regression testing (Phase 1-5 end-to-end)
  • Memory leak and stability testing
  • Privacy verification (zero outbound network during core operations)
  • Accessibility compliance audit (WCAG 2.1 AA)
  • Documentation update pass


VaulType follows a phased release strategy. Each phase builds incrementally on the previous one, delivering usable value at every milestone. All phases share one non-negotiable constraint: every feature operates 100% locally on the user’s device.

```
 v0.1.0 (MVP)           v0.2.0 (LLM)           v0.3.0 (Smart)
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ Menu bar app │       │ llama.cpp    │       │ App-aware    │
│ Global hotkey│       │ 6 processing │       │ Dictation    │
│ Audio capture│──────▶│ modes        │──────▶│ history      │
│ whisper.cpp  │       │ Prompt       │       │ Overlay      │
│ Text inject  │       │ templates    │       │ Vocabulary   │
│ Settings     │       │ Model DL     │       │ Multi-lang   │
└──────────────┘       └──────────────┘       └──────────────┘

 v1.0 (Stable)          v0.5.0 (Polish)        v0.4.0 (Voice)
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ Feature-     │       │ Shortcuts    │       │ System cmds  │
│ complete     │       │ Audio feed-  │       │ App mgmt     │
│ All phases   │◀──────│ back         │◀──────│ Window mgmt  │
│ delivered    │       │ Accessibility│       │ Sys controls │
│ Stable API   │       │ Performance  │       │ Workflow     │
│              │       │ Plugin sys   │       │ automation   │
└──────────────┘       └──────────────┘       └──────────────┘
```
| Phase | Version | Theme | Status | Key Deliverables |
| --- | --- | --- | --- | --- |
| Phase 1 | v0.1.0 | MVP — Core Dictation | Complete | Menu bar app, hotkey, whisper.cpp, text injection |
| Phase 2 | v0.2.0 | LLM Post-Processing | Complete | llama.cpp, 6 modes, prompt templates, model downloader |
| Phase 3 | v0.3.0 | Smart Features | Complete | App-aware context, history, overlay, vocabulary, multi-lang |
| Phase 4 | v0.4.0 | Voice Commands | Complete | System commands, app/window management, automation |
| Phase 5 | v0.5.0 | Power User & Polish | Complete | Voice-chained shortcuts, feedback, performance, plugins |
| Phase 6 | v1.0.0 | Stable Release | In Progress | Code signing, notarization, DMG, CI/CD, testing, docs |

Note: Versions beyond v1.0 are exploratory and subject to community input and technical feasibility assessment.


Goal: Deliver a functional, privacy-first dictation app that allows a user to speak into any macOS application with under 2 seconds of end-to-end latency. No network required. No LLM required.

A persistent, lightweight menu bar application built with SwiftUI MenuBarExtra.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Menu bar icon | Displays mic state (idle, recording, processing) via SF Symbols | P0 |
| Menu bar popover | Shows current status, last transcription preview, quick settings | P0 |
| Settings window | Settings scene with tabbed preference panes | P0 |
| Launch at login | SMAppService.mainApp.register() integration | P1 |
| Dock icon toggle | Option to hide/show dock icon (LSUIElement) | P2 |
```swift
// Target architecture
@main
struct VaulTypeApp: App {
    @StateObject private var appState = AppState()

    var body: some Scene {
        MenuBarExtra("VaulType", systemImage: appState.menuBarIcon) {
            MenuBarView()
                .environmentObject(appState)
        }
        .menuBarExtraStyle(.window)

        Settings {
            SettingsView()
                .environmentObject(appState)
        }
    }
}
```

System-wide keyboard shortcut that works regardless of which application is focused.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Default hotkey | Cmd+Shift+Space for push-to-talk (hold) and toggle (press) | P0 |
| Hotkey customization | User-configurable shortcut via Settings | P0 |
| Conflict detection | Warn when the chosen shortcut conflicts with another app | P1 |
| Multiple hotkeys | Register up to 4 hotkeys (for different modes in Phase 2) | P2 |

Note: The global hotkey system uses a CGEvent tap or the KeyboardShortcuts library. The chosen approach must work without the App Sandbox. See Tech Stack for rationale.
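For illustration, a minimal sketch of system-wide hotkey observation using `NSEvent` global monitoring (class and callback names are assumptions, not VaulType's actual API; note that a global monitor can only observe the keystroke — consuming it so the focused app never sees it requires a CGEvent tap):

```swift
import AppKit

/// Observes Cmd+Shift+Space system-wide. Requires the Accessibility
/// permission; cannot swallow the event (use a CGEvent tap for that).
final class HotkeyMonitor {
    private var monitor: Any?

    func start(onTrigger: @escaping () -> Void) {
        monitor = NSEvent.addGlobalMonitorForEvents(matching: .keyDown) { event in
            let wanted: NSEvent.ModifierFlags = [.command, .shift]
            if event.modifierFlags.intersection(.deviceIndependentFlagsMask) == wanted,
               event.keyCode == 49 { // 49 = Space
                onTrigger()
            }
        }
    }

    func stop() {
        if let monitor { NSEvent.removeMonitor(monitor) }
        monitor = nil
    }
}
```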

Real-time audio capture from the user’s microphone, converted to 16kHz mono Float32 PCM for whisper.cpp.

| Deliverable | Description | Priority |
| --- | --- | --- |
| AVAudioEngine tap | Capture audio from the default input device | P0 |
| Format conversion | Resample any input to 16kHz mono Float32 | P0 |
| Device selection | Allow the user to choose a specific microphone | P1 |
| Voice Activity Detection (VAD) | Basic energy-based VAD to trim silence | P1 |
| Circular buffer | Rolling 30-second buffer for streaming support | P1 |
```
Microphone --> AVAudioEngine --> Format Converter --> Circular Buffer --> whisper.cpp
 (any Hz)        (tap)          (16kHz mono F32)      (30s rolling)
```
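The capture-and-convert stage of this pipeline can be sketched as follows (an illustration under assumed names, not the shipping implementation): an `AVAudioEngine` input tap feeds an `AVAudioConverter` that resamples each buffer to the 16 kHz mono Float32 format whisper.cpp expects.

```swift
import AVFoundation

/// Sketch: tap the default input device and hand 16 kHz mono Float32
/// samples to a callback (which would append to the circular buffer).
final class AudioCapture {
    private let engine = AVAudioEngine()
    private let whisperFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                              sampleRate: 16_000,
                                              channels: 1,
                                              interleaved: false)!

    func start(onSamples: @escaping ([Float]) -> Void) throws {
        let input = engine.inputNode
        let inputFormat = input.outputFormat(forBus: 0)   // device-native format
        let converter = AVAudioConverter(from: inputFormat, to: whisperFormat)!

        input.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { buffer, _ in
            let ratio = 16_000 / inputFormat.sampleRate
            let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
            guard let out = AVAudioPCMBuffer(pcmFormat: self.whisperFormat,
                                             frameCapacity: capacity) else { return }
            var consumed = false
            converter.convert(to: out, error: nil) { _, status in
                // Offer the tapped buffer exactly once per conversion pass.
                status.pointee = consumed ? .noDataNow : .haveData
                consumed = true
                return buffer
            }
            if let data = out.floatChannelData {
                onSamples(Array(UnsafeBufferPointer(start: data[0],
                                                    count: Int(out.frameLength))))
            }
        }
        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}
```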

1.4 Local Speech Recognition (whisper.cpp)


On-device speech-to-text using whisper.cpp with Metal GPU acceleration.

| Deliverable | Description | Priority |
| --- | --- | --- |
| whisper.cpp integration | Compile and link whisper.cpp via CMake + bridging header | P0 |
| Metal GPU acceleration | Enable Metal backend for Apple Silicon and AMD GPUs | P0 |
| Multiple model sizes | Support tiny, base, small, medium, large-v3 | P0 |
| Model selection UI | Settings pane to choose the active whisper model | P0 |
| Streaming transcription | Process audio in chunks for near-real-time output | P1 |
| Language selection | Explicit language or auto-detect | P1 |

Supported model matrix:

| Model | Parameters | Disk Size | RAM (approx) | Speed (M1, 10s audio) | Quality |
| --- | --- | --- | --- | --- | --- |
| whisper-tiny | 39M | ~75 MB | ~200 MB | ~0.3s | Basic |
| whisper-base | 74M | ~142 MB | ~350 MB | ~0.5s | Good |
| whisper-small | 244M | ~466 MB | ~750 MB | ~1.0s | Very Good |
| whisper-medium | 769M | ~1.5 GB | ~2.0 GB | ~2.5s | Excellent |
| whisper-large-v3 | 1550M | ~3.1 GB | ~4.0 GB | ~5.0s | Best |

Note: The default model for MVP is whisper-base — a good balance of speed and accuracy that loads quickly and runs well even on 8 GB Macs.

Deliver transcribed text to the cursor position in any macOS application.

| Deliverable | Description | Priority |
| --- | --- | --- |
| CGEvent injection | Simulate keystrokes for short text (<50 chars) | P0 |
| Clipboard paste fallback | Cmd+V paste with clipboard preserve/restore for long text | P0 |
| Auto-detect strategy | Choose CGEvent vs. clipboard based on text length | P0 |
| Unicode support | Full Unicode including CJK, emoji, diacritics | P1 |
| Accessibility permission check | Guide user through granting Accessibility permission | P0 |

Important: Text injection requires the Accessibility permission. VaulType must detect when permission is missing and present clear instructions. See Security for the permission model.
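The auto-detect strategy from the table above reduces to a pure length check; a minimal sketch (type and function names are illustrative, and the 50-character threshold comes from the table):

```swift
/// Choose how to deliver text: simulated keystrokes for short text,
/// clipboard paste (with preserve/restore) for long text.
enum InjectionStrategy {
    case cgEventKeystrokes
    case clipboardPaste
}

func injectionStrategy(for text: String, keystrokeLimit: Int = 50) -> InjectionStrategy {
    text.count < keystrokeLimit ? .cgEventKeystrokes : .clipboardPaste
}
```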

Persistent user preferences stored via SwiftData and UserDefaults.

| Deliverable | Description | Priority |
| --- | --- | --- |
| General tab | Hotkey, launch at login, dock icon | P0 |
| Audio tab | Input device selection, VAD sensitivity | P0 |
| Models tab | Whisper model selection, download management | P0 |
| About tab | Version, open source attribution, links | P1 |
| Data storage | SwiftData for structured data, UserDefaults for lightweight state | P0 |

Note: See Database Schema for the complete data model specification.

The MVP is complete when a user can:

  1. Install VaulType by dragging to /Applications
  2. Grant Microphone and Accessibility permissions on first launch
  3. Press Cmd+Shift+Space, speak naturally, and release
  4. See the transcribed text appear at the cursor in any app
  5. Complete the full cycle (press, speak, release, text appears) in under 2 seconds for a short sentence

Goal: Add local LLM-powered text transformation. Raw whisper output becomes polished, formatted text — punctuated, structured, or completely rewritten according to user-defined modes and templates.

Direct in-process LLM inference via llama.cpp compiled into the app binary.

| Deliverable | Description | Priority |
| --- | --- | --- |
| llama.cpp integration | Compile and link llama.cpp via CMake + bridging header | P0 |
| Metal GPU acceleration | Full Metal offloading for Apple Silicon | P0 |
| GGUF model loading | Load and manage GGUF-format models | P0 |
| Context management | Configurable context length (512-4096 tokens) | P1 |
| Memory management | Graceful handling when model exceeds available RAM | P0 |
| LLMProvider protocol | Abstract interface for swappable backends | P0 |
```swift
/// Protocol abstracting LLM backends for extensibility
protocol LLMProvider: Sendable {
    func loadModel(at path: URL, parameters: LLMLoadParameters) async throws
    func complete(prompt: String, parameters: LLMInferenceParameters) async throws -> String
    var isModelLoaded: Bool { get }
    var estimatedMemoryUsage: UInt64 { get }
    func unloadModel() async
}
```

For users who already have Ollama installed and prefer using their existing models.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Ollama detection | Auto-detect running Ollama instance on localhost:11434 | P1 |
| OllamaProvider | LLMProvider implementation using Ollama HTTP API | P1 |
| Model listing | Fetch available models from the Ollama API | P1 |
| Settings toggle | Allow switching between llama.cpp and Ollama in preferences | P1 |

Note: Ollama is strictly optional. VaulType must function fully without it. Network calls are made only to localhost. See Security for network policy.

Built-in model management with download, verification, and storage.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Model registry | Pre-configured list of recommended GGUF models with metadata | P0 |
| Download manager | Background downloads with progress tracking via URLSession | P0 |
| SHA-256 verification | Verify integrity of downloaded model files | P0 |
| Storage management | Show disk usage per model, allow deletion | P0 |
| Custom model import | Import user-supplied GGUF files from Finder | P1 |

Recommended LLM models for post-processing:

| Model | Parameters | Quantization | Disk Size | RAM (approx) | Use Case |
| --- | --- | --- | --- | --- | --- |
| Qwen2.5-0.5B | 0.5B | Q4_K_M | ~350 MB | ~600 MB | Lightweight cleanup, 8 GB systems |
| Qwen2.5-1.5B | 1.5B | Q4_K_M | ~900 MB | ~1.2 GB | Balanced cleanup and formatting |
| Qwen2.5-3B | 3B | Q4_K_M | ~1.8 GB | ~2.5 GB | High-quality formatting, 16 GB systems |
| Llama-3.2-1B | 1B | Q4_K_M | ~700 MB | ~1.0 GB | Fast general-purpose processing |
| Llama-3.2-3B | 3B | Q4_K_M | ~1.9 GB | ~2.6 GB | High-quality general-purpose |
| Phi-3-mini-4k | 3.8B | Q4_K_M | ~2.2 GB | ~3.0 GB | Best instruction following |

Six distinct processing modes that define how whisper.cpp output is transformed by the LLM.

| Mode | Enum Value | LLM Required | Description |
| --- | --- | --- | --- |
| Raw | .raw | No | Unprocessed whisper output — exactly what was transcribed |
| Clean | .clean | Yes | Fix punctuation, capitalization, remove filler words ("um", "uh", "like") |
| Structure | .structure | Yes | Organize into paragraphs, bullet lists, or headings based on content |
| Prompt | .prompt | Yes | Apply a user-defined LLM prompt template (e.g., "rewrite as email") |
| Code | .code | Yes | Convert spoken programming instructions into valid source code |
| Custom | .custom | Yes | Fully user-defined pipeline with custom pre/post processors |
```
whisper.cpp output ──► Mode Router ──┬── Raw ───────► direct output
                                     ├── Clean ─────► LLM (cleanup prompt)
                                     ├── Structure ─► LLM (structure prompt)
                                     ├── Prompt ────► LLM (user template)
                                     ├── Code ──────► LLM (code prompt)
                                     └── Custom ────► LLM (custom pipeline)
```
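The mode table maps naturally onto a Swift enum whose cases match the "Enum Value" column; a minimal sketch (the `requiresLLM` property name is an assumption) showing how the router can let Raw mode bypass the LLM entirely:

```swift
/// Processing modes from the table above. Only Raw skips the LLM.
enum ProcessingMode: String, CaseIterable {
    case raw, clean, structure, prompt, code, custom

    /// Mirrors the "LLM Required" column.
    var requiresLLM: Bool {
        self != .raw
    }
}
```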

Each registered hotkey can trigger a different processing mode.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Hotkey-mode binding | Associate each of 4 hotkeys with a specific mode | P0 |
| Mode indicator | Show the active mode in the menu bar popover | P0 |
| Quick switch | Cycle through modes via a secondary shortcut | P1 |
| Default mode | Configurable default when no hotkey-specific mode is set | P0 |

Example configuration:

| Hotkey | Mode | Use Case |
| --- | --- | --- |
| Cmd+Shift+Space | Clean | General dictation — punctuated, capitalized |
| Cmd+Shift+C | Code | Dictating source code in an IDE |
| Cmd+Shift+E | Prompt (Email) | Drafting emails from spoken thoughts |
| Cmd+Shift+R | Raw | Verbatim capture for meeting notes |

Change the processing mode by speaking a trigger phrase at the start of dictation.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Prefix detection | Detect mode-switching phrases in the first 2 seconds of audio | P1 |
| Built-in prefixes | "Code mode", "Email mode", "Clean this up", "Raw mode" | P1 |
| Custom prefixes | User-defined trigger phrases mapped to modes | P2 |
| Prefix stripping | Remove the trigger phrase from the final output | P1 |

Example workflow:

```
User says:  "Code mode... function hello world that returns a string"
Detected:   prefix = "Code mode" --> switch to .code
Processed:  "Code mode" stripped, remaining text processed as code
Output:     func helloWorld() -> String {
```
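The detect-and-strip step in this workflow can be sketched as a prefix match on the transcription text (phrase table, mode names, and function name are illustrative, not the actual implementation, which matches on the first seconds of audio rather than the final text):

```swift
import Foundation

/// Illustrative trigger-phrase table mapping spoken prefixes to mode names.
let triggerPhrases: [(phrase: String, mode: String)] = [
    ("code mode", "code"),
    ("email mode", "email"),
    ("clean this up", "clean"),
    ("raw mode", "raw"),
]

/// If the transcription starts with a known trigger phrase, return the
/// mode and the remaining text with the phrase (and separators) stripped.
func detectModePrefix(in transcription: String) -> (mode: String, remainder: String)? {
    let lowered = transcription.lowercased()
    for (phrase, mode) in triggerPhrases where lowered.hasPrefix(phrase) {
        let remainder = String(transcription.dropFirst(phrase.count))
            .trimmingCharacters(in: CharacterSet(charactersIn: " .,"))
        return (mode, remainder)
    }
    return nil
}
```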

A templating engine for LLM prompts with variable substitution.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Template data model | PromptTemplate SwiftData model with system/user prompts | P0 |
| Variable substitution | {{transcription}}, {{language}}, {{tone}}, custom vars | P0 |
| Built-in templates | 4 shipped templates: Clean, Structured Notes, Code, Email | P0 |
| Template editor | Settings UI for creating and editing templates | P0 |
| Template import/export | Share templates as JSON files | P2 |

Built-in template variables:

| Variable | Type | Description |
| --- | --- | --- |
| {{transcription}} | Built-in | Raw whisper.cpp output text (always available) |
| {{language}} | Built-in | Detected or selected language code (e.g., "en", "tr") |
| {{app_name}} | Built-in | Name of the currently focused application |
| {{app_bundle_id}} | Built-in | Bundle identifier of the focused application |
| {{timestamp}} | Built-in | Current date/time in ISO 8601 format |
| {{tone}} | User-defined | Custom variable (e.g., "professional", "casual") |
| {{recipient}} | User-defined | Custom variable for email templates |
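At its core, substitution is a string replacement over `{{variable}}` placeholders; a minimal sketch (function name is an assumption; a real engine would also handle escaping and report unknown variables):

```swift
import Foundation

/// Replace each {{key}} placeholder with its value from the variables map.
func render(_ template: String, variables: [String: String]) -> String {
    variables.reduce(template) { result, pair in
        result.replacingOccurrences(of: "{{\(pair.key)}}", with: pair.value)
    }
}
```

For example, `render("Rewrite as a {{tone}} email: {{transcription}}", variables: ["tone": "professional", "transcription": "hi team"])` produces the fully substituted prompt.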

Note: See Database Schema for the PromptTemplate model definition and built-in template seed data.


Goal: Make VaulType context-aware and history-capable. The app adapts its behavior based on the active application, remembers past dictations, and provides a visual overlay for editing before injection.

Automatically select the optimal processing mode and vocabulary based on which application is focused.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Frontmost app detection | Monitor NSWorkspace for active app changes | P0 |
| App profiles | Per-app configuration (mode, language, injection method, vocabulary) | P0 |
| Auto-profile creation | Create a default profile the first time the user dictates into an app | P1 |
| Smart defaults | Sensible defaults (e.g., Code mode for Xcode, Clean for Mail) | P1 |
| Profile editor | Settings UI for managing per-app profiles | P0 |

Example auto-detection rules:

| Application | Bundle ID | Default Mode | Default Language | Injection Method |
| --- | --- | --- | --- | --- |
| Xcode | com.apple.dt.Xcode | Code | en | CGEvent |
| Mail | com.apple.mail | Clean | Auto-detect | CGEvent |
| Slack | com.tinyspeck.slackmacgap | Clean | Auto-detect | Clipboard |
| Terminal | com.apple.Terminal | Raw | en | CGEvent |
| VS Code | com.microsoft.VSCode | Code | en | Clipboard |
| Notes | com.apple.Notes | Structure | Auto-detect | CGEvent |
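Frontmost-app detection hangs off `NSWorkspace` notifications; a minimal sketch (class and callback names are assumptions) that surfaces the bundle identifier used to look up rows like those above:

```swift
import AppKit

/// Observe frontmost-application changes and report the bundle ID,
/// which the profile store would use to select a per-app profile.
final class AppContextMonitor {
    private var observer: NSObjectProtocol?

    func start(onChange: @escaping (String) -> Void) {
        observer = NSWorkspace.shared.notificationCenter.addObserver(
            forName: NSWorkspace.didActivateApplicationNotification,
            object: nil, queue: .main
        ) { note in
            if let app = note.userInfo?[NSWorkspace.applicationUserInfoKey]
                            as? NSRunningApplication,
               let bundleID = app.bundleIdentifier {
                onChange(bundleID)   // e.g. "com.apple.dt.Xcode"
            }
        }
    }

    func stop() {
        if let observer {
            NSWorkspace.shared.notificationCenter.removeObserver(observer)
        }
        observer = nil
    }
}
```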

Searchable, editable log of all past transcriptions.

| Deliverable | Description | Priority |
| --- | --- | --- |
| History storage | DictationEntry records in SwiftData | P0 |
| History window | Dedicated window with search, filter, and sort | P0 |
| Full-text search | Search across raw and processed text | P0 |
| Filter by app | Filter history by target application | P1 |
| Filter by date | Date range picker for history filtering | P1 |
| Edit and re-inject | Edit a past transcription and inject it at the current cursor | P1 |
| Favorites | Mark entries as favorites (excluded from auto-deletion) | P1 |
| Retention policies | Configurable auto-deletion by age and count | P0 |
| Export | Export history as JSON or plain text | P2 |

Note: Dictation history stores text only — never audio. Audio is processed in memory and discarded immediately. See Security for audio data lifecycle details.

A small, always-on-top overlay that shows real-time transcription and allows editing before injection.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Overlay window | Floating NSPanel with real-time transcription text | P1 |
| Edit before inject | User can modify transcribed text before it is injected | P1 |
| Dismiss and inject | Press Enter or click "Inject" to send text to cursor | P1 |
| Cancel | Press Escape to discard without injecting | P1 |
| Position control | Configurable overlay position (near cursor, corner, center) | P2 |
| Transparency | Adjustable opacity so the overlay does not obscure work | P2 |
```
┌─────────────────────────────────────────────┐
│ VaulType                            [x] [-] │
├─────────────────────────────────────────────┤
│                                             │
│ The quick brown fox jumps over the lazy     │
│ dog.                                        │
│ _                                           │
│                                             │
├─────────────────────────────────────────────┤
│ Mode: Clean | Lang: en | 0.8s               │
│                          [Cancel]  [Inject] │
└─────────────────────────────────────────────┘
```

User dictionary for correcting common whisper misrecognitions and domain-specific terms.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Global vocabulary | Replacement rules that apply in all applications | P0 |
| Per-app vocabulary | Replacements scoped to specific app profiles | P1 |
| Case sensitivity | Option for case-sensitive or case-insensitive matching | P1 |
| Vocabulary editor | Settings UI for managing spoken form / replacement pairs | P0 |
| Auto-correction | Apply vocabulary replacements before LLM processing | P0 |
| Import/export | Share vocabulary files as JSON | P2 |

Example entries:

| Spoken Form | Replacement | Scope | Notes |
| --- | --- | --- | --- |
| "ecks code" | Xcode | Global | Common whisper misrecognition |
| "jay son" | JSON | Global | Acronym normalization |
| "swift you eye" | SwiftUI | Global | Framework name |
| "build and run" | Cmd+R | Xcode only | App-specific shortcut |
| "hush type" | VaulType | Global | Product name |
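Applying such entries before LLM processing is a sequence of string replacements; a minimal sketch (types and names are illustrative, shown here with case-insensitive matching):

```swift
import Foundation

/// One vocabulary rule: a spoken form and its replacement text.
struct VocabularyEntry {
    let spokenForm: String
    let replacement: String
}

/// Apply every entry to the transcription, case-insensitively.
func applyVocabulary(_ text: String, entries: [VocabularyEntry]) -> String {
    entries.reduce(text) { result, entry in
        result.replacingOccurrences(of: entry.spokenForm,
                                    with: entry.replacement,
                                    options: [.caseInsensitive])
    }
}
```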

Support for multiple transcription languages with quick switching and auto-detection.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Language selection | Explicit language setting in preferences and per-app profiles | P0 |
| Auto-detect | whisper.cpp automatic language detection (first 30 seconds) | P0 |
| Quick switch | Keyboard shortcut or voice command to change language mid-session | P1 |
| Language indicator | Show detected/selected language in the menu bar and overlay | P1 |
| Per-app language | Override global language for specific applications | P1 |
| LLM language awareness | Pass detected language to LLM prompt templates | P1 |

Note: whisper.cpp supports 99 languages. Initial focus is on the top 10 languages by user demand. Language auto-detection adds ~0.5s of latency as whisper.cpp analyzes the first 30 seconds of audio.


Goal: Extend beyond dictation into voice-driven system control. Users can launch apps, manage windows, adjust system settings, and chain commands — all by speaking.

An interpreter that distinguishes between dictation and commands.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Command detection | Identify voice input as a command vs. dictation text | P0 |
| Command prefix | Configurable wake phrase (e.g., "Hey Type" or "Computer") | P0 |
| Command parser | Parse natural language into structured command actions | P0 |
| Confirmation mode | Optional confirmation before executing destructive commands | P1 |
| Command feedback | Audio or visual confirmation of executed commands | P0 |
| Error handling | Graceful failure with helpful error messages | P0 |
```
Voice Input ──► Command Detector ──┬── Dictation ──► Normal pipeline
                                   └── Command ────► Command Parser ──► Executor
                                                                           │
                                                                     ┌─────┴─────┐
                                                                     │ Validated │
                                                                     │ Command   │
                                                                     │ Object    │
                                                                     └─────┬─────┘
                                                              ┌────────────┼────────────┐
                                                              ▼            ▼            ▼
                                                          App Mgmt   Window Mgmt     System
```
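The first fork in this diagram — command vs. dictation — can be sketched as a wake-phrase prefix check (type and function names are illustrative; the real detector presumably works on richer signals than a string prefix):

```swift
import Foundation

/// Result of routing one utterance.
enum RoutedInput: Equatable {
    case dictation(String)
    case command(String)
}

/// Input starting with the wake phrase is routed to the command parser;
/// everything else flows through the normal dictation pipeline.
func route(_ input: String, wakePhrase: String = "hey type") -> RoutedInput {
    let lowered = input.lowercased().trimmingCharacters(in: .whitespaces)
    if lowered.hasPrefix(wakePhrase.lowercased()) {
        let body = String(lowered.dropFirst(wakePhrase.count))
            .trimmingCharacters(in: CharacterSet(charactersIn: " ,."))
        return .command(body)
    }
    return .dictation(input)
}
```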

Voice commands for launching, switching, and closing applications.

| Command Pattern | Action | Example |
| --- | --- | --- |
| "Open {app}" | Launch application by name | "Open Safari" |
| "Switch to {app}" | Bring application to foreground | "Switch to Xcode" |
| "Close {app}" | Close the frontmost window of an application | "Close Finder" |
| "Quit {app}" | Terminate an application | "Quit Preview" |
| "Hide {app}" | Hide an application | "Hide Messages" |
| "Show all windows" | Invoke Mission Control | "Show all windows" |

Note: App management commands use NSWorkspace and NSRunningApplication APIs. Destructive commands (quit, close) prompt the user for confirmation unless confirmation mode is disabled.

Voice commands for positioning and resizing windows.

| Command Pattern | Action | Example |
| --- | --- | --- |
| "Move window left" | Tile current window to the left half | "Move window left" |
| "Move window right" | Tile current window to the right half | "Move window right" |
| "Maximize window" | Fill the screen | "Maximize window" |
| "Minimize window" | Minimize to Dock | "Minimize window" |
| "Full screen" | Enter macOS full-screen mode | "Full screen" |
| "Center window" | Center the window on screen | "Center window" |
| "Next screen" | Move window to the next display | "Next screen" |

Voice commands for adjusting system settings.

| Command Pattern | Action | Example |
| --- | --- | --- |
| "Volume up/down" | Adjust system volume by 10% steps | "Volume up" |
| "Volume {number}" | Set volume to a specific level | "Volume fifty percent" |
| "Mute / Unmute" | Toggle system mute | "Mute" |
| "Brightness up/down" | Adjust display brightness | "Brightness down" |
| "Do not disturb on/off" | Toggle Focus mode | "Do not disturb on" |
| "Dark mode / Light mode" | Switch appearance | "Dark mode" |
| "Lock screen" | Lock the Mac | "Lock screen" |
| "Screenshot" | Capture screen | "Screenshot" |

Chain multiple commands and integrate with macOS automation frameworks.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Command chaining | Execute multiple commands in sequence via voice | P1 |
| Apple Shortcuts | Trigger Shortcuts app workflows by name | P1 |
| AppleScript execution | Run AppleScript snippets via voice (with safeguards) | P2 |
| Custom command definitions | Users define named commands mapped to action sequences | P2 |
| Command history | Log of recently executed commands for repeat | P2 |

Example command chain:

```
"Open Safari, go to GitHub, and switch to dark mode"
  ──► [1] Open Safari
  ──► [2] Wait for Safari to activate
  ──► [3] Navigate to github.com (inject URL + Enter)
  ──► [4] Switch system to dark mode
```

Important: AppleScript execution requires the Automation permission and presents significant security considerations. Users must explicitly grant per-app Automation permissions, and VaulType must sanitize all input to prevent injection attacks. See Security for details.


Phase 5 — Power User and Polish (v0.5.0)


Goal: Refine the experience for daily-driver use. Optimize performance, add audio feedback, ensure accessibility, and introduce a plugin system for community extensibility.

Dictate keyboard shortcuts by name or description.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Shortcut dictation | Say "Command+Shift+N" or "New folder" and inject the keystroke | P1 |
| App-aware shortcuts | Know that "Build and run" means Cmd+R in Xcode | P1 |
| Shortcut aliases | User-defined aliases (e.g., "Save all" = Cmd+Option+S) | P2 |
| Combo execution | Inject modifier key combinations via CGEvent | P1 |
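The parsing half of shortcut dictation can be sketched as splitting a phrase like "Command+Shift+N" into modifiers and a key (names and modifier list are illustrative; mapping the result to CGEvent flags and key codes is the separate injection step):

```swift
import Foundation

/// Parse a dictated shortcut ("Command+Shift+N") into lowercase
/// modifier names and the final key. Returns nil for empty input.
func parseShortcut(_ spoken: String) -> (modifiers: [String], key: String)? {
    let parts = spoken.split(separator: "+").map {
        $0.trimmingCharacters(in: .whitespaces).lowercased()
    }
    guard let key = parts.last, !key.isEmpty else { return nil }
    let known = ["command", "cmd", "shift", "option", "opt", "control", "ctrl"]
    let modifiers = parts.dropLast().filter { known.contains($0) }
    return (modifiers, key)
}
```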

Audible cues for state transitions and command execution.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Recording start/stop sounds | Distinct tones for begin and end of recording | P0 |
| Success/error sounds | Audio confirmation for commands | P1 |
| Sound pack system | Multiple sound themes (subtle, mechanical, none) | P2 |
| Volume control | Independent volume for feedback sounds | P1 |
| System sound integration | Use NSSound or AudioServicesPlaySystemSound | P0 |

Ensure VaulType is fully usable by people with disabilities.

| Deliverable | Description | Priority |
| --- | --- | --- |
| VoiceOver support | Full VoiceOver compatibility for all UI elements | P0 |
| Accessibility labels | Meaningful labels on all interactive elements | P0 |
| State announcements | Announce recording/processing state changes to assistive tech | P0 |
| High contrast support | Respect macOS "Increase Contrast" setting | P1 |
| Reduced motion | Respect macOS "Reduce Motion" preference | P1 |
| Dynamic Type | Scale text with system font size preferences | P1 |
| Keyboard navigation | Full keyboard navigation for all UI | P0 |

Note: Accessibility compliance goals are documented in detail in Legal Compliance.

Ensure VaulType is a responsible background citizen — low resource usage, battery awareness, and thermal management.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Model preloading | Keep active models in memory between transcriptions | P0 |
| Lazy model loading | Load models on first use, not at app launch | P0 |
| Battery-aware mode | Reduce model quality/threads when on battery power | P1 |
| Thermal management | Throttle inference when the system is thermally constrained | P1 |
| Memory pressure response | Unload LLM model under memory pressure, keep whisper loaded | P1 |
| Idle memory reduction | Release unused memory after configurable idle period | P2 |
| Startup optimization | Target <0.5s launch to menu bar readiness | P0 |
| Background CPU usage | Near-zero CPU when idle (no polling, event-driven only) | P0 |

Battery-aware strategy:

| Power State | Whisper Model | LLM Model | GPU Layers | Threads |
| --- | --- | --- | --- | --- |
| Plugged in | User's choice | User's choice | Maximum | Auto (all cores) |
| Battery > 50% | User's choice | User's choice | Maximum | Auto (P-cores only) |
| Battery 20-50% | Downgrade 1 tier | Downgrade 1 tier | Reduced by 50% | 4 threads max |
| Battery < 20% | Tiny or Base only | Disabled | Minimum | 2 threads max |
| Low Power Mode | Tiny only | Disabled | Minimum | 2 threads max |
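The table above reduces to a pure function from power state to an inference budget; a minimal sketch (type, tier names, and thresholds are illustrative stand-ins for the table rows):

```swift
/// Resource budget for one transcription pass.
struct InferenceBudget: Equatable {
    let whisperTier: String   // "user-choice", "downgrade-1", "tiny-or-base", "tiny"
    let llmEnabled: Bool
    let maxThreads: Int?      // nil = auto
}

/// Map power state onto the battery-aware strategy table.
func budget(isPluggedIn: Bool, batteryPercent: Int, lowPowerMode: Bool) -> InferenceBudget {
    if lowPowerMode {
        return InferenceBudget(whisperTier: "tiny", llmEnabled: false, maxThreads: 2)
    }
    if isPluggedIn || batteryPercent > 50 {
        return InferenceBudget(whisperTier: "user-choice", llmEnabled: true, maxThreads: nil)
    }
    if batteryPercent >= 20 {
        return InferenceBudget(whisperTier: "downgrade-1", llmEnabled: true, maxThreads: 4)
    }
    return InferenceBudget(whisperTier: "tiny-or-base", llmEnabled: false, maxThreads: 2)
}
```

On macOS, the Low Power Mode input would presumably come from `ProcessInfo.processInfo.isLowPowerModeEnabled`; battery level and charger state require IOKit power-source queries.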

A mechanism for the community to extend VaulType with custom processing, commands, and integrations.

| Deliverable | Description | Priority |
| --- | --- | --- |
| Plugin API | Swift protocol-based plugin interface | P2 |
| Plugin types | Processing plugins, command plugins, integration plugins | P2 |
| Plugin discovery | Load plugins from ~/Library/Application Support/VaulType/Plugins/ | P2 |
| Sandboxed execution | Plugins run in restricted context with limited system access | P2 |
| Plugin manager UI | Install, enable/disable, and remove plugins from Settings | P2 |
| Documentation | Plugin development guide with example plugins | P2 |
```swift
/// Plugin protocol for community extensions
protocol VaulTypePlugin: Sendable {
    /// Unique identifier for this plugin
    var identifier: String { get }

    /// Human-readable name
    var displayName: String { get }

    /// Plugin version
    var version: String { get }

    /// Called when the plugin is loaded
    func activate() async throws

    /// Called when the plugin is unloaded
    func deactivate() async
}

/// Processing plugin -- transforms text in the pipeline
protocol ProcessingPlugin: VaulTypePlugin {
    func process(text: String, context: ProcessingContext) async throws -> String
}

/// Command plugin -- adds new voice commands
protocol CommandPlugin: VaulTypePlugin {
    var supportedCommands: [CommandDefinition] { get }
    func execute(command: ParsedCommand) async throws -> CommandResult
}
```

Important: The plugin system is a Phase 5 deliverable and will be designed after the core product stabilizes. The API surface will be kept deliberately narrow to ensure stability and security. Plugins must not access the microphone, network, or filesystem outside their designated sandbox.
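As a usage illustration, a hypothetical processing plugin might look like the following (the protocol declarations are repeated with a stand-in `ProcessingContext` so the sketch stands alone; the plugin itself is invented for this example, not part of the planned API):

```swift
import Foundation

// Repeated from the plugin API sketch above so this example compiles standalone.
protocol VaulTypePlugin: Sendable {
    var identifier: String { get }
    var displayName: String { get }
    var version: String { get }
    func activate() async throws
    func deactivate() async
}

/// Stand-in for the host-provided context type (illustrative).
struct ProcessingContext: Sendable {
    let appBundleID: String?
}

protocol ProcessingPlugin: VaulTypePlugin {
    func process(text: String, context: ProcessingContext) async throws -> String
}

/// Hypothetical plugin: collapses runs of spaces in every transcription.
struct WhitespaceNormalizerPlugin: ProcessingPlugin {
    let identifier = "com.example.whitespace-normalizer"
    let displayName = "Whitespace Normalizer"
    let version = "1.0.0"

    func activate() async throws {}
    func deactivate() async {}

    func process(text: String, context: ProcessingContext) async throws -> String {
        text.split(separator: " ").joined(separator: " ")
    }
}
```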


Goal: All five phases delivered, tested, documented, and polished to production quality. v1.0 represents VaulType’s public commitment to API and feature stability.

| Category | Requirement | Status |
| --- | --- | --- |
| Features | All Phase 1-5 features shipped and functional | Done (Phase 5 complete) |
| Stability | Zero known crash bugs; <0.1% crash rate in beta testing | Pending (regression testing) |
| Performance | End-to-end latency <2s (whisper-base); <4s with LLM post-processing | Done (PowerManagementService, VAD, pipeline optimized) |
| Privacy | Zero network calls for core functionality | Pending (formal verification) |
| Security | Code signing (Developer ID), notarization | Done |
| Accessibility | WCAG 2.1 AA compliance; VoiceOver supported | Pending (audit) |
| Documentation | User guide, developer docs, plugin guide | In Progress |
| Distribution | Notarized DMG, Sparkle auto-updates | Done; Homebrew cask pending |
| CI/CD | Build, test, lint workflows | Done; release workflow pending |
| Testing | Unit tests for all services and commands | Done (Phase 4-5 tests complete) |

After v1.0, VaulType follows Semantic Versioning:

| Version Component | Meaning | Example |
| --- | --- | --- |
| Major (X.0.0) | Breaking changes to plugin API or data format | 2.0.0 |
| Minor (1.X.0) | New features, backward-compatible | 1.1.0, 1.2.0 |
| Patch (1.0.X) | Bug fixes, security patches | 1.0.1, 1.0.2 |

The following features are under consideration for post-1.0 releases. They are not committed to any timeline and will be prioritized based on community feedback, technical feasibility, and alignment with VaulType’s privacy-first principles.

Distinguish between multiple speakers in the audio stream.

| Aspect | Details |
| --- | --- |
| Use Case | Meeting transcription, interview recording, podcast notes |
| Technical Approach | Speaker embedding models (e.g., pyannote-style) running locally |
| Challenges | Significant additional compute; requires speaker enrollment or clustering |
| Privacy | Speaker embeddings are biometric data — must be handled with extra care |
| Dependency | Requires whisper.cpp or a companion library to support diarization |

Translate transcribed speech from one language to another before injection.

| Aspect | Details |
| --- | --- |
| Use Case | Bilingual workflows, cross-language communication |
| Technical Approach | Local translation model (e.g., NLLB, Opus-MT) via llama.cpp or dedicated engine |
| Challenges | Translation quality; additional model download; increased latency |
| Privacy | Must remain fully local — no cloud translation APIs |
| Dependency | Requires suitable GGUF-format translation models |

Edit already-injected text using voice commands.

| Aspect | Details |
| --- | --- |
| Use Case | "Select the last sentence", "Replace 'foo' with 'bar'", "Delete the previous word" |
| Technical Approach | Track injected text positions; use Accessibility API for cursor manipulation |
| Challenges | Requires reliable text position tracking across diverse applications |
| Privacy | May need to read text from the active app via Accessibility API |
| Dependency | Deep integration with the Accessibility API beyond current CGEvent usage |

Long-form transcription optimized for meetings, lectures, and interviews.

| Aspect | Details |
| --- | --- |
| Use Case | Continuous recording for 30-60+ minutes with structured output |
| Technical Approach | Streaming whisper inference with periodic LLM summarization |
| Challenges | Memory management for long audio; maintaining context over hours |
| Privacy | Long-running transcription increases the sensitivity of stored data |
| Dependency | Phase 3 (history) and potentially speaker diarization |

Connect VaulType to popular productivity tools.

| Integration | Description | Feasibility |
| --- | --- | --- |
| Raycast | VaulType as a Raycast extension for quick dictation | High — Raycast has a Swift extension API |
| Alfred | Alfred workflow for triggering VaulType modes | Medium — requires Alfred Powerpack |
| Obsidian | Direct dictation into Obsidian notes with metadata | Medium — via URI scheme or plugin |
| Notion | Dictation to Notion pages | Low — requires Notion API (cloud) |

Note: Integrations that require cloud APIs (e.g., Notion) conflict with VaulType’s privacy-first design. Such integrations would be offered as opt-in plugins with clear privacy disclosures.

A paired iPhone/iPad app for mobile dictation.

| Aspect | Details |
| --- | --- |
| Use Case | Dictate on iPhone, text appears on Mac; mobile note capture |
| Technical Approach | Local network pairing via Bonjour; whisper.cpp on iOS (CoreML backend) |
| Challenges | iOS performance constraints; cross-device sync without cloud |
| Privacy | Data transfer over local network only (no iCloud, no internet) |
| Dependency | Requires significant R&D; CoreML Whisper models for iOS |

Features for organizational deployment.

| Feature | Description |
| --- | --- |
| MDM configuration | Managed preferences via macOS MDM profiles |
| Centralized model distribution | Organization-hosted model repository (internal HTTP server) |
| Usage analytics | Optional, locally-aggregated usage statistics for IT teams |
| Approved model list | Restrict which models users can download |
| Group vocabulary | Shared vocabulary dictionaries distributed via configuration profiles |

A potential sustainability model for long-term development.

| Tier | License | Features |
| --- | --- | --- |
| Community | GPL-3.0 | All core features (Phases 1-5), all processing modes, plugin system |
| Professional | Commercial | Priority support, pre-built model bundles, enterprise MDM profiles |
| Enterprise | Commercial | Centralized management, custom model training pipeline, SLA |

Important: The open-core model is under consideration only. The core product — everything described in Phases 1-5 — will always remain open source under GPL-3.0. See Legal Compliance for license details.


This section provides the formal acceptance criteria for the v0.1.0 MVP release.

Each criterion must be verified before the MVP can be tagged and released.

| ID | Criterion | Verification Method |
| --- | --- | --- |
| AC-01 | User can install by dragging VaulType.app to /Applications | Manual test on clean macOS 14 installation |
| AC-02 | App appears in the menu bar on launch with correct icon | Visual inspection |
| AC-03 | App prompts for Microphone permission on first recording attempt | Manual test on clean install |
| AC-04 | App prompts for Accessibility permission on first text injection | Manual test on clean install |
| AC-05 | Default hotkey (Cmd+Shift+Space) starts/stops recording system-wide | Test in 5+ different apps (Safari, Terminal, Xcode, Notes, Slack) |
| AC-06 | Audio is captured from the default microphone at 16kHz mono | Unit test verifying sample rate and channel count |
| AC-07 | whisper.cpp transcribes audio with the base model | Integration test with known audio sample |
| AC-08 | Transcribed text is injected at the cursor position via CGEvent | Test in 5+ different apps |
| AC-09 | Long text (>50 chars) falls back to clipboard paste with restore | Automated test verifying clipboard preservation |
| AC-10 | End-to-end latency is under 2 seconds for a 5-word sentence | Timed test on Apple Silicon Mac (M1 or later) |
| AC-11 | Settings window opens from menu bar and persists preferences | Manual test: change setting, restart app, verify persistence |
| AC-12 | User can change the whisper model in Settings | Download a different model, verify it loads and transcribes |
| AC-13 | User can change the global hotkey in Settings | Reassign hotkey, verify it works system-wide |
| AC-14 | App does not crash during 1 hour of continuous use (idle + periodic dictation) | Stability test |
| AC-15 | Zero network requests during core operation (record, transcribe, inject) | Network monitor verification (see Security) |
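AC-15 (zero network requests) can also be spot-checked from a terminal during a record → transcribe → inject cycle. A minimal sketch, assuming `lsof` is available and that the process is named `VaulType` — both assumptions, not part of the formal verification method:

```shell
#!/bin/sh
# Count open internet sockets for a process; 0 means no network activity
# at the moment of sampling (run repeatedly during dictation to spot-check AC-15).

count_connections() {
  # -i: internet sockets only; -a: AND with -p; tail drops the header row
  lsof -i -a -p "$1" 2>/dev/null | tail -n +2 | wc -l | tr -d ' '
}

# "VaulType" is an assumed process name; adjust if the executable differs.
if pid="$(pgrep -x VaulType 2>/dev/null)"; then
  echo "open sockets: $(count_connections "$pid")"
else
  echo "VaulType is not running"
fi
```

A single sample is not proof, of course; the formal criterion still relies on continuous network-monitor verification as described in the Security document.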

In addition, the MVP must meet the following performance targets:

| Requirement | Target | Measurement |
| --- | --- | --- |
| Startup time | <0.5 s from launch to menu bar icon visible | Timed measurement |
| Idle CPU | <1% CPU when not recording | Activity Monitor observation over 10 minutes |
| Idle RAM | <100 MB with no model loaded; <500 MB with whisper-base loaded | Activity Monitor |
| Binary size | <20 MB (excluding ML models) | `du -sh VaulType.app` |
| Disk usage | <200 MB total with whisper-base model | Measured after fresh install + model download |
| Crash rate | <0.5% of sessions | Tracked during beta testing |
| Battery impact | No measurable battery drain when idle | Battery health comparison over 8 hours |
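The binary-size row can be checked with a small script rather than eyeballing `du -sh` output. A sketch that wraps the table's own measurement (`du` on the .app bundle, which excludes ML models) and compares against a budget passed in MB:

```shell
#!/bin/sh
# Check an app bundle's on-disk size against a megabyte budget.
# Models are not counted, matching the table's `du -sh VaulType.app` measurement.

app_size_mb() {
  # du -sk prints the size in kilobytes; convert to whole megabytes
  du -sk "$1" | awk '{ printf "%d\n", $1 / 1024 }'
}

check_size_budget() {
  size="$(app_size_mb "$1")"
  echo "size: ${size} MB (budget: $2 MB)"
  [ "$size" -le "$2" ]
}

# Example: check_size_budget /Applications/VaulType.app 20
```

A check like this could slot into the planned GitHub Actions release workflow so a size regression fails the build before a DMG is produced.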

The following features are explicitly deferred to later phases:

| Feature | Deferred To | Rationale |
| --- | --- | --- |
| LLM post-processing | Phase 2 | MVP must work without an LLM download |
| Multiple processing modes | Phase 2 | Depends on LLM integration |
| App-aware context | Phase 3 | Requires per-app profile infrastructure |
| Dictation history | Phase 3 | Requires SwiftData history model |
| Floating overlay | Phase 3 | MVP injects directly without preview |
| Voice commands | Phase 4 | Distinct feature set from dictation |
| Plugin system | Phase 5 | Requires stable core API |
| Multi-language UI | Post-v1.0 | English-only UI for MVP |

This section serves as a template for tracking feature requests from the community. Items are added here after discussion in GitHub Issues or Discussions and are prioritized based on demand, feasibility, and alignment with VaulType’s values.

| # | Feature Request | Source | Votes | Phase | Feasibility | Privacy Impact | Status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CF-001 | Example: Dictation to multiple apps simultaneously | GitHub Issue #42 | 12 | TBD | Medium | None | Under Review |
| CF-002 | | | | | | | |
| CF-003 | | | | | | | |

Requested features are evaluated against the following criteria before being added to a phase:

| Criterion | Weight | Description |
| --- | --- | --- |
| Privacy alignment | Critical | Must operate 100% locally; any cloud dependency is a non-starter |
| User demand | High | Number of unique requestors and upvotes on the issue |
| Technical feasibility | High | Can be implemented with reasonable effort using existing architecture |
| Maintenance burden | Medium | Long-term cost of maintaining the feature |
| Scope creep risk | Medium | Does this pull VaulType away from its core mission? |
| Platform constraints | Medium | Does macOS provide the necessary APIs? |
To submit a feature request:

  1. Check the existing issues to avoid duplicates
  2. Open a new issue using the “Feature Request” template
  3. Describe the use case, not just the solution
  4. Indicate whether you are willing to contribute implementation effort
  5. The maintainers will triage and assign a backlog ID (CF-XXX)

```
2026
Q1 ████████████ Phase 1 (MVP v0.1.0) -- Core dictation [DONE]
Q1 ████████████ Phase 2 (v0.2.0) -- LLM post-processing [DONE]
Q1 ████████████ Phase 3 (v0.3.0) -- Smart features [DONE]
Q1 ████████████ Phase 4 (v0.4.0) -- Voice commands [DONE]
Q1 ████████████ Phase 5 (v0.5.0) -- Power user & polish [DONE]
Q1 ████████░░░░ Phase 6 (v1.0.0) -- Stable release [IN PROGRESS]
```

| Milestone | Status | Notes |
| --- | --- | --- |
| v0.1.0 (MVP) | Done | 35/35 tasks complete |
| v0.2.0 (LLM) | Done | 33/33 tasks complete |
| v0.3.0 (Smart) | Done | 29/29 tasks complete |
| v0.4.0 (Voice) | Done | 23/23 tasks complete |
| v0.5.0 (Polish) | Done | 25/25 tasks complete |
| v1.0.0 (Stable) | In Progress | Testing, docs, release workflow remaining |

  • Technology Stack — Complete technology decisions, benchmarks, and integration architecture
  • Database Schema — SwiftData models, UserDefaults keys, migration strategy, and data lifecycle
  • Security Model — Privacy guarantees, threat model, permissions, and security architecture
  • Legal Compliance — GPL-3.0 license, third-party licenses, AI model licensing, and privacy policy

This document is part of the VaulType Documentation. For questions, corrections, or feature requests, please open an issue on the GitHub repository.

VaulType is free software licensed under the GNU General Public License v3.0.