VaulType Stability Testing Protocol
Overview
Section titled “Overview”Protocol for verifying VaulType can sustain continuous operation without crashes, resource leaks, or degraded performance. This document defines repeatable one-hour test scenarios tied to the actual service lifecycle patterns in the codebase. Each scenario targets a specific failure mode that arises in long-running menu bar apps, C-bridged inference engines, or real-time audio pipelines.
Execute the scenarios in the order listed. Record results in the Results Template at the end. The cumulative test session totals approximately 60–70 minutes of active runtime.
Environment Setup
Section titled “Environment Setup”Required Hardware
Section titled “Required Hardware”- Mac: Apple Silicon (M1 or newer) — primary target; run Intel Mac as secondary if available
- RAM: Minimum 8 GB; 16 GB recommended when testing with large LLM models (7B+)
- Microphone: Built-in microphone for baseline tests; USB audio interface for disconnect scenario
- Storage: At least 10 GB free for model files and swap space monitoring
Software
Section titled “Software”- macOS: 14.0+ (Sonoma or later)
- Xcode: 16.2
- Build configuration: Release (Archive build — Debug adds ARC instrumentation that skews CPU figures)
- Date tested: (fill in during testing)
- Tester: (fill in during testing)
- Build version: (fill in during testing)
Activity Monitor Setup
Section titled “Activity Monitor Setup”Open Activity Monitor before starting any scenario:
- Open Activity Monitor (Applications > Utilities > Activity Monitor).
- Select the Memory tab. Locate the
VaulTypeprocess row. - Select the CPU tab in a second Activity Monitor window (Window > CPU Usage) or pin the CPU column alongside Memory.
- Enable View > All Processes if VaulType does not appear.
- Pin both Real Memory and CPU % columns to visible positions.
- Set the update frequency: Activity Monitor > View > Update Frequency > Every 2 Seconds.
Terminal Monitoring Setup
Section titled “Terminal Monitoring Setup”Open a terminal and run the following top command pinned to the VaulType process for
continuous sampling throughout the session:
# Replace <PID> with the actual VaulType process ID from Activity Monitortop -pid $(pgrep VaulType) -stats pid,command,cpu,rsize,vsize,threads -d 2For periodic snapshots at key scenario boundaries, use:
# Snapshot CPU and memory at any momentps -o pid,pcpu,rss,vsz,threads -p $(pgrep VaulType)For virtual memory pressure:
# Virtual memory statistics (run before and after each scenario)vm_statFor GPU and swap:
# GPU memory pressure (run after model-loading scenarios)sudo powermetrics --samplers gpu_power -n 1Pre-Test Baseline
Section titled “Pre-Test Baseline”Capture the following baseline measurements before running any scenario, with VaulType launched and fully idle, no models loaded, settings window closed.
Steps:
- Build a Release configuration:
Terminal window xcodebuild -scheme VaulType -configuration Release build \ONLY_ACTIVE_ARCH=YES - Launch VaulType from the build output (not from Xcode — Xcode’s debugger inflates memory).
- Wait 60 seconds for all launch-time service initialisations to settle
(
AppDelegate.applicationDidFinishLaunchingcompletes,NotificationCenterobservers registered,PowerManagementServicesampling started). - Record the following from Activity Monitor:
| Metric | Baseline Value |
|---|---|
| Real Memory (RSS) | _______ MB |
| Virtual Memory (VSZ) | _______ MB |
| CPU % (idle, 10-second average) | _______ % |
| Thread count | _______ |
| Open files (via `lsof -p $(pgrep VaulType) | wc -l`) |
- Run
vm_statand record Pages free and Pages wired down as numeric baselines.
Baseline RSS: _______ MB Baseline thread count: _______
Test Scenarios
Section titled “Test Scenarios”Scenario 1: Idle Stability (30 minutes)
Section titled “Scenario 1: Idle Stability (30 minutes)”Purpose: Confirm the app produces no resource drift when completely inactive — no dictation, no settings window open, no model loaded.
Source files: VaulType/App/AppDelegate.swift, VaulType/Services/PowerManagementService.swift
Relevant behaviour:
AppDelegate — battery/thermal observers registered via NotificationCenterPowerManagementService — polls battery state and thermal pressure on a timerHotkeyManager — global event monitor running continuously (NSEvent.addGlobalMonitorForEvents)AppContextService — NSWorkspace.didActivateApplicationNotification observer activeSteps:
- Launch VaulType with no whisper model and no LLM model selected.
- Close all VaulType windows (settings, overlay).
- Start a terminal
topsession pinned to VaulType. - Record the start RSS from Activity Monitor.
- Do nothing for 30 minutes. Do not interact with the Mac in a way that switches the frontmost app more than once per minute (AppContextService fires on every app switch).
- At the 30-minute mark, record end RSS and CPU % average over the last 60 seconds.
Expected:
- CPU: Average < 0.5% over any 5-minute window during the idle period.
- RSS growth: Total growth from start RSS to end RSS < 5 MB over 30 minutes.
- Thread count: Stable — no new threads created after the initial stabilisation period.
- No crashes: VaulType process remains alive for the full 30 minutes.
- Open file descriptors: Count at end matches count at start (within ±5 for any transient system operations).
Monitor:
- Activity Monitor: CPU and Real Memory columns, sampled every 2 seconds.
topterminal output: watchrsizeandcpucolumns for trends.- Console.app: filter by
com.vaultype.appsubsystem — anyfaultorerrorlevel log during idle indicates unexpected background activity.
Pass criteria:
| Metric | Pass Threshold |
|---|---|
| Peak CPU during idle | < 1.0% instantaneous, < 0.5% average |
| RSS growth over 30 min | < 5 MB |
| Crashes | 0 |
| Error logs (fault/error level) | 0 unexpected entries |
Result: [ ] PASS [ ] FAIL
Start RSS: _______ MB End RSS: _______ MB Growth: _______ MB Average CPU: _______ % Peak CPU: _______ %
Notes: (record any anomalies here)
Scenario 2: Periodic Dictation (30 minutes)
Section titled “Scenario 2: Periodic Dictation (30 minutes)”Purpose: Confirm the full dictation pipeline can execute repeatedly over 30 minutes without cumulative memory growth or degraded transcription quality.
Source files: VaulType/Services/DictationController.swift,
VaulType/Services/Audio/AudioCaptureService.swift,
VaulType/Services/Speech/WhisperContext.swift
Relevant pipeline:
hotkey down → audioService.startCapture() // AVAudioEngine tap installedhotkey up → audioService.stopCapture() // tap removed, Float array returned → vad.trimSilence(from:sensitivity:) // in-memory, released after call → whisperService.transcribe(samples:) // WhisperContext.queue.async → VoicePrefixDetector.detect(in:) // stateless struct → VocabularyService.apply(to:...) // stateless struct → processingRouter.process(...) // LlamaContext.queue.async (if LLM mode) → injectionService.inject(...) // CGEvent / NSPasteboard (ephemeral) → saveDictationEntry(...) // SwiftData ModelContext (local scope) → HistoryCleanupService.runCleanup() // created fresh, released afterSetup:
- Load a whisper model (e.g.,
ggml-base.en.bin). - Set processing mode to Raw (no LLM) to isolate the audio/whisper cycle first.
- Open a plain text editor (TextEdit) as the injection target.
Steps:
- Record start RSS and start thread count.
- Every 2 minutes, dictate a 10-second phrase (e.g., “The quick brown fox jumps over the lazy dog. Testing one two three. VaulType stability test in progress.”).
- Confirm each dictation cycle:
- DictationState transitions:
idle → recording → transcribing → injecting → idle - Transcribed text appears in TextEdit within 5 seconds of hotkey release.
- DictationState transitions:
- After 15 cycles (30 minutes), record end RSS and end thread count.
Expected:
- Each cycle: Completes within 10 seconds of hotkey release (audio + whisper).
- No cycle failures: DictationState never stalls in
transcribingorprocessing. - RSS growth: Total growth across all 15 cycles < 10 MB.
- Thread count: Stable — no accumulation of orphaned
com.vaultype.whisper.contextdispatch queue threads. - Audio device:
AVAudioEnginereports no errors; nokAudioHardwareUnspecifiedErrorin logs.
Monitor:
- Activity Monitor: RSS sampled immediately after each cycle completes.
lsof -p $(pgrep VaulType) | wc -l: Run after every 5 cycles to confirm no file descriptor leak.- Console.app: Watch for any
errororfaultlevel entries oncom.vaultype.appsubsystem.
Pass criteria:
| Metric | Pass Threshold |
|---|---|
| RSS growth over 15 cycles | < 10 MB total |
| Cycle completion rate | 15/15 (100%) |
| Max single-cycle duration | < 10 s (audio) + whisper inference time |
| Crashed cycles | 0 |
| Thread count growth | 0 net threads added over 15 cycles |
Result: [ ] PASS [ ] FAIL
Start RSS: _______ MB End RSS: _______ MB Growth: _______ MB Cycles completed: _______ / 15 Failed cycles: _______
Notes: (record any anomalies here)
Scenario 3: Rapid Start/Stop (10 minutes)
Section titled “Scenario 3: Rapid Start/Stop (10 minutes)”Purpose: Confirm that rapidly toggling audio capture does not leak AVAudioEngine taps, audio buffers, or file descriptors to the audio subsystem.
Source file: VaulType/Services/Audio/AudioCaptureService.swift
Relevant pattern:
// startCapture — installs tap and starts AVAudioEngineinputNode.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { [weak self] buffer, time in self?.handleAudioBuffer(buffer)}try engine.start()
// stopCapture — removes tap before stopping engineengine.stop()engine.inputNode.removeTap(onBus: 0)Risk: If engine.start() throws after installTap, the tap is installed but the engine is
not running. Confirm removeTap(onBus:) is called in the error path. With rapid cycling, each
startCapture creates a new tap; without the corresponding removeTap, multiple taps can
accumulate on the same bus, causing kAudioUnitErr_TooManyFramesToProcess or a crash in
AVAudioInputNode.
Steps:
- Record start RSS, start thread count, and
lsofcount. - Press and immediately release the dictation hotkey (hold for < 0.5 seconds) to trigger audio start/stop without accumulating meaningful audio data.
- Wait 5 seconds between each start/stop pair.
- Repeat for 10 minutes (approximately 60–100 rapid cycles).
- Record end RSS, end thread count, and
lsofcount.
Expected:
- No crash: VaulType survives all rapid cycles.
- Audio device: No
AVAudioSessionErrorCodeorkAudioHardwareErrorlogs. - RSS growth: < 5 MB over the full scenario.
- File descriptors:
lsofcount returns to start value after each stop (within ±2). - Tap count: Only one tap active at any given time (confirmed by absence of
kAudioUnitErr_TooManyFramesToProcessin system log).
Monitor:
- Console.app: Filter by process
VaulType— watch for anyAVAudio*error messages. lsof -p $(pgrep VaulType) | grep -i audio | wc -l: Run every 2 minutes to monitor audio-related file descriptor count.- Activity Monitor: Thread count — should not grow with each cycle.
Pass criteria:
| Metric | Pass Threshold |
|---|---|
| Crashes | 0 |
| RSS growth | < 5 MB |
| Audio errors in Console | 0 |
| File descriptor growth (audio) | 0 net growth |
| Thread count growth | 0 net threads |
Result: [ ] PASS [ ] FAIL
Start RSS: _______ MB End RSS: _______ MB Start FD count: _______ End FD count: _______
Notes: (record any anomalies here)
Scenario 4: Long Dictation — 5-Minute Continuous Recording
Section titled “Scenario 4: Long Dictation — 5-Minute Continuous Recording”Purpose: Verify the audio buffer accumulation, whisper inference on a large sample, and the minimum-sample padding logic all work correctly for a sustained recording.
Source files: VaulType/Services/Audio/AudioCaptureService.swift,
VaulType/Services/Speech/WhisperContext.swift,
VaulType/Services/DictationController.swift
Relevant constraint:
Audio padded to minimum 16000 samples (1 second) before whisper.Whisper requires >= 1s of audio input — shorter recordings are padded with silence.A 5-minute recording at 16kHz produces 4,800,000 Float32 samples = ~18.3 MB in memory.The AudioBuffer (os_unfair_lock-protected) holds this in a [Float] array throughout recording.Steps:
- Load a whisper model.
- Record start RSS.
- Press and hold the dictation hotkey. Begin speaking continuously — read aloud any text (e.g., a news article, book passage) for exactly 5 minutes.
- Release the hotkey at the 5-minute mark.
- Observe the DictationState:
transcribingbegins. Wait for it to complete and return toidle. - Record the peak RSS (observed during transcription) and end RSS (after return to idle).
- Verify the transcribed text appears in the injection target — it should reflect approximately 5 minutes of spoken content (may be truncated at whisper’s context limit).
Expected:
- No crash or timeout:
WhisperContext.transcribecompletes within 3× the audio duration (15 minutes maximum for a 5-minute recording on Apple Silicon). - Peak RSS increase: RSS rises by approximately 18–25 MB during transcription (audio buffer
- whisper internal state), then falls back within 10 MB of start RSS after injection.
- DictationState: Transitions cleanly
recording → transcribing → injecting → idlewith no stall intranscribingfor more than 15 minutes. - Audio buffer release: RSS returns toward baseline after
stopCapture()deallocates the internal[Float]array.
Monitor:
- Activity Monitor: Watch RSS spike during transcription, then confirm descent on completion.
top -pid $(pgrep VaulType): Log the peakrsizevalue during whisper inference.- Console.app: Any
whispercategory logs reporting segment count or transcription duration.
Pass criteria:
| Metric | Pass Threshold |
|---|---|
| Transcription completes | Yes (within 15 min) |
| DictationState stall | None |
| Post-transcription RSS vs start | Within 10 MB |
| Crash | 0 |
Result: [ ] PASS [ ] FAIL
Start RSS: _______ MB Peak RSS: _______ MB End RSS: _______ MB Transcription wall time: _______ seconds
Notes: (record any anomalies here)
Scenario 5: Mode Switching During Active Session (10 minutes)
Section titled “Scenario 5: Mode Switching During Active Session (10 minutes)”Purpose: Verify that switching processing modes between dictation cycles does not cause
pipeline configuration errors, stale LLM state, or memory accumulation from abandoned
ProcessingModeRouter or PromptTemplateEngine instances.
Source files: VaulType/Services/LLM/LLMService.swift,
VaulType/Services/DictationController.swift (processingRouter)
Relevant modes (from DictationController.processingRouter):
Raw — no LLM, direct whisper outputClean — LLMService with grammar cleanup promptStructure — LLMService with formatting promptCode — LLMService with code-specific promptPrompt — LLMService with user-selected PromptTemplateCustom — user-defined processing chainSteps:
- Load both a whisper model and an LLM model.
- Record start RSS.
- Execute the following sequence, dictating a 5-second phrase between each mode switch:
- Set mode to Raw → dictate → confirm raw output appears.
- Set mode to Clean → dictate → confirm LLM-cleaned output appears.
- Set mode to Code → dictate → confirm code-formatted output appears.
- Set mode to Structure → dictate → confirm structured output appears.
- Set mode to Raw → dictate → confirm LLM is bypassed (fast response).
- Set mode to Prompt → select any saved template → dictate → confirm template output.
- Set mode to Clean → dictate → confirm LLM re-engages correctly.
- Repeat the full 7-step sequence twice (14 dictations total).
- Record end RSS and confirm LLM is still responsive.
Expected:
- Mode transitions: Each mode switch takes effect on the very next dictation cycle. No stale mode state carries over.
- LLM re-engagement: After switching from Raw back to Clean, the
LlamaContextresponds within normal inference time (no cold-load delay — model should remain in memory). - RSS growth: < 10 MB across 14 cycles including LLM processing.
- No errors:
processingRouter.process()does not throw on any cycle.
Monitor:
- Activity Monitor: RSS during LLM inference cycles vs Raw cycles — confirm LLM is not accumulating KV-cache across mode switches.
- Console.app: Filter by
com.vaultype.appllmcategory for anyGenerationResulterrors.
Pass criteria:
| Metric | Pass Threshold |
|---|---|
| Mode switches that take effect immediately | 14/14 |
| LLM processing errors | 0 |
| RSS growth across 14 cycles | < 10 MB |
| Crash | 0 |
Result: [ ] PASS [ ] FAIL
Start RSS: _______ MB End RSS: _______ MB Growth: _______ MB Mode switch failures: _______
Notes: (record any anomalies here)
Scenario 6: Model Hot-Swap (5 minutes)
Section titled “Scenario 6: Model Hot-Swap (5 minutes)”Purpose: Verify the whisper pipeline reconfigures cleanly when a different model is selected
mid-session, with no residual C-heap allocations from the previous whisper_context.
Source files: VaulType/Services/Speech/WhisperContext.swift,
VaulType/Services/DictationController.swift (loadWhisperModel, unloadWhisperModel)
Relevant pattern:
// WhisperContext.deinit — called when old context is replacedqueue.sync { if let ctx = context { whisper_free(ctx) // C heap freed on dedicated queue context = nil }}
// NotificationCenter hot-reload signalNotificationCenter.default.post(name: .whisperModelDownloaded, object: nil)// AppDelegate handles this and calls dictationController.loadWhisperModel(url:)Steps:
- Load whisper model A (e.g.,
ggml-base.en.bin, ~150 MB). - Dictate a phrase and confirm model A is active (check transcription latency — base is faster than small).
- Record RSS with model A loaded.
- Navigate to Settings > Models. Load whisper model B (e.g.,
ggml-small.en.bin, ~500 MB). - Wait for the
handleModelDownloadednotification to fire and the new context to initialise. - Dictate a phrase and confirm model B is active.
- Record RSS with model B loaded.
- Switch back to model A. Record RSS after switch back.
- Repeat the A → B → A switch 3 more times (5 total switches).
- After the final switch, record final RSS.
Expected:
- RSS after each unload: Drops by approximately the size of the unloaded model’s memory footprint (model A ≈ 150–200 MB, model B ≈ 500–600 MB).
- No residual whisper_context: After switching away from model A, no model-A-sized
anonymous VM regions persist (confirm with
vmmap -resident $(pgrep VaulType) | grep -i ggml). - Pipeline reconfiguration: Each dictation after a switch uses the correct model with
no
contextNotInitializederrors. - RSS stability: After 5 switches, RSS with model A loaded should be within 20 MB of the first model A reading.
Monitor:
# After each model switch, sample GGML anonymous regionsvmmap -resident $(pgrep VaulType) | grep -E "MALLOC|anonymous" | sort -k3 -rn | head -20Pass criteria:
| Metric | Pass Threshold |
|---|---|
| Successful model switches | 5/5 |
| whisper_context errors | 0 |
| RSS drift after 5 switches (vs first load) | < 20 MB |
| Crash | 0 |
Result: [ ] PASS [ ] FAIL
RSS (model A first load): _______ MB RSS (model B): _______ MB RSS (after 5 switches, model A): _______ MB
Notes: (record any anomalies here)
Scenario 7: Settings Changes Under Load (5 minutes)
Section titled “Scenario 7: Settings Changes Under Load (5 minutes)”Purpose: Verify that modifying settings while a dictation cycle is in progress does not corrupt shared state, cause a crash, or produce incorrect output.
Source files: VaulType/App/AppDelegate.swift (settings observers),
VaulType/Services/DictationController.swift (configuration properties),
VaulType/Views/Settings/ (all tab views)
Risk: AppDelegate monitors UserSettings changes and reconfigures the pipeline
(e.g., changing the hotkey, thread count, or processing mode). If a settings write races
with an in-flight dictation, shared mutable state on DictationController may produce
undefined behaviour.
Steps:
- Load a whisper model. Open the Settings window to the Audio tab.
- Begin a dictation (press and hold the hotkey). While still recording:
- Adjust the VAD sensitivity slider.
- Change the whisper thread count spinner.
- Release the hotkey and allow the transcription to complete. Confirm output appears.
- While transcription is in progress (
transcribingstate):- Navigate to Settings > Processing tab.
- Change the active processing mode.
- Confirm the dictation completes (using the mode that was active at hotkey-release, not the newly selected mode — mode changes should take effect on the next cycle).
- Open Settings > General tab. Change the global hotkey while idle.
- Confirm the new hotkey triggers dictation correctly.
- Open Settings > Models tab. Unload the LLM model while in Clean mode.
- Trigger a dictation — the pipeline should fall back to Raw mode gracefully (or display an appropriate error, not crash).
Expected:
- No crash during any settings change — neither during recording nor transcription.
- In-flight cycle completes: Changing settings mid-cycle does not abort or corrupt the current dictation.
- Settings take effect: Next cycle after each settings change uses the new configuration.
- Hotkey re-registration: New hotkey takes effect immediately with no duplicate triggers.
- LLM unload fallback: Pipeline degrades gracefully when LLM is removed while a mode requiring LLM is selected.
Pass criteria:
| Metric | Pass Threshold |
|---|---|
| Crash from concurrent settings write | 0 |
| In-flight cycles corrupted | 0 |
| Hotkey re-registration failures | 0 |
| LLM unload crash | 0 |
Result: [ ] PASS [ ] FAIL
Notes: (record any anomalies here)
Scenario 8: Recovery from Audio Device Disconnect (5 minutes)
Section titled “Scenario 8: Recovery from Audio Device Disconnect (5 minutes)”Purpose: Verify that unplugging and replugging a microphone during recording does not
crash the app, leave the AVAudioEngine in a broken state, or permanently disable dictation.
Source file: VaulType/Services/Audio/AudioCaptureService.swift
Relevant risk:
// AVAudioEngine will post AVAudioEngineConfigurationChange when hardware topology changes.// If the tap-installed inputNode becomes invalid, subsequent startCapture() attempts// on the old engine configuration will throw kAudioHardwareNotRunningError.// AudioCaptureService must handle AVAudioEngineConfigurationChange to re-configure.Setup: Connect a USB microphone or headset as the primary audio input device in System Settings > Sound > Input.
Steps:
- Load a whisper model. Confirm dictation works with the USB device as input.
- Record start RSS and start thread count.
- Begin a dictation (press and hold the hotkey). While recording:
- Physically unplug the USB microphone.
- Observe the app behaviour:
- Does VaulType crash?
- Does it display an error, or silently continue recording on the fallback (built-in) mic?
- Does
DictationStateresolve toidleorerror?
- Replug the USB microphone.
- Attempt a new dictation using the rebuilt audio engine. Confirm it completes successfully.
- Repeat the disconnect/reconnect cycle 3 times to confirm consistent recovery behaviour.
Expected:
- No crash when the audio device is removed during recording.
- Graceful error:
DictationStatetransitions toerror("Audio device disconnected")or equivalent; the overlay/menu bar reflects the error state. - Automatic or manual recovery: After replug, the next dictation attempt succeeds (either
via automatic
AVAudioEngineConfigurationChangehandling, or by the user pressing the hotkey). - No orphaned taps: After recovery, only one tap is installed on the new input node (no duplicate tap errors on the second recording attempt).
- RSS stable: Disconnect/reconnect does not leak audio subsystem resources.
Monitor:
- Console.app: Filter by
AVAudioforAVAudioEngineConfigurationChangeand engine error logs. lsof -p $(pgrep VaulType) | grep -i audio: Confirm audio file descriptor count is stable after each reconnect.
Pass criteria:
| Metric | Pass Threshold |
|---|---|
| Crash on disconnect | 0 |
| Crash on reconnect | 0 |
| Successful dictation after reconnect | 3/3 cycles |
| Orphaned audio taps | 0 |
| RSS growth from disconnect cycles | < 5 MB |
Result: [ ] PASS [ ] FAIL
Disconnect 1 behaviour: (idle/error/crash) _______ Disconnect 2 behaviour: _______ Disconnect 3 behaviour: _______ RSS growth: _______ MB
Notes: (record any anomalies here)
Monitoring Commands Reference
Section titled “Monitoring Commands Reference”The following commands are referenced across multiple scenarios. Collect all output in a single log file per test session:
# --- PID lookup ---VAULTYPE_PID=$(pgrep VaulType)echo "VaulType PID: $VAULTYPE_PID"
# --- Continuous CPU and memory via top ---top -pid $VAULTYPE_PID -stats pid,command,cpu,rsize,vsize,threads -d 2
# --- Point-in-time snapshot ---ps -o pid,pcpu,rss,vsz,threads -p $VAULTYPE_PID
# --- Open file descriptor count ---lsof -p $VAULTYPE_PID | wc -l
# --- Audio-related file descriptors only ---lsof -p $VAULTYPE_PID | grep -iE "audio|sound|core audio" | wc -l
# --- Virtual memory pressure (run before and after each scenario) ---vm_stat
# --- GGML memory regions (run after model loads and after model switches) ---vmmap -resident $VAULTYPE_PID | grep -E "MALLOC|anonymous" | sort -k3 -rn | head -30
# --- Full virtual map snapshot (save to file for comparison) ---vmmap $VAULTYPE_PID > /tmp/vaultype_vmmap_$(date +%H%M%S).txt
# --- Diff two vmmap snapshots to detect new anonymous regions ---diff /tmp/vaultype_vmmap_before.txt /tmp/vaultype_vmmap_after.txt
# --- Thread count over time ---while true; do ps -M -p $VAULTYPE_PID | tail -n +2 | wc -l sleep 5done
# --- GPU power during model inference (requires sudo) ---sudo powermetrics --samplers gpu_power -n 3 -i 1000
# --- Swap usage ---sysctl vm.swapusage
# --- Thermal state ---pmset -g thermlog | tail -5Pass Criteria Summary
Section titled “Pass Criteria Summary”A full session is considered passing when ALL of the following thresholds are met across all eight scenarios.
| Criterion | Pass Threshold |
|---|---|
| App crashes (any scenario) | 0 |
| DictationState stalls (any scenario) | 0 |
| RSS growth — idle scenario (30 min) | < 5 MB |
| RSS growth — periodic dictation (15 cycles) | < 10 MB |
| RSS growth — rapid start/stop (~100 cycles) | < 5 MB |
| Peak RSS during long dictation (5 min audio) | Returns within 10 MB of pre-dictation baseline |
| RSS growth — mode switching (14 cycles) | < 10 MB |
| RSS drift after 5 model hot-swaps | < 20 MB vs first load |
| Audio device reconnect success rate | 3/3 |
| Settings changes causing corrupt output | 0 |
| Error logs (fault/error level, unexpected) | 0 over the full session |
| Dictation cycle completion rate | >= 95% (at most 1 failure per scenario) |
Fail fast rule: If any scenario produces a crash, stop testing and file a DevTrack bug task before continuing. Crashes invalidate subsequent scenario results.
Reporting a Failed Scenario
Section titled “Reporting a Failed Scenario”For each scenario that produces a FAIL result:
- Note the exact step number and action that triggered the failure.
- Capture a Console.app log export filtered to the
com.vaultype.appsubsystem for the relevant time window (File > Export). - Capture an Activity Monitor screenshot showing the RSS and CPU at the time of failure.
- If the app crashed, open
~/Library/Logs/DiagnosticReports/and locate theVaulType-*.crashfile. Attach the symbolicated crash log. - Record the following:
Scenario: <number and name>Failure step: <step number>Trigger action: <exact action that caused the failure>DictationState at failure: <idle/recording/transcribing/processing/injecting/error>Console error: <paste the relevant log lines>RSS at failure: <value>Crash log: <filename, or "no crash">- File a DevTrack bug task:
Terminal window curl -s -X POST "$DEVTRACK_URL/webhooks/task/create" \-H "Authorization: Api-Key $DEVTRACK_API_KEY" \-H "Content-Type: application/json" \-d '{"project": "VAULTYPE","title": "Bug: Stability failure in <Scenario N: name>","description": "Step <N> failed. Trigger: <action>. Console error: <log>. RSS: <value>.","priority": "high"}'
Results Template
Section titled “Results Template”Fill in this table after completing all scenarios.
| Scenario | Duration | Status | Start RSS | End RSS | Growth | Crashes | Notes |
|---|---|---|---|---|---|---|---|
| 1. Idle Stability | 30 min | ||||||
| 2. Periodic Dictation (15 cycles) | 30 min | ||||||
| 3. Rapid Start/Stop (~100 cycles) | 10 min | ||||||
| 4. Long Dictation (5 min audio) | 5 min | ||||||
| 5. Mode Switching (14 cycles) | 10 min | ||||||
| 6. Model Hot-Swap (5 switches) | 5 min | ||||||
| 7. Settings Changes Under Load | 5 min | ||||||
| 8. Audio Device Disconnect (3x) | 5 min |
Baseline RSS: _______ MB Final RSS (session end): _______ MB Total session RSS growth: _______ MB Total crashes across session: _______
Overall result: [ ] ALL PASS [ ] FAILURES — see individual scenario notes above.
Tested by: _______________ Date: _______________ Build: _______________