Companion Desktop Widget
MeshUtility Documentation
MeshUtility is an open-source, lightweight desktop assistant widget that combines hands-free voice dictation (MeshVoice) and contextual AI rewriting (MeshPrompt) into a system-wide, always-on-top overlay. It runs as a client-side Tauri application and injects text directly into any focused window on your machine.
What is MeshUtility?
MeshUtility is a free, open-source companion application designed to run alongside the MeshPilot ecosystem. While MeshConsole functions as your visual command center and terminal workspace, MeshUtility serves as a global input adapter. It resides in your system tray and displays as a sleek, transparent overlay that sits above all other windows when activated.
By integrating native hardware audio capture, on-device Whisper models, system-level keyboard hooks, and inline AI prompt expansions, MeshUtility allows you to talk directly into focused terminals or files and instantly refactor selected text with a single global shortcut.
Why Does it Exist?
Traditional developer interactions with AI models are bottlenecked by:
- Input Speed Limits: Expressing a complex refactoring idea or writing long instructions verbally is up to 3x faster than typing it manually.
- Context Switching: Switching tabs to a browser-based chat or copy-pasting code fragments breaks visual focus. MeshUtility injects code directly into your cursor's focus area.
- Security & API Costs: Instead of routing raw microphone audio or API keys through a third-party server, MeshUtility processes speech locally by default and uses locally encrypted credentials for AI requests.
System Architecture
MeshUtility utilizes a split-core design written in Rust (Tauri backend) and React (frontend interface) to coordinate low-latency media input, local inference, and OS-level key injection:
graph TD
A[Microphone Audio Input] -->|CPAL 16kHz WAV| B[Audio Buffer Queue]
B --> C[VAD Silence Detection]
C -->|Trigger Cutoff| D{Whisper Provider}
D -->|Local Inference| E[Whisper.cpp / GGML Engine]
D -->|Cloud Fallback| F[Groq Transcribe API]
E --> G[Raw Transcribed Text]
F --> G
G --> H[SQLite Custom Dictionary Replacements]
H -->|Syntax Corrections| I[Syntactically-Aware Code]
I --> J[Win32 SendInput Unicode Injector]
J -->|Direct Keystrokes| K[Focused Window / Terminal]
Voice Dictation (MeshVoice)
MeshVoice delivers low-latency developer-focused speech-to-text. Unlike consumer dictation software, it is optimized to understand variables, syntax, commands, and multi-lingual transitions.
CPAL Audio Recording
Using the cpal (Cross-Platform Audio Library) crate, MeshVoice binds to your default system input device. It configures the input stream to capture raw audio as 16,000 Hz, 16-bit, mono PCM WAV, the sample layout that Whisper models expect as input. Capturing in this format avoids runtime resampling overhead.
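As a rough illustration of the audio format described above, the sketch below wraps captured 16-bit mono samples in a standard 44-byte RIFF/WAVE header at 16,000 Hz. The function name `encode_wav` and the constants are hypothetical and not taken from the MeshUtility source. The sketch uses only the Rust standard library and does not touch cpal.

```rust
// Assumed capture format, taken from the description above.
const SAMPLE_RATE: u32 = 16_000;
const CHANNELS: u16 = 1;
const BITS_PER_SAMPLE: u16 = 16;

/// Hypothetical helper: wrap raw 16-bit PCM samples in a canonical
/// 44-byte RIFF/WAVE header (PCM format tag 1, little-endian fields).
fn encode_wav(samples: &[i16]) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let block_align = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per frame
    let byte_rate = SAMPLE_RATE * u32::from(block_align);

    let mut out = Vec::with_capacity(44 + samples.len() * 2);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // size of the rest of the file
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size for PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&CHANNELS.to_le_bytes());
    out.extend_from_slice(&SAMPLE_RATE.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&BITS_PER_SAMPLE.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for sample in samples {
        out.extend_from_slice(&sample.to_le_bytes());
    }
    out
}
```

At this format one second of audio is 16,000 samples, or 32,000 bytes of sample data plus the header.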
Local Whisper.cpp & Groq Fallback
Depending on your network state and hardware performance, dictation can be processed in one of two ways:
- Local Whisper.cpp: MeshUtility bundles bindings to GGML-formatted Whisper models (e.g. `ggml-base.en.bin`). It takes advantage of SIMD, AVX-512, and GPU acceleration on Windows (via Vulkan or CUDA) to perform on-device transcription with sub-second latency.
- Groq Cloud Fallback: If local hardware is constrained, MeshUtility can securely dispatch the 16kHz audio buffer to the MeshPilot cloud router, which calls Groq's high-speed `whisper-large-v3` model using a rate-limited bearer token. See the endpoint implementation in `transcribe/route.ts`.
Voice Activity Detection (VAD) Thresholds
To avoid continuously recording background room noise, MeshVoice implements a Voice Activity Detection (VAD) engine. It monitors the energy level (RMS) of short audio windows and marks a silence boundary based on two settings:
- Silence Threshold: Default -45 dB. A configurable noise gate that should match your ambient room noise.
- Hang Time: 800 ms of consecutive silence triggers automatic recording cutoff and sends the audio buffer for transcription.
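The threshold and hang-time behaviour can be sketched as a small energy-based detector. This is a minimal illustration, not MeshVoice's actual implementation: the names `rms_dbfs` and `SilenceDetector` are hypothetical, and the sketch assumes the -45 dB threshold is measured relative to full scale (dBFS) on 16-bit samples.

```rust
/// RMS level of a window of 16-bit samples, in dB relative to full scale.
/// An empty or all-zero window yields negative infinity.
fn rms_dbfs(window: &[i16]) -> f64 {
    if window.is_empty() {
        return f64::NEG_INFINITY;
    }
    let sum_sq: f64 = window
        .iter()
        .map(|&s| {
            let x = f64::from(s) / 32768.0; // normalise to [-1.0, 1.0)
            x * x
        })
        .sum();
    let rms = (sum_sq / window.len() as f64).sqrt();
    20.0 * rms.log10()
}

/// Hypothetical detector: counts consecutive silent milliseconds.
struct SilenceDetector {
    threshold_db: f64, // e.g. -45.0
    hang_ms: u32,      // e.g. 800
    silent_ms: u32,
}

impl SilenceDetector {
    fn new(threshold_db: f64, hang_ms: u32) -> Self {
        Self { threshold_db, hang_ms, silent_ms: 0 }
    }

    /// Feed one audio window of `window_ms` milliseconds. Returns true once
    /// silence has lasted at least `hang_ms`, which is the cutoff point.
    fn push_window(&mut self, window: &[i16], window_ms: u32) -> bool {
        if rms_dbfs(window) < self.threshold_db {
            self.silent_ms += window_ms;
        } else {
            self.silent_ms = 0; // any speech resets the hang timer
        }
        self.silent_ms >= self.hang_ms
    }
}
```

With 20 ms windows (320 samples at 16 kHz), the detector reports a cutoff on the 40th consecutive silent window. Any window above the threshold resets the count.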
Custom Dictionary SQLite Replacement
A major issue with standard speech-to-text models is that they do not produce correct programming syntax. Dictating "const db url equals process dot env dot database url" often yields those literal words. MeshUtility works around this by passing the Whisper output through a local SQLite-backed replacement engine:
Dictation Post-Processing Examples
| Spoken Phrase | SQLite Pattern Match | Injected Output |
|---|---|---|
| "const db url equals process dot env dot database url" | /\bconst\s+(\w+)\s+equals\b/i | const dbUrl = process.env.DATABASE_URL; |
| "npm eye" / "npmi" | /\bnpmi\b/i | npm install |
| "git check out main" | /\bgit check out\b/i | git checkout main |
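The replacement pass can be approximated with a simple rule table. The sketch below is a simplification under stated assumptions: it matches literal spoken phrases word by word (case-insensitively) instead of the regular expressions shown in the table, and the rules are hard-coded rather than loaded from SQLite. `Rule`, `sample_rules`, and `apply_rules` are hypothetical names.

```rust
/// One replacement rule: the spoken words to match and the text to emit.
struct Rule {
    spoken: Vec<&'static str>,
    output: &'static str,
}

/// Illustrative rules only; a real dictionary would be read from storage.
fn sample_rules() -> Vec<Rule> {
    vec![
        Rule { spoken: vec!["git", "check", "out"], output: "git checkout" },
        Rule { spoken: vec!["npmi"], output: "npm install" },
    ]
}

/// Scan the transcript left to right and replace the first rule whose
/// spoken phrase matches at the current word position.
fn apply_rules(transcript: &str, rules: &[Rule]) -> String {
    let words: Vec<&str> = transcript.split_whitespace().collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < words.len() {
        let matched = rules.iter().find(|r| {
            !r.spoken.is_empty()
                && i + r.spoken.len() <= words.len()
                && r.spoken
                    .iter()
                    .zip(&words[i..])
                    .all(|(s, w)| w.eq_ignore_ascii_case(s))
        });
        match matched {
            Some(rule) => {
                out.push(rule.output.to_string());
                i += rule.spoken.len(); // skip the words the rule consumed
            }
            None => {
                out.push(words[i].to_string());
                i += 1;
            }
        }
    }
    out.join(" ")
}
```

Calling `apply_rules("git check out main", &sample_rules())` produces `git checkout main`. Words that match no rule pass through unchanged.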
Win32 SendInput Unicode Injection
Once transcribed and corrected, MeshUtility injects the text directly into the focused window. On Windows, rather than copying text to the clipboard and simulating `Ctrl+V` (which overwrites whatever the user currently has on the clipboard), it calls the Win32 SendInput API directly. It synthesizes Unicode keyboard events (KEYEVENTF_UNICODE), inserting characters one at a time into the focused editor, terminal buffer, or web browser.
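With KEYEVENTF_UNICODE, each keyboard event carries one UTF-16 code unit, so characters outside the Basic Multilingual Plane have to be sent as a surrogate pair. The sketch below only plans the event sequence and does not call SendInput, which keeps it platform-independent. `UnicodeKeyEvent` and `plan_unicode_events` are hypothetical names, and pairing a key-down with a key-up per code unit is one common ordering, not necessarily the one MeshUtility uses.

```rust
/// One planned keyboard event: the UTF-16 code unit that would go in the
/// scan-code field, and whether this is the key-up half of the pair.
#[derive(Debug, PartialEq, Clone, Copy)]
struct UnicodeKeyEvent {
    code_unit: u16,
    key_up: bool,
}

/// Hypothetical planner: emit a key-down followed by a key-up for every
/// UTF-16 code unit in `text`, in order.
fn plan_unicode_events(text: &str) -> Vec<UnicodeKeyEvent> {
    let mut events = Vec::new();
    for unit in text.encode_utf16() {
        events.push(UnicodeKeyEvent { code_unit: unit, key_up: false });
        events.push(UnicodeKeyEvent { code_unit: unit, key_up: true });
    }
    events
}
```

An ASCII letter yields two events. A character such as U+1F600 yields four, because it encodes as the surrogate pair 0xD83D 0xDE00.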
Prompt Rewriting (MeshPrompt)
MeshPrompt is an inline text assistant. Select any text on your computer, hit the hotkey, and watch the selection get rewritten or analyzed by an AI model of your choice.
Global Hotkey & Modifiers Release Sequence
MeshPrompt registers a system-wide hotkey: Ctrl+Shift+Space. When it is pressed, the application must immediately capture the active text selection. Because the hotkey fires while Ctrl and Shift are still held down, opening a window at that moment can leave the modifier keys "stuck" in the OS keyboard state. To prevent this, MeshPrompt runs a modifier release sequence that programmatically clears the virtual key state of Ctrl and Shift before drawing the interface overlay.
Transparent React Overlay
Upon activation, MeshPrompt spawns a Tauri webview window configured to be:
- Always-on-top: Positioned above the taskbar and all application windows.
- Transparent: Leverages Tauri's window transparency configurations to show a borderless, floating UI pill.
- Cursor-Anchored: Spawns directly adjacent to the current coordinates of the mouse cursor, reducing eye movements.
DPAPI Key Storage Encryption
To query OpenAI, Anthropic, Ollama, or Groq, MeshPrompt requires access to API keys. On Windows, these keys are stored locally using the Windows Data Protection API (DPAPI). Keys are encrypted with a secret derived from the current user's Windows login credentials before being written to disk at %APPDATA%/MeshUtility/config/keys.enc. This ties decryption to the logged-in user on that specific machine, so other user accounts, or anyone who copies the file to another machine, cannot read the credentials.
Custom Prompt Templates
MeshPrompt uses templates defined in Markdown. You can create custom templates in the utility UI. Each template wraps your selection before it is passed to the AI model. For example:
---
name: Refactor Function
hotkey: R
provider: anthropic/claude-3-5-sonnet
---
Review this code snippet and refactor it for performance, clean architecture, and type safety.
Return ONLY the code block output without any introductory conversational text.
Code to Refactor:
${selection}
How to Clone & Run
MeshUtility is fully open source. You can compile, run, and modify the application locally:
Prerequisites
- Node.js (v18+) and npm
- Rust toolchain (stable compiler and cargo)
- Windows: C++ Build tools installed via Visual Studio Installer (Desktop development with C++)
Build Instructions
git clone https://github.com/Jenesh11/MeshUtility.git
cd MeshUtility
npm install
npm run tauri dev
Best Practices
- Use local Whisper models for privacy: Download `ggml-base.en.bin` inside the MeshUtility configuration wizard. Offline inference ensures zero telemetry leaves your system.
- Groom your SQLite dictionary: Regularly add your project's custom variables, function names, and command shortcuts to the SQLite replacements tab so they are spelled correctly on injection. For example, mapping spoken "antigravity" to `Antigravity` keeps CLI interactions accurate.
- Clear sticking modifiers: If you notice capital letters getting stuck after using `Ctrl+Shift+Space`, increase the modifier release delay (in ms) under settings.
Limitations
- OS Integrations: DPAPI storage encryption and Win32 `SendInput` Unicode injection are native to Windows. On macOS and Linux, the application falls back to clipboard emulation and keycode injection via native AppleScript/X11 wrappers.
- Hardware Requirements: Local Whisper.cpp model execution scales with your CPU core count and GPU threads. Devices without AVX support will experience high inference delays (exceeding 3-4 seconds per sentence).
Troubleshooting
Global hotkey not working
Make sure no other software (such as Windows PowerToys, Discord, or Microsoft Teams) is hijacking Ctrl+Shift+Space or your custom dictation hotkey. Run MeshUtility as an Administrator if you are injecting text into elevated Command Prompts or PowerShell windows.
Characters are scrambled or missing during injection
Slow terminals or legacy consoles can struggle to process rapid keystroke events. Increase the Inter-Character Injection Delay in the MeshUtility settings (e.g. from 0ms to 5ms) to give the target input buffer time to process each character.
Audio recording fails to initialize
Verify that your system input device uses a format the app can open. Under Windows Sound Settings, check that your active microphone is configured as 1 channel (Mono) or 2 channels (Stereo), 16-bit, 44100 Hz or 48000 Hz. CPAL negotiates the stream format with the device, but it can only open the device if the OS has granted microphone access.