The starting point.
The problem was embarrassingly simple to state. By early 2026, running multiple AI coding agents in parallel had become normal. Not two. Ten, sometimes more. Each in its own tmux window, each working on a different piece of a different project.
The problem: you can't see which ones need you.
Agent 7 has been sitting on a permission prompt for an hour. Agent 3 finished its task and is waiting. Agent 5 errored out silently. You have no idea — because tmux shows you text boxes and nothing else.
The obvious fix, checking each window manually, doesn't scale. The slightly less obvious fix, pattern-matching terminal output to detect state, works until a model prints a spinner character and the state flips to "working". The correct fix required reading the agent's own API.
What already existed.
Before a single gmux-specific repo was written, every component was sitting in
MASTER_PROJECTS/. A wake word → TTS → voice-routing pipeline. A voice-to-terminal
keystroke injector. MediaPipe Face Mesh driving canvas art. A four-agent orchestration pipeline.
The first version of gmux was supposed to be: link these together, rename the wake word, add hand detection. An afternoon's work, maybe.
That was not what happened.
The first real problem: state detection.
Terminal output parsing worked. Until it didn't. A model outputting a long response with
❯ in it would flip the indicator to "waiting" mid-stream. A code block with spinner
characters meant the pane was "working" when it was actually idle. The approach was inherently
fragile.
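To make the failure concrete, here is a minimal sketch of the kind of scraper described above. The glyph sets and the `guess_state` function are hypothetical, written for illustration only; they are not gmux's actual code.

```python
# Hypothetical scraper: guesses pane state from the visible text alone.
SPINNER_CHARS = set("⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏")  # assumed spinner glyphs
PROMPT_CHAR = "❯"                      # prompt glyph mentioned in the text


def guess_state(pane_text: str) -> str:
    """Return 'working', 'waiting', or 'idle' based purely on glyphs."""
    if any(ch in SPINNER_CHARS for ch in pane_text):
        return "working"
    if PROMPT_CHAR in pane_text:
        return "waiting"
    return "idle"


# An idle pane whose last answer contained a spinner inside a code block.
idle_pane = 'Done. Example:\n    print("⠋ loading")\n'
# A pane still streaming a response that happens to quote a shell prompt.
streaming_pane = "Run it like this:\n❯ make test\nThen check the"

print(guess_state(idle_pane))       # misreported as "working"
print(guess_state(streaming_pane))  # misreported as "waiting"
```

Neither input is unusual. Any heuristic that reads glyphs has the same blind spot, because the model is free to print those glyphs as ordinary content.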
qalcode2 exposes /session/status and an SSE endpoint at /event. Every
state transition comes through as a structured event. Subscribing to the stream instead of
pattern-matching made detection exact and instant.
Stop guessing at state from visual artifacts. Read the events the agent itself emits.
monitor.py became an SSE subscriber rather than a terminal scraper. One subscription
per running qalcode2 instance. State written to /tmp/gmux-pane-state.json on every
event. This architecture has held — and it's what powers the live status bar in the demo.
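The subscriber pattern can be sketched roughly as follows. This is not monitor.py itself: the event payload shape (a JSON object with a "status" field) is an assumption, and `iter_sse_events`, `write_state`, and `track` are illustrative names. Only the /event endpoint and the state file path come from the text.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator

STATE_PATH = "/tmp/gmux-pane-state.json"  # path named in the text


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per SSE event.

    An SSE event is a run of 'field: value' lines ended by a blank line;
    multiple 'data:' lines are joined with newlines.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):  # comment / keep-alive line
            continue
        if line == "":            # blank line terminates the event
            if data:
                yield json.loads("\n".join(data))
                data = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            data.append(value[1:] if value.startswith(" ") else value)


def write_state(state: dict, path: str = STATE_PATH) -> None:
    """Write atomically so a reader never sees a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def track(lines: Iterable[str], pane: str, path: str = STATE_PATH) -> dict:
    """Fold a stream of events into {pane: status}, persisting each change."""
    state = {}
    for event in iter_sse_events(lines):
        # ASSUMPTION: events look like {"type": "...", "status": "..."}.
        if "status" in event:
            state[pane] = event["status"]
            write_state(state, path)
    return state
```

In a real subscriber, `lines` would be the decoded lines of a long-lived HTTP response from the instance's /event endpoint, with one such loop per running qalcode2 instance. The write-then-rename step matters because the status bar may read the file at any moment.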
The overlay trap.
The first visual layer attempt was a transparent floating window — a chromeless browser window, click-through, with the gesture canvas drawn over the terminal beneath it.
Wayland killed this idea.
On X11, xdotool getwindowgeometry tells you exactly where any window is. The overlay
can align its tap targets to the terminal's pane borders to the pixel. Wayland has no equivalent:
a client cannot query the position of another client's window. The overlay cannot know where the
terminal is. Near a pane border, a 1-pixel error means gesturing at pane 3 selects pane 2. At
different DPI settings, the error compounds.
Decision logged in DECISIONS.md: Option A (transparent overlay) abandoned. Build the
terminal host (Option B) instead.
Option B · Tauri owns the terminal.
If the visual layer can't float above the terminal, it has to be the terminal.
Tauri is a Rust framework that creates native desktop windows around the system WebView (WebKitGTK on Linux). The plan: spawn tmux inside the Tauri window via a real PTY, pipe output to xterm.js, layer the gesture canvas on top of it.
The terminal is real — not a screenshot, not a widget approximation, not a VTE embed. Full ANSI support, full terminfo, everything tmux supports. And the gesture canvas is a sibling DOM element layered over xterm.js at known pixel offsets. Tap a point in the gesture overlay and the math to find the corresponding tmux pane is exact.
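The hit-testing math is simple once the overlay and the terminal share an origin. A minimal sketch, assuming pane geometry in character cells (the kind of data tmux reports through format variables such as `#{pane_left}`, `#{pane_top}`, `#{pane_width}`, `#{pane_height}`) and a known cell size in pixels from the terminal renderer. The `Pane` class and `pane_at` function are hypothetical names, and the real app would do this in the WebView rather than in Python.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pane:
    pane_id: str
    left: int    # column of the pane's left edge, in cells
    top: int     # row of the pane's top edge, in cells
    width: int   # in cells
    height: int  # in cells


def pane_at(x_px: float, y_px: float, panes: List[Pane],
            cell_w: float, cell_h: float) -> Optional[str]:
    """Map a tap in overlay pixels to the tmux pane underneath it.

    ASSUMPTION: the overlay's (0, 0) coincides with the terminal's
    top-left cell, which holds when both are siblings in the same DOM.
    """
    col = int(x_px // cell_w)
    row = int(y_px // cell_h)
    for p in panes:
        if p.left <= col < p.left + p.width and p.top <= row < p.top + p.height:
            return p.pane_id
    return None  # tap landed on a border or outside the grid


# Two side-by-side panes in an 80x24 window; column 40 is the border.
layout = [Pane("%0", 0, 0, 40, 24), Pane("%1", 41, 0, 39, 24)]
print(pane_at(350, 100, layout, cell_w=9, cell_h=18))  # column 38
print(pane_at(370, 100, layout, cell_w=9, cell_h=18))  # column 41
```

Because the conversion is integer division by the cell size, there is no accumulated drift: a tap either falls in a pane's cell range or it doesn't.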
It also solves the Web Speech API problem. webkit2gtk on Linux doesn't implement Web Speech API — every call silently fails. The Tauri app gets around this with a Python sidecar running faster-whisper for offline speech recognition. Better accuracy, no network required.
The camera problem.
Two processes want the webcam at once: the gesture engine and any browser apps (video calls, Brave for demos). A V4L2 capture device generally lets only one process stream from it at a time.
Fix: v4l2loopback. A kernel module that creates a virtual camera device. A background ffmpeg
process reads /dev/video0 exclusively and writes to /dev/video2.
Everything reads from the virtual device. As many simultaneous readers as needed.
Enforced in three places now: main.js, gmux.py, and
gesture/engine.py all refuse to touch /dev/video0 directly. The broker
is the only path.
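A rough sketch of both halves, the broker and the guard. The device paths come from the text; the function names, the pixel format, and the exact ffmpeg arguments are assumptions, not the project's actual invocation. The loopback device itself would typically be created by loading the module first, for example with `sudo modprobe v4l2loopback video_nr=2`.

```python
import os
from typing import List

PHYSICAL_CAMERA = "/dev/video0"  # read only by the broker
VIRTUAL_CAMERA = "/dev/video2"   # v4l2loopback device every client opens


def broker_command(src: str = PHYSICAL_CAMERA,
                   dst: str = VIRTUAL_CAMERA) -> List[str]:
    """ffmpeg argv that copies frames from the real camera to the loopback."""
    return [
        "ffmpeg",
        "-f", "v4l2", "-i", src,  # capture from the physical device
        "-pix_fmt", "yuv420p",    # ASSUMED format; pick one your readers accept
        "-f", "v4l2", dst,        # publish to the virtual device
    ]


def camera_path(requested: str = VIRTUAL_CAMERA) -> str:
    """Guard for client code: refuse the physical device, allow the rest."""
    if os.path.realpath(requested) == os.path.realpath(PHYSICAL_CAMERA):
        raise PermissionError(
            f"{requested} is reserved for the camera broker; "
            f"open {VIRTUAL_CAMERA} instead"
        )
    return requested
```

The broker would be started once in the background, for instance with `subprocess.Popen(broker_command())`, and every client would pass its device path through a guard like `camera_path` before opening it.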
The port collision.
Voice was planned to run on :8765. When it came time to wire the voice daemon, port
8765 was already occupied — by aria-phone, a separate uvicorn server from a
different project. The voice daemon didn't start.
Logged: the bridge ports are :8767 (WS), :8768 (HTTP phone),
:8769 (gmux_receiver). Port :8765 marked DO NOT USE. Voice moved to
:8770 in the daemon, but the Tauri app's connection hasn't been updated to match
yet. That's the remaining blocker for criterion #5.
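A pre-flight check would have turned the silent failure into a loud one. The sketch below is an assumption about how such a check could look, not code from the voice daemon; the port table mirrors the assignments logged above, and `port_is_free` and `check_voice_port` are hypothetical names.

```python
import socket

# Port assignments as logged in the text.
PORTS = {
    8765: "DO NOT USE (aria-phone)",
    8767: "bridge WS",
    8768: "bridge HTTP phone",
    8769: "gmux_receiver",
    8770: "voice daemon",
}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if the port can be bound right now, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True


def check_voice_port(port: int = 8770) -> None:
    """Fail loudly before the daemon starts instead of dying quietly."""
    if port == 8765:
        raise ValueError("port 8765 is marked DO NOT USE")
    if not port_is_free(port):
        raise RuntimeError(f"port {port} is already in use")
```

Calling `check_voice_port()` at daemon startup gives an immediate, named error, and keeping the table in one place means the daemon and the Tauri client can read the same constant instead of drifting apart.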
The installer freeze.
An installer was written. It checks dependencies (node, rust, python 3.11, bun, tmux), installs
requirements, downloads the MediaPipe hand-landmark model, writes a systemd user service, writes
a .desktop entry. It works.
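The dependency check is the simplest part of that list. A minimal sketch, assuming the installer looks for executables on PATH; the binary names below are guesses (the text names the tools, not the binaries), and `missing_tools` is a hypothetical helper rather than the installer's real code.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# ASSUMPTION: executable names for the tools listed in the text.
REQUIRED = ["node", "cargo", "python3.11", "bun", "tmux"]


def missing_tools(
    tools: Iterable[str] = REQUIRED,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing dependencies: " + ", ".join(missing))
    else:
        print("all dependencies found")
```

Passing `which` as a parameter keeps the check testable without touching the real PATH.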
On May 12, 2026, the decision was made to freeze it.
The reasoning: an installer that installs something that doesn't run cleanly is worse than no
installer. The installer is the first impression. If gmux-ui launches and the
sidebar is empty, that's the experience that gets reported, not "well the install script ran."
The freeze lifts once five criteria pass:
1. ./scripts/launch.sh opens the Tauri app cleanly on a fresh shell ✓
2. Status sidebar shows live pane state ✓
3. Spawning an agent via the UI creates a tmux window with opencode ✓
4. Permission approve/reject works against a real session ⏳
5. Voice connects and transcribes into the UI ✗ (blocked on the :8770 port fix)
Three of five are green. When all five are green, the installer resumes.
The qalcode2 overlap realisation.
A useful clarification that took a while to crystallise: many gmux headline features are already inside qalcode2 itself. AI state detection — qalcode2 generates that. Todo tracking — native to qalcode2. Session management — handled by qalcode2.
gmux doesn't replace any of that. It reads it, and presents it across all simultaneously running qalcode2 instances in one view.
qalcode2 = single-agent executor.
gmux = multi-agent interaction layer.
Every feature in gmux that seems to duplicate qalcode2 is gmux surfacing qalcode2's data at a different scale — across 10 panes instead of 1. The features that are genuinely gmux-only: gesture control, voice routing, phone remote, cross-pane status bar, memory layer, RAM/process visibility. Those don't exist in qalcode2 because qalcode2 is one pane.
What's working now.
The Python terminal stack is fully working and published:
pip install gmux && gmux --status-only
That command gives you live AI state in your tmux status bar — reading qalcode2's API, showing states in colour, todo progress per window, session restore. No camera required.
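How a state file becomes a coloured status bar can be sketched in a few lines. This is an illustration under assumptions, not what `gmux --status-only` actually runs: the state names, the colour mapping, and the `{pane: status}` file layout are all hypothetical. The `#[fg=...]` markers are tmux's own inline style syntax.

```python
import json

# ASSUMPTION: state names and colours are illustrative only.
COLOURS = {"working": "yellow", "waiting": "red", "idle": "green", "error": "magenta"}


def render_status(state: dict) -> str:
    """Build a tmux status string, one coloured 'pane:status' chunk per pane."""
    parts = []
    for pane, status in sorted(state.items()):
        colour = COLOURS.get(status, "default")
        parts.append(f"#[fg={colour}]{pane}:{status}#[default]")
    return " ".join(parts)


def status_from_file(path: str = "/tmp/gmux-pane-state.json") -> str:
    """Read the monitor's state file; return '' if it is missing or invalid."""
    try:
        with open(path) as fh:
            return render_status(json.load(fh))
    except (OSError, json.JSONDecodeError):
        return ""  # no monitor running yet: show nothing rather than crash


if __name__ == "__main__":
    print(status_from_file())
```

tmux can call a script like this from its status line with the `#(command)` form, for example inside `status-right`, and re-runs it on each status refresh.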
Four browser demos are live on gmux.ai/demo/ — multi-agent monitor, agent flowchart, memory panel, phone companion. All on mock data, no backend, no install.
The Tauri app has working PTY, agent sidebar with 14 panes live, and real data flow confirmed. Three of five installer criteria green. The remaining two are achievable in a focused afternoon.
What's next.
- Fix voice port: move the Tauri connection from :8765 to :8770.
- Apply the qalcode2 push patch: eliminates 2-second polling lag.
- Bundle the MediaPipe model: avoid the CDN fetch on first run.
- Pass all five installer criteria.
- Ship the installer.
- Make the GitHub repo public — terminal AI tools are trending hard right now.
The longer item: decide whether the multi-pane orchestration eventually folds into qalcode2 as a "workspace" mode, or stays as a separate product. The gesture/voice/phone-remote angle makes it clearly distinct. That question gets easier to answer once real users are using it.
The code is at gmux.ai.
Install today: pip install gmux or paru -S gmux.