May 13, 2026 · Research · 03

Building gmux: the devlog.

The annoying-to-solve problem of ten parallel agents. The state-detection fix. The Wayland trap. The port collision. The installer freeze. What got built, what broke, what changed.

Apr · prototype
PyPI + AUR
May · this devlog
Installer · soon
v1.0 · public

The starting point.

The problem was embarrassingly simple to state. By early 2026, running multiple AI coding agents in parallel had become normal. Not two. Ten, sometimes more. Each in its own tmux window, each working on a different piece of a different project.

The problem: you can't see which ones need you.

Agent 7 has been sitting on a permission prompt for an hour. Agent 3 finished its task and is waiting. Agent 5 errored out silently. You have no idea — because tmux shows you text boxes and nothing else.

The obvious fix — check each window manually — doesn't scale. The slightly less obvious fix — pattern-match terminal output to detect state — works until it doesn't (a model outputs a spinner character, state flips to "working"). The correct fix required reading the agent's own API.

What already existed.

Before a single gmux-specific repo was written, every component was sitting in MASTER_PROJECTS/. A wake word → TTS → voice-routing pipeline. A voice-to-terminal keystroke injector. MediaPipe Face Mesh driving canvas art. A four-agent orchestration pipeline.

The first version of gmux was supposed to be: link these together, rename the wake word, add hand detection. An afternoon's work, maybe.

That was not what happened.

The first real problem: state detection.

Terminal output parsing worked. Until it didn't. A long model response containing prompt-like text would flip the indicator to "waiting" mid-stream. A code block with spinner characters meant the pane read as "working" when it was actually idle. The approach was inherently fragile.
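To make the failure mode concrete, here is a minimal sketch of the kind of heuristic a terminal scraper might use. The regular expression and the guess_state function are hypothetical illustrations, not gmux's original code.

```python
import re

# Hypothetical heuristic: any Braille spinner frame on screen means "working".
SPINNER = re.compile(r"[⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏]")


def guess_state(pane_text: str) -> str:
    """Guess agent state from the visible text of a pane (fragile by design)."""
    if SPINNER.search(pane_text):
        return "working"
    return "idle"


# An idle pane whose last answer happens to quote spinner frames in a code block.
idle_pane = 'Done. Example progress frames:\n    frames = "⠋⠙⠹"\n> '

# The agent is idle, but the heuristic reports "working".
print(guess_state(idle_pane))
```

No amount of regex tuning removes this class of error, because the scraper cannot tell text the agent is showing from text that describes the agent's state.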

qalcode2 exposes /session/status and an SSE endpoint at /event. Every state transition comes through as a structured event. Subscribing to the stream instead of pattern-matching made detection exact and instant.

Stop guessing at state from visual artifacts. Read the events the agent itself emits.

monitor.py became an SSE subscriber rather than a terminal scraper. One subscription per running qalcode2 instance. State written to /tmp/gmux-pane-state.json on every event. This architecture has held — and it's what powers the live status bar in the demo.
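A rough sketch of what such a subscriber loop could look like, using only the standard library. The payload shape (a JSON object with "session" and "status" keys) and the function names are assumptions for illustration; the post does not show qalcode2's actual event schema. Only the state file path comes from the text above.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator

STATE_PATH = "/tmp/gmux-pane-state.json"  # path named in the post


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON payload per SSE event; a blank line ends an event."""
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data:
            yield json.loads("\n".join(data))
            data = []
        # "event:", "id:" and ":comment" lines are ignored in this sketch.


def write_state(state: dict, path: str = STATE_PATH) -> None:
    """Write atomically so a status-bar reader never sees a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def follow(lines: Iterable[str], state: dict, path: str = STATE_PATH) -> dict:
    """Apply every event to the state map and persist after each one."""
    for event in iter_sse_events(lines):
        # Assumed payload shape: {"session": "<id>", "status": "<state>"}
        state[event["session"]] = event["status"]
        write_state(state, path)
    return state
```

In a real subscriber, lines would be the decoded lines of the long-lived HTTP response from the agent's /event endpoint, with one such loop per running instance.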

The overlay trap.

The first visual layer attempt was a transparent floating window — a chromeless browser window, click-through, with the gesture canvas drawn over the terminal beneath it.

Wayland killed this idea.

On X11, xdotool getwindowgeometry tells you exactly where any window is. The overlay can align its tap targets to the terminal's pane borders to the pixel. Wayland has no equivalent. Windows are isolated. The overlay cannot know where the terminal is. A 1-pixel error means gesturing at pane 3 selects pane 2. At different DPI settings, the error compounds.

Decision logged in DECISIONS.md: Option A (transparent overlay) abandoned. Build the terminal host (Option B) instead.

Option B · Tauri owns the terminal.

If the visual layer can't float above the terminal, it has to be the terminal.

Tauri is a Rust framework that creates native desktop windows with the system WebView inside (WebKitGTK on Linux). The plan: spawn tmux inside the Tauri window via a real PTY, pipe output to xterm.js, layer the gesture canvas on top of it.

[Figure: Tauri window (Rust + WebKit), layers from top to bottom: gesture canvas (pointer-events: none, MediaPipe overlay); xterm.js rendering PTY output, with a sample status line "2:◉ volkus 6/8  3:● planner 3/5  4:! research"; portable-pty (Rust), which spawns and manages tmux; tmux, unchanged, as the base substrate. Canvas and xterm.js share the same DOM, so alignment is pixel-perfect.]
Fig 1 · How Tauri owns the terminal. Gesture overlay is a DOM child of the same window as xterm.js. No alignment guessing.

The terminal is real — not a screenshot, not a widget approximation, not a VTE embed. Full ANSI support, full terminfo, everything tmux supports. And the gesture canvas is a sibling DOM element layered over xterm.js at known pixel offsets. Tap a point in the gesture overlay and the math to find the corresponding tmux pane is exact.
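The pane lookup can be sketched as a plain hit-test, assuming the overlay knows the terminal's cell size in pixels and each pane's geometry in cells. tmux reports that geometry through its pane_left, pane_top, pane_width and pane_height format variables; the Pane class, the pane_at function and the sample cell size below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Pane:
    index: int
    left: int    # column of the pane's left edge, in cells
    top: int     # row of the pane's top edge, in cells
    width: int   # in cells
    height: int  # in cells


def pane_at(x_px: float, y_px: float, cell_w: float, cell_h: float,
            panes: Sequence[Pane]) -> Optional[Pane]:
    """Map a pixel position inside the terminal element to the pane under it."""
    col = int(x_px // cell_w)
    row = int(y_px // cell_h)
    for pane in panes:
        inside_x = pane.left <= col < pane.left + pane.width
        inside_y = pane.top <= row < pane.top + pane.height
        if inside_x and inside_y:
            return pane
    return None  # on a border column/row, or outside the terminal


# Two side-by-side panes in an 80x24 window; column 40 is the border.
layout = [Pane(0, 0, 0, 40, 24), Pane(1, 41, 0, 39, 24)]
hit = pane_at(400, 50, cell_w=9, cell_h=18, panes=layout)
```

Because the canvas and the terminal share one coordinate system, the only inputs are values the app already has. Nothing depends on asking the window manager where another window is.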

It also solves the Web Speech API problem. webkit2gtk on Linux doesn't implement the Web Speech API, so every call silently fails. The Tauri app gets around this with a Python sidecar running faster-whisper for offline speech recognition. Better accuracy, no network required.
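For orientation, the core of such a sidecar might look like the sketch below. It uses faster-whisper's WhisperModel and transcribe calls; the model size, CPU/int8 settings and both function names are assumptions, and the real sidecar's audio capture and transport to the UI are not shown in the post.

```python
from typing import Iterable


def join_segments(segments: Iterable) -> str:
    """Concatenate the non-empty segment texts into one transcript line."""
    texts = (seg.text.strip() for seg in segments)
    return " ".join(text for text in texts if text)


def transcribe_file(path: str, model_size: str = "small") -> str:
    """Transcribe one audio file offline. Requires `pip install faster-whisper`."""
    # Imported lazily so join_segments stays usable without the dependency.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantisation.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return join_segments(segments)
```

The model is loaded inside the function here only to keep the sketch short; a long-running sidecar would load it once at startup and reuse it for every utterance.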

The camera problem.

Two processes want the webcam at once: the gesture engine and any browser apps (video calls, Brave for demos). Most drivers don't allow multiple readers.

Fix: v4l2loopback. A kernel module that creates a virtual camera device. A background ffmpeg process reads /dev/video0 exclusively and writes to /dev/video2. Everything reads from the virtual device. As many simultaneous readers as needed.

Enforced in three places now: main.js, gmux.py, and gesture/engine.py all refuse to touch /dev/video0 directly. The broker is the only path.
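A minimal sketch of both halves, assuming the v4l2loopback module has already been loaded so that /dev/video2 exists (for example with its video_nr=2 module parameter). The function names are hypothetical, and the exact ffmpeg options gmux uses are not given in the post; the pixel format flag is an assumption.

```python
import os

PHYSICAL_CAMERA = "/dev/video0"  # read only by the broker
VIRTUAL_CAMERA = "/dev/video2"   # v4l2loopback device everything else reads


def broker_command(src: str = PHYSICAL_CAMERA, dst: str = VIRTUAL_CAMERA) -> list[str]:
    """Build an ffmpeg command that copies frames from src to the loopback device."""
    return [
        "ffmpeg",
        "-f", "v4l2", "-i", src,   # capture from the physical camera
        "-pix_fmt", "yuv420p",     # assumed: a format most consumers accept
        "-f", "v4l2", dst,         # write to the virtual camera
    ]


def camera_for_consumer(requested: str) -> str:
    """Return the device a consumer may open; refuse the physical camera."""
    if os.path.normpath(requested) == PHYSICAL_CAMERA:
        raise PermissionError(
            f"{PHYSICAL_CAMERA} is reserved for the broker; use {VIRTUAL_CAMERA}"
        )
    return requested
```

The broker would be started once, for instance with subprocess.Popen(broker_command()), and every other component would pass its device path through a guard like camera_for_consumer before opening it.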

The port collision.

Voice was planned to run on :8765. When it came time to wire the voice daemon, port 8765 was already occupied — by aria-phone, a separate uvicorn server from a different project. The voice daemon didn't start.

Logged: the bridge ports are :8767 (WS), :8768 (HTTP phone), :8769 (gmux_receiver). Port :8765 marked DO NOT USE. Voice moved to :8770 in the daemon, but the Tauri app's connection hasn't been updated to match yet. That's the remaining blocker for criterion #5.
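A collision like this is cheap to catch at startup. The sketch below is one way to do it, assuming the port assignments listed above; the dictionary keys and function names are hypothetical.

```python
import socket

# Port assignments as logged in the post; the key names are illustrative.
GMUX_PORTS = {"bridge_ws": 8767, "phone_http": 8768, "gmux_receiver": 8769, "voice": 8770}
RESERVED = {8765: "aria-phone"}  # marked DO NOT USE


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def check_ports(ports: dict, reserved: dict = RESERVED) -> list[str]:
    """Return one human-readable problem per unusable port."""
    problems = []
    for name, port in ports.items():
        if port in reserved:
            problems.append(f"{name}: port {port} is reserved for {reserved[port]}")
        elif not port_is_free(port):
            problems.append(f"{name}: port {port} is already in use")
    return problems
```

Running check_ports(GMUX_PORTS) before any daemon starts turns a silent failure into an explicit message. The probe is a point-in-time check, so the daemon should still handle a bind error of its own.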

The installer freeze.

An installer was written. It checks dependencies (node, rust, python 3.11, bun, tmux), installs requirements, downloads the MediaPipe hand-landmark model, writes a systemd user service, writes a .desktop entry. It works.
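The dependency check is the simplest part to illustrate. This sketch assumes the binaries are named node, cargo, bun and tmux, and treats Python 3.11 as a minimum version; both are assumptions, since the post lists the dependencies but not how the installer probes for them.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence

REQUIRED_TOOLS = ["node", "cargo", "bun", "tmux"]  # assumed binary names
MIN_PYTHON = (3, 11)                               # assumed to be a minimum


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS,
                  which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version: Sequence[int] = sys.version_info,
              minimum: tuple = MIN_PYTHON) -> bool:
    """Return True if the interpreter version meets the minimum."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    absent = missing_tools()
    if absent or not python_ok():
        print("missing:", ", ".join(absent) or "none", "| python ok:", python_ok())
        sys.exit(1)
```

Passing which as a parameter keeps the check testable without touching the real PATH.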

On May 12, 2026, the decision was made to freeze it.

The reasoning: an installer that installs something that doesn't run cleanly is worse than no installer. The installer is the first impression. If gmux-ui launches and the sidebar is empty, that's the experience that gets reported, not "well the install script ran."

Five things must work before the installer ships

1. ./scripts/launch.sh opens the Tauri app cleanly on a fresh shell
2. Status sidebar shows live pane state
3. Spawning an agent via the UI creates a tmux window with opencode
4. Permission approve/reject works against a real session
5. Voice connects and transcribes into the UI ✗ port :8770 fix

Three of five are green. When all five are green, the installer resumes.

The qalcode2 overlap realisation.

A useful clarification that took a while to crystallise: many gmux headline features are already inside qalcode2 itself. AI state detection — qalcode2 generates that. Todo tracking — native to qalcode2. Session management — handled by qalcode2.

gmux doesn't replace any of that. It reads it, and presents it across all simultaneously running qalcode2 instances in one view.

qalcode2 = single-agent executor.
gmux = multi-agent interaction layer.

Every feature in gmux that seems to duplicate qalcode2 is gmux surfacing qalcode2's data at a different scale — across 10 panes instead of 1. The features that are genuinely gmux-only: gesture control, voice routing, phone remote, cross-pane status bar, memory layer, RAM/process visibility. Those don't exist in qalcode2 because qalcode2 is one pane.

What's working now.

The Python terminal stack is fully working and published:

pip install gmux && gmux --status-only

That command gives you live AI state in your tmux status bar — reading qalcode2's API, showing states in colour, todo progress per window, session restore. No camera required.
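As an illustration of the rendering step, here is a sketch that turns per-window state into a tmux status string using tmux's #[fg=...] style markup. The glyph and colour assigned to each state, and the function names, are assumptions; the post does not specify the mapping gmux uses.

```python
from typing import Optional

# Assumed mapping from state to (glyph, tmux colour name).
STYLE = {
    "working": ("◉", "green"),
    "idle": ("●", "blue"),
    "waiting": ("!", "yellow"),
    "error": ("✗", "red"),
}


def format_window(index: int, name: str, state: str,
                  done: Optional[int] = None, total: Optional[int] = None) -> str:
    """Render one window as coloured tmux status text, with optional todo progress."""
    glyph, colour = STYLE.get(state, ("?", "default"))
    label = f"{index}:{glyph} {name}"
    if total:
        label += f" {done}/{total}"
    return f"#[fg={colour}]{label}#[default]"


def format_status(windows: list) -> str:
    """Join all windows into a single status-line string."""
    return " ".join(format_window(**window) for window in windows)


line = format_status([
    {"index": 2, "name": "volkus", "state": "working", "done": 6, "total": 8},
    {"index": 4, "name": "research", "state": "waiting"},
])
```

A string like this can be printed by a script that tmux invokes from its status-right option with the #(command) syntax, which is one common way to feed external data into the status bar.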

Four browser demos are live on gmux.ai/demo/ — multi-agent monitor, agent flowchart, memory panel, phone companion. All on mock data, no backend, no install.

The Tauri app has working PTY, agent sidebar with 14 panes live, and real data flow confirmed. Three of five installer criteria green. The remaining two are achievable in a focused afternoon.

What's next.

  1. Fix voice port — move the Tauri connection from :8765 to :8770.
  2. Apply the qalcode2 push patch — eliminates 2-second polling lag.
  3. Bundle the MediaPipe model — avoid the CDN fetch on first run.
  4. Pass all five installer criteria.
  5. Ship the installer.
  6. Make the GitHub repo public — terminal AI tools are trending hard right now.

The longer item: decide whether the multi-pane orchestration eventually folds into qalcode2 as a "workspace" mode, or stays as a separate product. The gesture/voice/phone-remote angle makes it clearly distinct. That question gets easier to answer once real users are using it.


The code is at gmux.ai. Install today: pip install gmux or paru -S gmux.
