The agent stack is getting boring, finally.
That sounds like an insult. It is not. Boring is what happens after the magic words stop working and the system starts getting names for its parts.
Today I kept seeing the same shape through different projects:
tmux: runtime you can look at
eve: runtime you can resume
lossless-claw: memory you can expand
12-factor-agents: control flow you own
Obscura: browser surface you can embed
None of these is the whole thing. That is the point.
tmux is not the agent runtime, but it explains the runtime
The tmux thing was the smallest version of the idea.
A pane is just a pane until an agent can read it. Then it becomes shared execution context. The human sees the server log. The agent sees the same server log. Nobody has to copy-paste the stack trace into chat like a medieval ritual.
tmux capture-pane -p -t work:server.0
tmux send-keys -t work:server.0 'pytest -x' Enter
That is not product. It is barely interface. But it gives the invariant:
Work should live somewhere both human and agent can inspect.
A terminal is not enough. A chat transcript is not enough. The useful thing is the bridge between them.
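Here is a minimal sketch of that bridge from the agent's side. It only wraps the two tmux commands shown above. The function names are my own, and the pane target `work:server.0` is just the example from this post.

```python
import subprocess
from typing import List


def capture_pane_cmd(target: str) -> List[str]:
    """Build the argv that prints a pane's visible contents to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", target]


def send_keys_cmd(target: str, command: str) -> List[str]:
    """Build the argv that types a command into a pane and presses Enter."""
    return ["tmux", "send-keys", "-t", target, command, "Enter"]


def read_pane(target: str) -> str:
    """Return what the human currently sees in the pane.

    Needs a running tmux server with a matching session, window, and pane.
    """
    result = subprocess.run(
        capture_pane_cmd(target), capture_output=True, text=True, check=True
    )
    return result.stdout


def run_in_pane(target: str, command: str) -> None:
    """Type a command into the shared pane so both sides see it run."""
    subprocess.run(send_keys_cmd(target, command), check=True)
```

With a session named `work` running, `run_in_pane("work:server.0", "pytest -x")` followed by `read_pane("work:server.0")` gives the agent the same output the human is looking at. Building the argv separately from running it keeps the command construction easy to check without tmux installed.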
Raft is the delegation side
Raft is interesting from the other direction.
It is less about “can the agent run code” and more about “where does the agent live as a teammate?” Names, rooms, inboxes, held drafts, search results that are actions instead of dumps.
This is the part most agent products get wrong. They model agents as always-online humans, which turns every workflow into Slack with more anxiety.
An agent does not need to be socially present all the time. It needs a place to put work, a way to ask for attention, and a way to be ignored safely.
The split that keeps coming back:
tmux preserves execution context
Raft preserves delegation context
The interesting product is the glue. “The agent is still doing work over there” needs to become “I know whether I need to intervene, and where.”
Without that glue, tmux is a remote shell and Raft is a pretty inbox.
eve makes agent execution boring
Vercel’s eve is not trying to be a clever coding agent. It is trying to make agent apps deployable.
The useful model is simple:
session -> turn -> step
A session can live for days. A turn is one user-triggered run. A step is a checkpointable model/tool boundary.
That matters because real agent work does not fit inside one clean request-response cycle. It waits for OAuth. It waits for a human. It waits for a child agent. It crashes halfway through a deploy because computers are computers.
The word I liked was parked.
Not failed. Not done. Parked.
running
waiting for approval
waiting for OAuth
waiting for subagent
waiting for human input
resumable later
This is what agent runtimes need more than another prompt trick. The agent should be allowed to stop without losing its mind.
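To make "parked" concrete, here is a rough sketch of session, turn, and step with parking as an explicit state. This is not eve's API. Every class and method name below is an assumption of mine, and a real runtime would persist these records instead of holding them in memory.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    RUNNING = "running"
    PARKED = "parked"  # not failed, not done: waiting on something external
    DONE = "done"
    FAILED = "failed"


class WaitingOn(Enum):
    APPROVAL = "approval"
    OAUTH = "oauth"
    SUBAGENT = "subagent"
    HUMAN_INPUT = "human input"


@dataclass
class Step:
    """One checkpointed model/tool boundary."""
    name: str
    output: str


@dataclass
class Turn:
    """One user-triggered run, made of checkpointed steps."""
    steps: List[Step] = field(default_factory=list)
    status: Status = Status.RUNNING
    waiting_on: Optional[WaitingOn] = None

    def checkpoint(self, name: str, output: str) -> None:
        self.steps.append(Step(name, output))

    def park(self, reason: WaitingOn) -> None:
        if self.status is not Status.RUNNING:
            raise ValueError("only a running turn can be parked")
        self.status = Status.PARKED
        self.waiting_on = reason

    def resume(self) -> None:
        if self.status is not Status.PARKED:
            raise ValueError("only a parked turn can be resumed")
        self.status = Status.RUNNING
        self.waiting_on = None


@dataclass
class Session:
    """A long-lived container for turns; it can outlive any single run."""
    turns: List[Turn] = field(default_factory=list)

    def start_turn(self) -> Turn:
        turn = Turn()
        self.turns.append(turn)
        return turn
```

The part I care about is that parking and resuming never touch `steps`. Whatever was checkpointed before the wait is still there afterwards.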
lossless-claw makes summaries less suspicious
Most compaction feels like burning the book and keeping the back-cover blurb.
lossless-claw does the saner thing: keep the raw transcript, build a summary DAG over it, and let the model expand the summary when it needs detail.
The key idea is not “better summaries.” It is:
A summary is a navigation index, not a replacement for memory.
That changes the whole trust model.
A rolling summary says: trust me, this is what happened.
A lossless summary says: this is roughly what happened, and here is the handle if you need the exact commands, errors, tool calls, or source messages.
That feels closer to how I want agent memory to work:
raw transcript: source of truth
leaf summary: local compression
condensed summary: higher-level map
expand query: drill down when confused
focus brief: temporary task lens
The focus brief idea is especially good. Sometimes the agent does not need more memory. It needs a temporary view for this task, right now, without polluting the canonical memory.
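A toy version of the idea, to show why a summary with handles is different from a summary alone. This is not lossless-claw's data model. The `Message`, `Summary`, `expand`, and `focus_brief` names are hypothetical, and the keyword match stands in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Message:
    """A raw transcript entry: the source of truth, never rewritten."""
    id: int
    text: str


@dataclass
class Summary:
    """A summary that keeps handles to what it summarizes."""
    text: str
    children: List[Union[Message, "Summary"]] = field(default_factory=list)


def expand(node: Union[Message, Summary]) -> List[Message]:
    """Follow the handles under a node back down to raw messages.

    Nodes can share children in a DAG, so each message is returned once.
    """
    seen: Set[int] = set()
    out: List[Message] = []

    def walk(n: Union[Message, Summary]) -> None:
        if isinstance(n, Message):
            if n.id not in seen:
                seen.add(n.id)
                out.append(n)
            return
        for child in n.children:
            walk(child)

    walk(node)
    return out


def focus_brief(node: Union[Message, Summary], keyword: str) -> List[Message]:
    """A temporary task lens: filter raw messages, leave the stored summaries alone."""
    return [m for m in expand(node) if keyword.lower() in m.text.lower()]
```

The condensed summary at the top is what the model reads by default. `expand` is the escape hatch when the map is not enough, and `focus_brief` returns a view without writing anything back.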
12-factor-agents says the quiet part
The most important line from 12-factor-agents is basically this:
Tool calls are structured outputs. You own the control flow.
That sounds obvious until you look at how many agent frameworks hide the loop.
The model should not execute things. The model should propose intent. The runtime decides what that intent means.
fetch_open_issues -> execute and continue
create_issue -> hold draft, ask approval
request_human_input -> pause and notify
repeated_tool_error -> compact error, maybe repair
third_failure -> stop pretending, mark blocked
This is the line between a toy agent and a system you can trust.
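The routing above fits in a few lines once the tool call is treated as data. A minimal sketch, assuming the tool names from this post. The decision strings, the `ToolCall` shape, and the cautious default for unknown tools are my own choices, not anything 12-factor-agents prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict

READ_ONLY = {"fetch_open_issues"}      # safe to run without asking
NEEDS_APPROVAL = {"create_issue"}      # side effects: hold as a draft first
MAX_FAILURES = 3


@dataclass
class ToolCall:
    """What the model emitted: a proposal, not an action."""
    name: str
    args: Dict[str, str] = field(default_factory=dict)


def decide(call: ToolCall) -> str:
    """The runtime, not the model, decides what a proposed intent means."""
    if call.name == "request_human_input":
        return "pause_and_notify"
    if call.name in NEEDS_APPROVAL:
        return "hold_draft"
    if call.name in READ_ONLY:
        return "execute"
    return "hold_draft"  # unknown intents take the cautious path


def on_tool_error(consecutive_failures: int) -> str:
    """Retry with a compacted error a couple of times, then stop pretending."""
    if consecutive_failures >= MAX_FAILURES:
        return "mark_blocked"
    return "compact_error_and_retry"
```

Nothing here calls a model. That is the point: the loop is ordinary code you can read, test, and change.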
The other useful idea is treating the event thread as state. Not a chat log. Not a pile of messages. An event log that can project into different views:
event log
-> prompt context
-> task timeline
-> audit trail
-> notification state
-> memory summary
The LLM sees one projection. The UI sees another. The database keeps the source of truth.
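A small sketch of what "one log, many projections" can look like. The event kinds and function names are invented for illustration, and a real log would live in a database, not a list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One immutable entry in the source-of-truth log."""
    seq: int
    kind: str  # hypothetical kinds: "user_message", "tool_call", "approval_requested", ...
    payload: str


def prompt_context(events: List[Event]) -> str:
    """Projection for the LLM: a compact text rendering of the thread."""
    return "\n".join(f"[{e.kind}] {e.payload}" for e in events)


def audit_trail(events: List[Event]) -> List[Tuple[int, str]]:
    """Projection for review: what happened and in what order."""
    return [(e.seq, e.kind) for e in events]


def needs_attention(events: List[Event]) -> bool:
    """Projection for notifications: is an approval request still unanswered?"""
    pending = False
    for e in events:
        if e.kind == "approval_requested":
            pending = True
        elif e.kind in ("approval_granted", "approval_denied"):
            pending = False
    return pending
```

Each projection is a pure function of the same list, so adding a new view never means migrating the state.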
That is probably the shape.
Obscura is the browser surface getting smaller
Obscura is more tactical, but still fits.
Headless Chrome is huge. For a lot of agent work, you do not need pixels. You need DOM, JS, links, text, cookies, network requests, maybe a CDP endpoint for Playwright.
Obscura says: what if the browser surface for agents was smaller?
obscura fetch https://example.com --dump markdown
obscura serve --port 9222
obscura mcp --stealth
It does not do screenshots. It does not magically bypass Cloudflare. Good. That makes the boundary clearer.
For agent browsing, “can read the page and click the form” is often more important than “is a perfect Chrome clone.” A lightweight DOM/JS/CDP/MCP browser is a useful primitive.
The stack underneath the vibes
The through-line is that agents are becoming less like chatbots and more like operating systems with a weird language model inside.
Not OS in the grandiose sense. OS in the boring sense:
process model
state model
memory model
IO model
permission model
human interrupt model
The model is only one component. Maybe not even the most interesting one.
The stack I see after today:
human-facing workspace: identity, inbox, held drafts, intervention points
control-flow layer: typed intents, policy, approvals, retries, blockers
durable runtime: session, turn, step, pause, resume, stream
execution surface: terminal panes, browser sessions, logs, tests, files
context layer: raw history, summaries, projections, expansion
If any layer pretends to be all the others, the product gets weird.
Chat tries to be runtime, so long jobs disappear.
Runtime tries to be memory, so context becomes whatever fit in the last prompt.
Memory tries to be truth, so summaries start lying with confidence.
UI tries to be collaboration, so agents become noisy fake coworkers.
The fix is not one better agent. It is better boundaries.
What I would steal
From tmux: named, inspectable execution surfaces.
From Raft: agents need delegation context, not fake social presence.
From eve: parked work should be a first-class state.
From lossless-claw: summaries should be expandable indexes.
From 12-factor-agents: own the loop; tool calls are pending intent.
From Obscura: browser automation for agents can be smaller than Chrome.
The funny thing is that none of this feels like AGI discourse. It feels like backend engineering, product states, and a lot of annoying edge cases finally getting names.
Good.
That means we can build with it.