Proposal: ephemeral workspace pool — leased, reaped, recycled
Status: Draft · Area: workspaces, sources, lifecycle · Surfaces: REST + GraphQL (operator) + Angee/agent provisioning
Summary
Add a second, ephemeral tier to the workspace primitive — warm, leased, and auto-reclaimed — alongside today's durable, named workspace. The durable tier (workspace/<name>, pushed as a PR branch) is unchanged. The new tier is for the agent that needs an isolated, ready-to-run environment now, will throw it away, and must not strand it on the host when it dies.
Three coordinated mechanisms, each completing or reusing machinery the operator already has:
- A workspace lease —
{owner, expiry}on a workspace — so a pooled or agent-held workspace has an explicit holder and a deadline. Reuses thePortLease{Owner, CreatedAt}precedent (manifest.go:68). - A TTL reaper — a background sweep that actually destroys expired workspaces. Today the operator computes
state: "expired"(workspaces.go:192) but nothing reaps it; the workspace lingers with its worktree, ports, and inner-stack data. This closes a half-built feature. - A warm pool — keep K pre-provisioned workspaces per template, hand one out on request, and recycle it (reset git + re-seed) instead of rebuilding it. This amortizes the expensive bootstrap (
build → migrate → rebac sync → resources load → schema) across many short-lived tenants.
The operator stays a generic mechanism: it leases, reaps, and recycles workspaces. Why an agent wants one, and who may hold one, stay in Angee (REBAC-gated), exactly as for sources.
Background: the two "treehouse" models
Two unrelated open-source tools named treehouse bracket the design space, and Angee's workspace already sits at one end:
- mark-hingston/treehouse-worktree — a named worktree per agent: durable branch, setup-on-create, merge/complete flow, plus an agent lock (
--agent <id> --expiry, auto-release) and an MCP server so agents self-serve. Angee's durable workspace is already a superset of this (multi-slot sources, port pools, inner stack, GitOps topology, the three update scopes) — except it has no lock/lease and no MCP facade. - kunchenguid/treehouse — a worktree pool: "manage worktrees without managing worktrees." Request a clean tree, get a recycled one (caches preserved), return it to the pool on exit; conservative pruning removes only idle + clean + already-merged trees. Angee has none of this — every workspace is a cold, durable build.
The durable tier is the first model, already shipped and richer. This proposal lifts the ideas of the second (pool, recycle, prune) and the lock of the first — not the tools — onto the existing primitive. We do not adopt their detached-HEAD model or their separate .treehouse/.cursor/worktrees.json metadata stores: the manifest is the single source of truth, and named workspace/<name> branches are load-bearing for the PR flow.
Problem
A managed agent fleet wants disposable environments at interactive speed. Today that is expensive and leaky:
- Cold start. Every
workspaceCreatematerializes worktrees (workspaces.go:944), renders copier, and the caller then runs the full inner-stack bootstrap from scratch — install deps, migrate an empty DB, sync REBAC, load resource data. For a throwaway task this is minutes of setup per use, repaid nothing. - No reclamation.
TTL/TTLExpiresAtexist on the workspace (manifest.go:177) andWorkspaceStatusreportsexpired(workspaces.go:192), but no code destroys an expired workspace. An agent that dies leaves its worktree, its leased ports (releaseWorkspacePorts,workspaces.go:573runs only on explicitWorkspaceDestroy), and its inner-stack data on the host indefinitely. The compose-isolation proposal already documents stale per-workspace containers piling up (compose-project-isolation.md). - No coordination. A workspace has no notion of who holds it. The only guard against two actors driving the same worktree is after-the-fact
branch-mismatch/discrepancydetection (workspaces.go:22). For a shared pool that is not enough — handing the same warm tree to two agents corrupts both.
These are three faces of one gap: the workspace primitive models durable, human-owned branches well and ephemeral, machine-held environments not at all.
Current behavior (the machinery to reuse)
The operator already owns every hard part; the gaps are lifecycle policy, not mechanism.
- Worktree materialization is native:
materializeWorkspaceSource(workspaces.go:1072) callsWorktreeAdd/WorktreeAddBranch(git.go:99) over a shared source cache. A pool member is just a workspace; recycling it isgit reset+ re-checkout against the base ref — git operations the package already performs forWorkspaceSyncBase(workspaces.go:834). - Leases already exist for ports.
PortLease{Port, Owner, CreatedAt}(manifest.go:68) is owned byworkspace/<name>/<pool>strings and reclaimed by prefix on destroy (workspaces.go:573). A workspace lease is the same shape one level up. - TTL is computed, never enforced.
WorkspaceCreateparsesreq.TTLintoTTLExpiresAt(workspaces.go:92) andWorkspaceStatusderivesExpired(workspaces.go:192) — but no sweeper acts on it. The reaper is the missing consumer. - An actor identity already exists.
mintConnectionToken(actor, scope, ttl)(schema.graphql:336,tokens.go:81) issues per-actor scoped JWTs. A leaseownerkeys off the same actor string. - State is observable.
GitOpsTopology(gitops.go:20) and the per-slotGitOpsLink(api/types.go:154) already expose clean/dirty/ahead/behind/diverged — exactly the signals a safe reaper and a safe recycle must check before reclaiming.
So: worktrees, leases, TTL fields, actor identity, and clean/dirty signals all exist. What is missing is the policy that ties them into a pool.
Ownership (the load-bearing decision)
The two tiers are two different facts conflated under one word. Splitting them is the spine of the proposal.
| Fact | Durable tier (today) | Ephemeral tier (new) |
|---|---|---|
| Identity | named workspace/<name>, human-chosen | pool slot, machine-allocated |
| Branch | pushed, becomes a PR | local-only, discarded on recycle |
| Holder | a developer, implicitly | a lease {owner, expiry}, explicit |
| Lifecycle end | explicit workspaceDestroy | reaped on TTL/lease expiry; recycled to pool |
| Provisioning | cold build per workspace | warm, bootstrap amortized |
The operator owns lease, reap, recycle (host-local mechanism — it is the only component that sees the worktree, the ports, and the running inner stack). Angee/Django owns who may take a lease and what the workspace is for (REBAC-gated), and never needs to know whether a given workspace came from the pool.
Proposal
1. Workspace lease ({owner, expiry})
Add an optional Lease to the Workspace manifest entry (manifest.go:172), mirroring PortLease:
type WorkspaceLease struct {
Owner string `yaml:"owner" json:"owner"` // actor string, as minted
CreatedAt time.Time `yaml:"created_at" json:"created_at"`
ExpiresAt time.Time `yaml:"expires_at" json:"expires_at"` // lease deadline; renewable
}- Acquire/renew/release ride the existing admin-bearer gate and reuse the decode/writeJSON/auth conventions:
POST /workspaces/{name}/lease({owner, ttl}),DELETE /workspaces/{name}/lease. Acquiring an already-leased, unexpired workspace held by a different owner is a fail-fast409— this is the coordination guard a shared pool needs. - Surface the lease in
WorkspaceStatusand as a field on the workspace node inGitOpsTopology, so the console andangee ws statusshow who holds what. - The lease is advisory over the daemon's admin caller, like every other operator gate: REBAC in Angee decides who may request a lease before the bridge ever calls. The operator only enforces one live holder at a time and expiry.
Renewal (ttl bump) is how a long-running agent keeps a workspace; a dead agent simply stops renewing and the reaper collects it.
2. TTL + lease reaper
Add one background sweep to the operator (the operator already runs background goroutines for log streaming and shutdown — operator.go:235 is the lifecycle home). On a fixed interval it scans the manifest's workspaces and, for each whose TTLExpiresAt or lease ExpiresAt has passed:
- Safety check (the conservative-prune rule, lifted from treehouse ②): reap only when every source slot is idle and clean — not dirty, and either already pushed or its HEAD is already merged into the base ref. The
GitOpsLinkstate (api/types.go:154) already carriesDirty/Ahead/Pushed; the reaper reads it, never re-derives it. - If safe →
WorkspaceDestroy(workspaces.go:364), which already tears down the worktree and releases ports by owner prefix. - If unsafe (dirty or unpushed work past its deadline) → do not delete. Flip the workspace to a sticky
state: "expired"(already a value) and leave it for a human; emit a log line. Silent deletion of unpushed work is the one thing a reaper must never do.
This is purely the missing consumer of fields that already exist; no new state model. Interval and the dirty-grace policy are operator config.
3. Warm pool (recycle, don't rebuild)
A template may declare itself poolable, and the operator maintains a small warm set:
- Pool config lives where port pools already live —
operator.port_poolhas a siblingoperator.workspace_pool.<template>: { size: K, ttl: <dur> }. The operator keeps K pre-rendered, fully bootstrapped workspaces of that template ready (leased to a reservedpoolowner). - Acquire (
POST /workspaces/pool/{template}/acquire,{owner, ttl}): transfer one ready member's lease to the caller and return itsWorkspaceRefin O(reset) instead of O(build). If the pool is empty, fall back to a coldWorkspaceCreate(correctness over latency) and log the miss — no silent cap (per-service-log-streaming.md sets the precedent for surfacing operator-internal events). - Recycle (on release or reap-when-clean): reset each worktree to the base ref (
git reset --hard+ clean, the inverse ofWorkspaceSyncBase), drop the local throwaway branch, then re-run the idempotent bootstrap to restore seed state. Crucially this keeps the expensive caches the durable build pays for once — installed deps under the worktree and the inner-stack DB volume (apersist:bind subpath,compose-project-isolation.md) — so re-seeding is amigrate/resources loaddelta, not a from-zero build. - Topping up is the reaper's mirror image: after handing a member out, render a replacement toward
size: Kin the background.
A pooled workspace is otherwise an ordinary workspace — same worktree, same GitOps topology, same status surface. "Pooled" is a lease owner plus a recycle policy, not a new object.
Design options
A. Two tiers on one primitive, lease-keyed (recommended)
Keep the durable named workspace exactly as is; add the ephemeral tier as lease + reaper + pool policy over the same Workspace/worktree machinery. One object, one status surface, one set of git mechanics; the only new state is the lease. Matches every existing pattern (PortLease, TTL fields, actor tokens) and adds no parallel system. This is the proposal.
B. Detached-HEAD scratch trees (rejected)
Mirror treehouse ②'s pool literally: pooled members run in detached HEAD with no branch, eliminating branch-name bookkeeping. Rejected — it forks the worktree model (the durable tier needs workspace/<name> for push/PR), and Angee's worktrees already avoid name contention by construction (one branch per workspace). A throwaway tree can simply hold a discarded local branch; no second checkout model is warranted.
C. Separate .pool metadata store (rejected)
Track pool membership/locks in a sidecar file, as both treehouse tools do (.treehouse, .cursor/worktrees.json). Rejected — the manifest is the durable source of truth for workspaces, ports, and leases already; a second store would duplicate it and drift. The lease and pool config belong in the manifest.
D. External agent-runtime orchestrator (out of scope, see below)
Let each agent-runtime service manage its own throwaway clones outside the operator. Rejected as the owner — only the operator sees the host's worktrees, ports, and running stacks, so only it can reap safely and pool centrally. This is the same "single-root operator must construct the invariant locally" argument as compose-project-isolation.md.
Security
- All new endpoints ride the existing
s.auth(...)admin-bearer gate; no new auth surface. Authorization over who may hold a lease or acquire from a pool is enforced in Angee (REBAC) before the bridge calls the daemon — identical to sources (global-source-registry.md). - The lease
owneris the already-minted actor string (tokens.go:81); the operator treats it as an opaque label and enforces only single-holder + expiry. - The reaper's dirty/unpushed guard prevents the one dangerous outcome (destroying unpushed work). Recycling resets only local throwaway branches, never a pushed
workspace/<name>branch — the durable tier is untouched.
Backward compatibility
- Purely additive. A workspace with no
Leaseand no pool config behaves exactly as today;WorkspaceCreate/Destroy/SyncBase/Pushare unchanged. - The reaper only acts on workspaces that already carry a
TTL(opt-in at create time,WorkspaceCreateRequest.TTLapi/types.go:84) or a lease. Existing TTL-less workspaces are never touched — so this is strictly safer than today, where anexpiredTTL is reported and then ignored. - Old daemon + new Angee: pool/lease endpoints 404; Angee degrades to cold
WorkspaceCreatewith no lease (status quo). New daemon + old Angee: no lease requests arrive, reaper only sees TTL workspaces — unaffected.
Out of scope
- An operator MCP facade. treehouse ① ships an MCP server so agents drive worktrees natively; Angee already exposes the full REST+GraphQL surface plus the
angee wsCLI skills (/workspace,/pull,/push). WrappingworkspaceCreate/acquire/lease/syncBase/pushas MCP tools is a thin, valuable follow-up — but it is a transport over this contract, not part of it. Record it; build it after the lease/pool land. - Recycle-vs-rebuild heuristics (when a member is too drifted to reset and should be rebuilt). Start with always-reset; tune later.
- Cross-host / multi-operator pools. One operator owns one root; a fleet-wide pool is an Angee-layer concern above several operators.
- Pre-warming policy beyond a fixed
size: K(demand prediction, burst scaling). Fixed size first.
Acceptance
- A workspace created with a
ttlis destroyed by the reaper once expired and clean+pushed; an expired-but-dirty workspace is kept and markedexpired, with a log line. Survives a daemon restart (state in the manifest). POST /workspaces/{name}/leasegrants a single holder; a second owner gets409until expiry or release; the lease shows inWorkspaceStatusandGitOpsTopology.POST /workspaces/pool/{template}/acquirereturns a readyWorkspaceRefin O(reset) when the pool is warm, transfers the lease to the caller, and triggers a background top-up; an empty pool falls back to a cold create and logs the miss.- A recycled member resets to the base ref with no leftover throwaway branch and re-seeded data, without reinstalling deps or recreating the DB volume from scratch (caches preserved).
- The operator source contains no detached-HEAD checkout path and no sidecar pool/lease store — lease and pool config live in the manifest.
docs/reference/operator-api.mddocuments the lease endpoints, the pool acquire endpoint, and theoperator.workspace_poolconfig.
See also
compose-project-isolation.md— parallel workspaces and the stale-container problem this reaper helps bound.global-source-registry.md— the same "operator is a generic mechanism, Angee owns kind + REBAC" split.per-service-log-streaming.md— surfacing operator-internal events (pool misses, reaps) rather than capping silently.internal/service/workspaces.go—WorkspaceCreate(:27), TTL set (:92),Expiredderivation (:192),WorkspaceDestroy(:364),releaseWorkspacePorts(:573),WorkspaceSyncBase(:834),materializeWorkspaceSource(:1072).internal/manifest/manifest.go—PortLease(:68),Workspace/TTL/TTLExpiresAt(:172),WorkspaceSource(:181).internal/git/git.go—WorktreeAdd(:99),WorktreeAddBranch(:108): the materialize/recycle mechanics.internal/operator/tokens.go—MintConnection(:81): the actor identity a lease owner reuses.- Inspiration (ideas, not dependencies): mark-hingston/treehouse-worktree (lock + MCP), kunchenguid/treehouse (pool + recycle + conservative prune).