Global concurrency governor — design
Status: implemented (phases 1–3 landed) · Bead: koryph-1xk · Owner: orchestrator (refactor-core)
Problem
koryph run is one process per project. Each caps only its own wave
width (--max / max_concurrent_slots, default 3), optionally scaled down by
the per-account cost governor (internal/quota, which gates by dollars).
Nothing bounds the sum of concurrently-running agents across projects and
processes: running loops on K projects launches up to 3K headless claude
sessions, which breaches the Claude API concurrency / rate limits (429s,
throttling). The cost governor never bounds concurrency or request rate — a
burst of cheap agents still exceeds the limit.
Goals
- A machine-global cap on concurrently-running agents that every
koryph runprocess respects before launching an agent. - Fair-share allocation across active projects (round-robin) so no project starves. Agents are never preempted, so idle capacity is reclaimed not by lending but when a project drains its frontier and drops its demand — that shrinks the denominator and raises every remaining project's share.
- A per-project override that is allowed but warned — a project may request more width, the global cap still binds, and the override is logged as a fairness risk.
- No daemon: coordination uses shared state under
~/.koryph, consistent with the existing registry and ledger file-lock patterns. - Orthogonal to and composable with the cost governor: a dispatch proceeds only when both governors allow it (concurrency = rate-limit safety; cost = spend safety).
Non-goals (v1)
- Token- or request-per-minute rate windows. v1 governs concurrency, the first-order cause of 429s. TPM/RPM bursts are a documented follow-up; the cost governor already samples usage and can grow a rate window later.
- A central scheduler/daemon or cross-machine coordination. Scope is one host's
~/.koryph.
Design
Shared state (under ~/.koryph)
| Path | Contents |
|---|---|
~/.koryph/governor.json |
{ "pools": { "<provider>": { "max_global_agents": N, ... } } } — one independent cap/AIMD-overlay per provider pool (koryph-v8u.11, see below). Absent/missing pool ⇒ default 8. Edited only by the machine owner (never per-run), so no single project can lift the ceiling. |
~/.koryph/slots/<project>-<bead>-<pid>.json |
One lease per running agent: {project, bead, pid, engine_pid, model, acquired_at, provider}. Keyed to the agent PID (detached), so a lease survives an engine restart/resume and frees only when the real agent dies. provider selects which pool the lease counts against. |
~/.koryph/slots/demand/<project>.json |
One demand heartbeat per active engine with ready work, per pool: {project, engine_pid, updated_at, provider}. Refreshed each wave; pruned when stale (TTL) or engine_pid dead. |
All mutations happen under a short-lived flock on the slots directory (the
internal/ledger.Lock pattern), held only for the prune + count + write
(milliseconds).
Fair-share allocation (cross-process, no daemon)
Let cap = max_global_agents and demanders = sort(active demand heartbeats)
with n = len(demanders).
fairShare(p):
base = cap / n # integer division
rem = cap % n
# the first `rem` demanders (in ROTATING order) get one extra slot
order = (indexOf(p) + epochBucket) mod n # epochBucket = unixMinutes / window
return base + (1 if order < rem else 0)
Rotation of the remainder (and therefore of any zero-share turns when n > cap)
is what prevents permanent starvation: every project periodically rises into a
base+1 (or, when n > cap, a 1) slot.
Acquire (under the flock), before a dispatch:
- Prune dead-PID leases and stale/dead demand heartbeats.
- Compute
globalActive(all leases),myActive(self leases),fairShare(self). - Grant iff
globalActive < capandmyActive < fairShare(self)(strict fair share). Because the per-project shares always sum to exactlycap, a project never takes a slot reserved for another that still demands; a slow project's reserved slots wait for it rather than being lent out (no preemption). Idle capacity returns when a project drops its demand. - On grant, write the lease file; on deny, the caller defers the item (it stays
in
bd readyand is retried next wave).
Release: remove the lease when the slot reaches a terminal state. Crashed agents are reclaimed by prune (PID liveness + a TTL backstop). A project's demand heartbeat is dropped when its frontier drains or the run ends.
Engine integration
- In the wave loop (
internal/engine/wave.go), refresh this project's demand heartbeat, then dispatch is bounded bymin(perProjectWidth, cap − globalActive). Acquire a lease per item as it dispatches; a lost race → defer that item. - Release in
completeSlot/blockSlot/mergeSlot(internal/engine/poll.go). - The cost governor still runs first; the concurrency governor is an additional gate. Dispatch needs both.
Per-project override + warning
--max N/max_concurrent_slotsstill sets the per-project width, but the global cap always binds — a project can never exceedcap.- When a project's width exceeds
fairShare(self)at wave start, the engine logs:project X width N exceeds its fair share M (cap C across D projects); extra slots are used only when others are idle.
AIMD overlay: adaptive concurrency (koryph-2im.4)
governor.json's cap is static by default — the operator sets it once and it
never moves. --adaptive turns it into a congestion controller (classic AIMD:
additive increase, multiplicative decrease) so the cap floats between a
floor of 1 and an operator-set ceiling instead:
effectiveCap = adaptive ? clamp(dynamicCap, 1, hardMax) : maxGlobalAgents
koryph governor set --max-global N [--adaptive] [--hard-max M]
--max-global NseedsdynamicCap(and, with--adaptiveoff, is the fixed cap — exactly the pre-koryph-2im.4 behavior).--adaptiveenables the overlay.--hard-max Mbounds upward probing (default2×max-global) — the ceiling probing is allowed to discover, never an unbounded runaway.
Additive fields on Config (internal/govern/types.go): Adaptive,
HardMax, DynamicCap, LastDecreaseAt, LastRateLimitAt, LastProbeAt,
RateLimitEvents, plus koryph-2im.11's settle/breaker/smoothing fields (next
section). A governor.json written before these existed unmarshals them all
to zero, so Adaptive=false reproduces the old static-cap behavior
byte-for-byte (Config.EffectiveCap, internal/govern/aimd.go).
Signal. A dead agent's stream is scanned for an API rate-limit/overload
marker (429, rate_limit_error, overloaded_error) in an error-flagged
event — dispatch.ParseRateLimited (internal/dispatch/cli.go), deliberately
liberal about the exact event shape. internal/engine/poll.go's
completeSlot checks this before the commits/finishCandidate check (a
rate-limited death is never a completed candidate) and, when it fires, calls
requeueRateLimited instead of the normal requeue path.
Multiplicative decrease (Store.ReportRateLimit): dynamicCap =
max(1, effectiveCap/factor), at most once per settle window (see below) —
events inside settle still increment RateLimitEvents (observability) but do
not re-apply, so a burst of near-simultaneous rate-limited deaths across
engines on the host halves the shared cap once, not once each. factor is 2
normally, 4 on a detected burst (koryph-2im.11, next section).
Additive increase / probe (Store.EffectiveCap, evaluated lazily on every
Acquire): dynamicCap += 1 per full 5 minutes elapsed since the settle
window last expired, up to hardMax. This is what lets the cap climb past
the operator's starting width to find the real sustainable concurrency —
applyProbe anchors on max(SettleUntil, LastProbeAt) (koryph-2im.11
replaced the old LastDecreaseAt anchor with the settle deadline; see below),
so steady state is a classic AIMD sawtooth just under the true API ceiling. No
daemon: the probe advances (and persists) inside whichever engine happens to
call Acquire next.
Slot handling (internal/ledger.Slot.RateLimitRequeues, additive field):
a rate-limited death requeues WITHOUT incrementing Attempts — the failure is
environmental, not the bead's — bounded instead by its own budget (5,
internal/engine/poll.go's rateLimitedRequeueBudget), using the existing
linear backoff. Exhausting it blocks with a rate-limited requeues exhausted
note. I5 (never interrupt a running agent) holds unconditionally: the cap only
gates the next Acquire, never a live process.
Settle windows, circuit breaker, dispatch smoothing (koryph-2im.11)
AIMD alone can thrash: agents dispatched under the old cap keep triggering rate-limit events for minutes after a decrease, an instantaneous burst can need more than one halving, and several projects' refill loops reacting to the same cap raise can dispatch simultaneously (thundering herd). Three mechanisms, all Adaptive-gated (zero effect when the overlay is off) and all coordinated through the same flocked store:
- Settle window (
Config.SettleSeconds/SettleUntil, default 120s, CLI--settle-sec). After ANYDynamicCapchange — a decrease or an additive-increase step — further changes in either direction are frozen untilSettleUntil. Events arriving during settle are still counted (RateLimitEvents, and fed into the burst history below) but apply no further change: decisions are only made against a population that reflects the current cap. This subsumes and replaces the pre-L5b 60s decrease cooldown. The additive probe's quiet-clock anchors onSettleUntil, not the change's own timestamp —applyProbeininternal/govern/aimd.go. - Burst-scaled decrease (
Config.RecentRateLimitEvents, a small window-pruned history keyed by project+bead). A decrease that fires with >=3 distinct (project,bead) slots reported within the last 30s applies factor 4 instead of 2 (floored at 1) — one settle-window's worth of extra lowering up front instead of two full halve-and-wait cycles. The same slot reporting repeatedly is NOT a burst (distinct-identity count, not raw event count). - Circuit breaker (
Config.BreakerState:""closed /"open"/"half-open", plusBreakerOpenAt,BreakerBreakSeconds,BreakerReopenCount,ProbeProject/ProbeBead/ProbeAdmittedAt). Opens when a rate-limit event arrives withDynamicCapalready at the floor, or on 3 decreases within 10 minutes (Config.RecentDecreases). While open,Acquiredenies EVERY lease machine-wide (I5 holds — running agents are never touched, only new admission is refused) forConfig.BreakSeconds(default 300s, CLI--break-sec), doubling per consecutive re-open up to a 3600s ceiling. Once elapsed the breaker goes half-open: the nextAcquire(any project) is admitted as the sole probe (the flock makes this race-free — exactly one caller wins); every otherAcquireis denied until it resolves. A cleanReleaseof the probe's own lease (no rate-limit ever reported for it) closes the breaker and resumes AIMD fromDynamicCap=1, resettingBreakerReopenCount. AReportRateLimitfor the probe (or, when the reporting call site cannot supply bead identity, any report from the probe's own project — see the KNOWN GAP note ininternal/engine/govern.go'sreportRateLimit; treating an ambiguous report as a probe failure is the conservative, safe direction) re-opens with a doubled break. A probe whose lease disappears without either signal ever arriving (a crashed agent, or its owning engine dying) is resolved bypruneCrashedProbe: afterStore.ProbeTimeout(default 30 minutes) with no live lease and no resolution, it conservatively re-opens — a crashed probe cannot wedge the breaker half-open forever. - Dispatch smoothing (
Config.MinDispatchIntervalSeconds, default 3s, CLI--min-dispatch-interval). A machine-wide minimum inter-dispatch spacing, jittered ±50% per admission (Store.Jitter, deterministic- seedable for tests), enforced atAcquirein the closed state only (the breaker's own half-open probe is a single deliberate dispatch, exempt). A denial for spacing behaves exactly like a cap denial — the engine's existing refill-batch deferral handles it with no engine-loop change — and, critically, does not itself advance the spacing clock.
koryph governor show prints the operator cap, adaptive on/off, dynamic cap,
hard max, last decrease timestamp, rate-limit event count
(Store.AIMDStatus), whether settle is currently active (and its deadline),
the breaker's state (closed / open-until / half-open+probe identity), and the
smoothing interval. koryph doctor warns when the dynamic cap has sat pinned
at its floor for a long time after a real decrease, and separately when the
circuit breaker is open/half-open (calling out "flapping" once
BreakerReopenCount reaches 2 without an intervening clean close) — either
is a signal the account is being persistently rate-limited, or --hard-max/
--break-sec need attention.
Per-provider governor pools (koryph-v8u.11, L5c)
The service providers behind agent runtimes (Anthropic/claude, OpenAI/codex, Google/gemini, xAI/grok build, …) enforce independent rate limits. Every mechanism above — operator cap, leases, demand/fair-share, the AIMD overlay, settle window, circuit breaker, dispatch-smoothing clock — is per-pool, keyed by an opaque provider string:
governor.json:
{
"pools": {
"anthropic": { "max_global_agents": 8, "adaptive": true, ... },
"openai": { "max_global_agents": 4 }
}
}
- Key semantics.
DefaultPool("anthropic") is used whenever a lease, demand heartbeat, or store entry point carries no explicit provider — i.e. every behavior that existed before koryph-v8u.11. The key is deliberately opaque (not an enum) so it can later refine to"provider:account"(rate limits are really per-account within a provider) without another schema change. Every store entry point normalizes""toDefaultPoolat its boundary (govern.NormalizeProvider), so on-disk state never carries an empty pool key. - A
govern.LeasecarriesProvider, resolved today from a single hardcoded constant at the one placeinternal/engineconstructs leases (internal/engine/govern.go'sproviderAnthropic) — every agent this engine dispatches IS a claude/Anthropic session until the koryph-v8u.2 runtime adapters land and can supply the real provider. Behavior is therefore identical to pre-koryph-v8u.11 until those adapters exist. - No cross-provider shared cap, by design. Total machine concurrency is
the sum of every pool's cap — each provider's API is an independently rate
limited resource, so an Anthropic 429 must never throttle a codex agent's
admission and vice versa. (Local CPU/RAM pressure is a separate concern:
the operator's
--max-globalchoice, made per pool, is what bounds it.) - Migration. A
governor.jsonwritten before koryph-v8u.11 (any shape from koryph-1xk through koryph-2im.11 — a flat document with no top-level"pools"key) loads transparently as theanthropicpool, preserving every field (govern.File.UnmarshalJSON); it round-trips thereafter in the new{"pools": {...}}shape.internal/doctorreadsgovernor.jsondirectly (not throughStore, since it honorsOptions.Homerather thanKORYPH_HOME) and shares the same migration-aware type, so its checks see identical pool state. koryph governor set --provider P ...configures one pool (--provideromitted ⇒anthropic, full back-compat);koryph governor showandkoryph doctoriterate every pool with any live state (an explicitgovernor.jsonentry, a lease, or a demand heartbeat) — see CLI below.
CLI
koryph governor— show EVERY provider pool's cap, active leases (project/bead/pid/age), demand heartbeats, free slots, and (when configured) the AIMD overlay state including settle/breaker/smoothing, so operators can see contention per pool. A machine using only the defaultanthropicpool prints exactly one pool block — a strict superset of the pre-koryph-v8u.11 single-pool output.koryph governor set --max-global N [--provider P] [--adaptive] [--hard-max M] [--settle-sec S] [--break-sec B] [--min-dispatch-interval I]— write ONE pool's entry ingovernor.json(--provideromitted ⇒anthropic); without--adaptivethis clears/disables any previously enabled overlay ON THAT POOL ONLY (it overwrites that pool's entry wholesale — every other pool is untouched). The three L5b flags are meaningful only under--adaptiveand default (this package's documented constants) when omitted or non-positive.board/statusprune stale leases and demand (across all pools) as a hygiene side effect.
Failure & edge handling
- Stale leases: reclaimed by PID liveness (
dispatch.Alive) plus a TTL backstop, so a crash never permanently consumes a slot. n > cap: somefairSharevalues are 0 for a round; rotation guarantees each project periodically gets a turn.- Lock contention: the flock is held for microseconds; acquisition races resolve on the next re-check.
- Absent
governor.json: default cap 8 — raised to let a single self-hosting project run a wider wave; under watch for Claude API rate limiting (drop to 6 if beads get throttled). The cap only bites when total demand across projects exceeds it.
Testing
- N-goroutine
Acquire/Releaseagainst a tempKORYPH_HOME: - the cap is never exceeded under contention;
- fair share is respected across demanders;
- dead-PID leases and stale demand are reclaimed;
- rotation prevents starvation when
n > cap; - work-conserving top-up uses idle capacity when others are satisfied.
- AIMD overlay (
internal/govern/aimd_test.go): halve from various caps, floor at 1, settle suppresses a double halve, additive probe climbs past the starting cap up tohardMax, adaptive-off is byte-for-byte compatible with a pre-koryph-2im.4 store. - Settle/breaker/smoothing (
internal/govern/settle_breaker_test.go, koryph-2im.11): settle freezes both directions and the probe clock anchors on settle expiry; burst-scaled decrease (distinct-slot /4 vs same-slot /2, window pruning and bounded history); breaker opens on an at-floor event and on 3 decreases in 10 minutes, re-open duration doubles and caps at 3600s, half-open admits exactly one lease under concurrent-goroutine contention, closes on a clean probe release, re-opens on a probe rate-limit report, a crashed probe resolves via theProbeTimeoutfallback; smoothing denies a second acquire inside the jittered interval without advancing the spacing clock on denial; two independent*Storehandles over oneKORYPH_HOME(simulating two engines) keep the shared dynamic cap consistent under concurrentAcquire; adaptive-off ignores every L5b field even when hand-set to values that would otherwise gate admission. - Rate-limit stream classification fixtures
(
internal/dispatch/cli_test.go): positive (429,rate_limit_error,overloaded_errorshapes) and negative (ordinary errors, clean results, the marker appearing only in non-error-flagged text). - Engine (
internal/engine/ratelimit_test.go): a rate-limited death requeues without incrementingAttempts, capping atRateLimitRequeuesand blocking once exhausted; an ordinary death is unaffected and still burnsledger.MaxAttempts. - Doctor (
internal/doctor/doctor_test.go): the circuit-breaker check is OK when unconfigured/adaptive-off/closed, warns when open or half-open, and calls out flapping onceBreakerReopenCountreaches 2. - Per-provider pools (
internal/govern/pools_test.go, koryph-v8u.11): a rate-limit event (halve/settle/breaker trip) in one pool leaves another pool's admission, dynamic cap, and event count completely untouched (the core assertion); operator cap and lease/Release accounting are scoped per pool; a legacy pre-poolsgovernor.jsonmigrates into theanthropicpool with every AIMD field preserved and round-trips in the new shape; fair share is computed within a pool only (a project's demand in one pool never affects another project's share in a different pool);""normalizes toanthropicat every entry point (Acquire/HoldviaLease.Provider, and every other method's explicitproviderargument); concurrentAcquire/Releaseacross two pools never breaches either pool's cap (go test -race). CLI (cmd/koryph/governor_test.go):--providerset/show round-trips independently for multiple pools, and a plain (--provider-omitted)set/showstays fully back-compat with the single-pool CLI. Doctor (internal/doctor/doctor_test.go): the governor, adaptive-cap, and circuit-breaker checks each emit one Finding per pool, and zombie-lease/stale-demand Findings name the pool they belong to.
Implementation phases (each gated green)
internal/govern— lease + demand store, flock,paths.SlotsDir(),Acquire/Release/Prune/FairShare,governor.jsonload/save. Full unit tests.- Engine integration — demand heartbeat + acquire/release in the wave and poll paths; the fair-share width warning.
- CLI —
koryph governorshow/set; prune onboard/status. - AIMD overlay (koryph-2im.4) — adaptive cap, rate-limit stream classification, attempt-free requeue budget; see the dedicated section above.
- Settle windows, circuit breaker, dispatch smoothing (koryph-2im.11) —
hardens the AIMD overlay against thrashing and thundering-herd restarts;
see the dedicated section above. Known gap:
internal/engine/govern.go'sreportRateLimitcannot yet thread the dying slot's bead id through frominternal/engine/poll.go(out of bounds for that change — see the KNOWN GAP comment there), so burst-distinct-slot counting is project-level rather than per-bead until a one-line follow-up lands. - Per-provider governor pools (koryph-v8u.11, L5c) — every piece of
governor state (cap, leases, demand, AIMD overlay, settle/breaker/
smoothing) becomes per-pool, keyed by an opaque provider string; a
pre-koryph-v8u.11
governor.jsonmigrates transparently into theanthropicpool;koryph governor set --provider P/showandkoryph doctorbecome pool-aware; see the dedicated section above.internal/engine's lease construction hardcodesproviderAnthropicuntil the koryph-v8u.2 runtime adapters can supply the real provider.