Phase 10 Multi-Worker Foundation Release Notes
Date: 2026-05-09
Branch: `phase-10-multi-worker-foundation`
Summary
Phase 10 starts the enterprise and multi-worker production track. This slice adds the durable worker registry foundation StacyVM needs before scheduler placement, worker ownership, leases, and remote worker RPC can be made production-grade.

This is not a full distributed runtime yet. It is the first production-aligned control-plane layer for observing workers, recording heartbeats, and exposing that state through APIs, diagnostics, and metrics.

What Changed
Worker Registry Storage
- Added a SQLite migration for the `workers` table.
- Added durable worker fields for ID, hostname, status, providers, capabilities, capacity, heartbeat timestamp, and lifecycle timestamps.
- Added store methods for saving, fetching, listing, and deleting worker records.
Local Worker Registration
- The API server now registers the current process as the `local` worker at startup.
- The API server now refreshes the `local` worker heartbeat periodically while running.
- Server shutdown stops the heartbeat loop cleanly.
- The local record includes configured providers, single-node capabilities, and manager capacity limits.
- Single-node deployments now appear in the same worker registry surface that future multi-worker deployments will use.
Worker API
- Added read-only worker discovery:
  - `GET /api/v1/workers`
  - `GET /api/v1/workers/{workerID}`
- Added admin-only worker mutations:
  - `POST /api/v1/admin/workers/{workerID}/heartbeat`
  - `DELETE /api/v1/admin/workers/{workerID}`
- Worker responses include a computed `stale` flag when the last heartbeat is older than the freshness window.
Sandbox Worker Ownership
- Added persisted `worker_id` ownership to sandbox records.
- New and adopted local sandboxes are stamped with the active worker ID.
- Scheduler status now reports the current worker ID.
- Sandbox API responses now include `worker_id` when ownership is known.
Worker-Aware Scheduler Placement
- Spawn admission now evaluates worker placement using worker status, heartbeat freshness, provider support, declared capacity, and active sandbox counts.
- Scheduler status now reports the selected worker and number of eligible workers.
- Local execution remains honest: if the scheduler would place work on a remote worker, admission reports `remote_worker_rpc_unavailable` until the worker RPC slice lands.
- Stale local worker records are no longer special-cased; real server runs keep the local worker fresh through the heartbeat loop.
Distributed Lease Foundation
- Added durable lease records for resource ownership fencing.
- Added store APIs to acquire, renew, release, get, and list leases.
- Lease acquisition is holder-aware and expiry-aware: a competing worker cannot acquire an unexpired lease held by another worker.
- Lease renewals require the current holder and an unexpired lease.
- Diagnostics and Prometheus now expose lease totals so operators can inspect active and expired lease state.
Lease Enforcement
- Local spawns now acquire a sandbox lease before persisting the sandbox record.
- Runtime adoption during reconciliation now acquires a sandbox lease before adopting unknown provider runtimes.
- Pool VM and pooled logical sandbox creation now acquire leases.
- Destroy now acquires or renews the local worker lease before mutating provider/runtime/store state.
- Successful destroy releases the sandbox lease.
- Wrong-holder lease tests now prevent local destroy from mutating a sandbox owned by another worker.
Worker RPC Contract And Auth Model
- Added `internal/workerproto` with transport-neutral worker request and response envelopes.
- Defined contract methods for heartbeat, spawn, destroy, status, lease renewal, and shutdown.
- Mutating worker assignments require a lease token in the message contract.
- Added transport-neutral worker auth claims and initial scopes.
- Documented the worker trust boundary, suggested headers, lease fencing rules, and Postgres cluster-store guarantees in `docs/worker-rpc-contract.md`.
- Remote worker execution remains gated until a network transport enforces this contract.
Diagnostics And Metrics
- Diagnostics now include worker totals, online count, stale count, unhealthy count, and worker items.
- Diagnostics now include lease totals, active count, expired count, and active leases by holder.
- Diagnostics sandbox summaries now include `by_worker` counts.
- Prometheus output now includes:
  - `stacyvm_workers_total{status="total"}`
  - `stacyvm_workers_total{status="online"}`
  - `stacyvm_workers_total{status="stale"}`
  - `stacyvm_workers_total{status="unhealthy"}`
  - `stacyvm_leases_total{status="active"}`
  - `stacyvm_sandboxes_by_worker_total{worker="local"}`
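
For readers unfamiliar with the Prometheus text format, the series above are emitted as one `name{label="value"} number` line each. The sketch below renders such lines from a map of counts. The `renderSeries` helper and the sample values are made up for illustration, and it assumes simple label values that need no special escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderSeries writes one sample line per label value, sorted for stable output.
// The helper name is hypothetical; it is not part of StacyVM.
func renderSeries(name, label string, values map[string]int) string {
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{%s=%q} %d\n", name, label, k, values[k])
	}
	return b.String()
}

func main() {
	// Sample counts are invented for the example.
	fmt.Print(renderSeries("stacyvm_workers_total", "status", map[string]int{
		"total": 1, "online": 1, "stale": 0, "unhealthy": 0,
	}))
	fmt.Print(renderSeries("stacyvm_sandboxes_by_worker_total", "worker", map[string]int{"local": 3}))
}
```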
Documentation
- Updated the changelog with Phase 10 changes.
- Updated the API reference with worker endpoints and metrics.
- Updated the README endpoint table with worker discovery.
- Updated the production readiness checklist with Phase 10 acceptance criteria.
Code Areas
- `internal/store`: worker model, lease model, migrations, SQLite CRUD, and sandbox `worker_id` persistence.
- `internal/workerproto`: worker RPC contract and auth claim types.
- `internal/api/routes`: worker routes, diagnostics worker summary, and Prometheus worker metrics.
- `internal/api/server.go`: local worker startup registration, heartbeat refresh loop, and route mounting.
- `docs`: API, README, changelog, production readiness, and release notes.
Verification
- `go test ./internal/store ./internal/api/routes ./internal/api`
- `scripts/check-swagger.sh`
- `go test ./...`
- `git diff --check`

