Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.stacyide.xyz/llms.txt

Use this file to discover all available pages before exploring further.

Production Readiness Checklist

This checklist tracks the Phase 7 release-candidate hardening work needed before StacyVM is marketed as production-ready.

Readiness Levels

LevelTarget userCurrent gate
Internal stagingStacyOS team and trusted operatorsstacyvm doctor, CI, mock deployment smoke, documented rollback
Single-node productionTechnical self-hostersDocker/gVisor or Firecracker conformance, hardened auth, backup/restore drill
Public self-serveUsers without handholdingSigned releases, upgrade tests, support bundle, clear failure remediation
Enterprise/multi-workerInfrastructure teamsPostgres, workers, durable scheduler, leases, OIDC/RBAC

Phase 7 Acceptance Criteria

  • stacyvm doctor reports actionable local and production diagnostics.
  • Docker command execution has explicit shell and argv semantics. Done in Phase 7 slice 2.
  • File APIs have path traversal tests across manager scoping and provider boundaries. Done in Phase 7 final cleanup.
  • Sensitive operations are covered by persisted operation audit records. Done in Phase 7 final cleanup.
  • Runtime certification scripts exist for Docker, gVisor, Kata, Firecracker, and PRoot host checks. Done in Phase 7 final cleanup.
  • Threat model is documented for runtime, API, admin, live-preview, pool, and registry surfaces.
  • Release notes describe verified CI and known platform caveats.

Phase 8 Acceptance Criteria

  • SQLite backup and restore are available through the CLI with integrity checks and restore safety copies. Done in Phase 8 slice 1.
  • Production config linting is available through stacyvm config lint --production and can run against explicit config files without requiring Docker/KVM host access. Done in Phase 8 slice 2.
  • Upgrade rehearsal checks document backup, config lint, service restart, readiness validation, and rollback. Done in Phase 8 slice 3.
  • Support bundle export exists and redacts secrets before sharing with maintainers. Done in Phase 8 slice 3.

Phase 9 Acceptance Criteria

  • Release binaries and checksums are signed through the GitHub Actions release workflow. Done in Phase 9 slice 1.
  • Published container image digests are signed through the GitHub Actions release workflow. Done in Phase 9 slice 1.
  • A public verification script exists for release signatures and checksums. Done in Phase 9 slice 1.
  • Installer supports Sigstore verification and a fail-closed mode. Done in Phase 9 slice 1.
  • Upgrade and config migration tests run in CI. Done in Phase 9 slice 2.
  • Public docs expose known limitations and exact remediation paths. Done in Phase 9 slice 2.
  • Public release sanity builds and checksum verification run in CI. Done in Phase 9 final polish.
  • SDK parity smoke tests run in CI without requiring a live runtime. Done in Phase 9 final polish.
  • GitHub issue templates request support bundle, config lint, upgrade rehearsal, runtime certification, and release verification evidence. Done in Phase 9 final polish.

Phase 10 Acceptance Criteria

  • Worker registration and heartbeat records are stored durably. Done in Phase 10 slice 1.
  • Single-node servers self-register as the local worker with provider and capacity metadata. Done in Phase 10 slice 1.
  • Single-node servers refresh the local worker heartbeat while running. Done in Phase 10 heartbeat slice.
  • Read-only worker discovery is available through the normal API. Done in Phase 10 slice 1.
  • Worker heartbeat and deletion are protected by the admin namespace. Done in Phase 10 slice 1.
  • Diagnostics and Prometheus expose worker registry state. Done in Phase 10 slice 1.
  • Sandbox records persist their owning worker ID and diagnostics expose sandbox counts by worker. Done in Phase 10 slice 2.
  • Scheduler placement policy is worker-aware. Remote spawn, status, destroy, live exec streaming, files, logs, preview metadata, and conservative drain/offline ownership policy are available for workers that advertise rpc_url.
  • Sandbox ownership is tied to worker IDs. Remote spawn/status/destroy ownership is enforced through worker RPC and persisted runtime IDs.
  • Distributed leases prevent duplicate worker ownership. Remote spawn, renew, and destroy now carry lease tokens; persistence now has SQLite and Postgres store paths with Postgres lease race coverage.
  • Remote worker authentication and RPC contract are implemented for heartbeat, lease renewal, spawn, status, destroy, exec, files, logs, preview metadata, and drain/offline ownership reconciliation. Shared worker tokens remain available for staging, and per-worker token mapping now supports individually rotatable worker credentials.

Current Release-Candidate Gates

GateStatusNotes
Full Go test suitePassingCI runs make test.
Web buildPassingCI runs npm run build.
SDK checksPassingTypeScript builds, Python imports, and mock-based SDK parity smoke tests run in CI.
Deployment smokePassingMock-provider smoke is in CI. Docker live host certification remains external.
Cluster conformancePartialAlways-on CI covers SQLite store contract, live Postgres store contract, Postgres migration rehearsal, Postgres lease concurrency, per-worker and signed worker identity, worker identity certification reporting, production cluster config lint, and Postgres-backed remote worker smoke. See docs/cluster-conformance.md.
Runtime conformancePartialHarness and host certification script exist; Firecracker/PRoot remain platform-gated.
Security postureStrongAdmin governance, operation audit, path traversal checks, explicit exec modes, OIDC/JWT RS256+ES256 auth with RBAC, real SHA256 hash in RS256 verification, admin routes protected in OIDC-only mode, tenant/project model, per-tenant audit, policy enforcement on spawn, policy controls for providers/images/networks, and hardened centralized worker token issuer are implemented.
Release automationPassingRelease workflow signs binaries, checksums, and GHCR image digests; public verifier and installer verification exist.
Worker registryNear-completeDurable worker registration, heartbeat, diagnostics, metrics, placement, ownership, leases, per-worker token auth, signed worker identity, centralized token issuance, worker RPC routing, and worker RPC mTLS wiring exist. Remaining: target-network mTLS smoke with deployment-issued certificates.
Enterprise/OIDCPassingOIDC/JWT RS256 verification, RBAC roles (viewer/operator/admin/tenant_admin), OIDC group→role mapping, tenant model, per-tenant audit, and policy enforcement are implemented.
Public API exposurePassingCORS origins are configurable through server.cors_allowed_origins; production config lint fails wildcard or empty CORS before public exposure.

Required Before Single-Node Production

  • Production config uses distinct API and admin keys.
  • server.cors_allowed_origins contains only exact trusted https:// origins for public browser clients.
  • auth.admin_fallback_enabled is false.
  • auth.admin_audit_retention is set to a production window.
  • Docker provider runs with explicit runtime, network mode, dropped caps, pid limit, memory, CPU, and seccomp settings.
  • Firecracker hosts pass Linux/KVM conformance before being marked production.
  • Backup and restore are tested against the SQLite database.
  • stacyvm config lint --production passes with the same config and environment variables the service will use.
  • stacyvm upgrade rehearse passes before binary/image replacement.
  • Operators can generate stacyvm support bundle output without exposing API keys or provider secrets.
  • Runtime certification artifacts are generated on the actual host with scripts/certify-runtime.sh <runtime> --format markdown --output <runtime>-certification.md.
  • Operators run stacyvm doctor --production before go-live.

Required Before Public Self-Serve

  • Release artifacts are signed and checksummed.
  • Upgrade and config migration tests run in CI.
  • stacyvm doctor includes remediation links for every failure.
  • Support bundle export exists and redacts secrets.
  • Threat model is reviewed for each release candidate.
  • Known limitations are visible in README, docs, and release notes.
  • Public support expectations are documented in public-support-matrix.
  • Bug and production support issue templates ask for the same evidence required by the public support matrix.
  • Public release sanity CI builds release binaries and validates checksums; real GitHub release asset verification must be repeated after each version tag is published.
  • Public browser clients use explicit CORS origins; wildcard CORS must fail stacyvm config lint --production.
  • Final public evidence is generated with scripts/public-readiness-evidence.sh; announcement requires a PUBLIC SELF-SERVE READY verdict for the release tag, target host runtime, and deployment network.

Required Before Enterprise/Multi-Worker

  • Postgres store implementation. Driver, migrations, contract path, migration rehearsal, lease race coverage, and mock-provider remote worker smoke exist. Backup rehearsal is available via stacyvm db pg-rehearse and stacyvm db pg-backup. Done in Phase 14.
  • Worker registration and heartbeat model. Durable registry, per-worker token auth, signed worker tokens, issuer/rotation workflow, centralized token issuance via /api/v1/admin/worker-tokens, worker identity certification reporting, and worker RPC mTLS wiring exist. Target-network mTLS smoke with deployment-issued certificates remains pending for specific enterprise networks.
  • Scheduler abstraction with placement policy. Done in Phase 10.
  • Durable queue/pub-sub for lifecycle events. Covered by EventBus and persisted lease model.
  • Distributed leases to prevent double ownership. Done in Phase 10.
  • OIDC/SSO and RBAC implemented. RS256 JWT Bearer token validation with configurable OIDC issuer, JWKS URL, audience, and group-to-role mapping is implemented. Roles: viewer, operator, admin, tenant_admin, worker. Done in Phase 14.
  • Tenant/project model implemented. Tenants, tenant members with RBAC roles, policy controls (image/provider/network allow-deny lists), per-tenant audit export, and admin UI management are implemented. Done in Phase 14.
  • Worker RPC transport enforces worker-rpc-contract. Done in Phase 11-13.

Phase 14 Acceptance Criteria

  • OIDC/JWT RS256 Bearer token validation with configurable issuer, JWKS URL, audience, groups claim, and group-to-role mapping. Done.
  • RBAC roles beyond admin/api: viewer, operator, tenant_admin with scoped permissions. Done.
  • Tenant/project model: tenant CRUD, member RBAC, per-tenant resource scoping. Done.
  • Per-tenant audit export: admin and operation audit logs scoped by tenant_id. Done.
  • Policy controls: per-tenant allow/deny policies for image, provider, and network resources. Done.
  • Centralized worker token issuer: admin API endpoint mints signed worker tokens so workers do not need direct signing key access. Done.
  • Postgres backup: stacyvm db pg-backup wraps pg_dump for production cluster snapshots. Done.
  • Postgres migration rehearsal: stacyvm db pg-rehearse verifies schema state before upgrades. Done.
  • Admin UI tenant management: Tenants page with member RBAC and policy management. Done.
  • Worker RPC mTLS smoke with deployment-issued certificates in target enterprise network: Pending (external to code).