# Production Readiness Checklist

This checklist tracks the release-candidate hardening work, beginning in Phase 7, needed before StacyVM is marketed as production-ready.

## Readiness Levels

| Level                   | Target user                        | Current gate                                                                  |
| ----------------------- | ---------------------------------- | ----------------------------------------------------------------------------- |
| Internal staging        | StacyOS team and trusted operators | `stacyvm doctor`, CI, mock deployment smoke, documented rollback              |
| Single-node production  | Technical self-hosters             | Docker/gVisor or Firecracker conformance, hardened auth, backup/restore drill |
| Public self-serve       | Users without handholding          | Signed releases, upgrade tests, support bundle, clear failure remediation     |
| Enterprise/multi-worker | Infrastructure teams               | Postgres, workers, durable scheduler, leases, OIDC/RBAC                       |

## Phase 7 Acceptance Criteria

* `stacyvm doctor` reports actionable local and production diagnostics.
* Docker command execution has explicit shell and argv semantics. Done in Phase 7 slice 2.
* File APIs have path traversal tests across manager scoping and provider boundaries. Done in Phase 7 final cleanup.
* Sensitive operations are covered by persisted operation audit records. Done in Phase 7 final cleanup.
* Runtime certification scripts exist for Docker, gVisor, Kata, Firecracker, and PRoot host checks. Done in Phase 7 final cleanup.
* Threat model is documented for runtime, API, admin, live-preview, pool, and registry surfaces.
* Release notes describe verified CI and known platform caveats.

## Phase 8 Acceptance Criteria

* SQLite backup and restore are available through the CLI with integrity checks and restore safety copies. Done in Phase 8 slice 1.
* Production config linting is available through `stacyvm config lint --production` and can run against explicit config files without requiring Docker/KVM host access. Done in Phase 8 slice 2.
* Upgrade rehearsal checks cover backup, config lint, service restart, readiness validation, and rollback. Done in Phase 8 slice 3.
* Support bundle export exists and redacts secrets before sharing with maintainers. Done in Phase 8 slice 3.
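
Support bundle redaction can be pictured as a pass over configuration values before export. This minimal sketch makes two assumptions: a flat key/value view of the config, and a hypothetical, illustrative list of sensitive key fragments. StacyVM's actual redaction rules are not described here.

```go
package main

import "strings"

// secretMarkers is an assumed, illustrative list of key fragments treated as sensitive.
var secretMarkers = []string{"key", "token", "secret", "password"}

const redacted = "[REDACTED]"

// redactSecrets returns a copy of cfg in which any value whose key contains a
// sensitive marker (case-insensitive) is replaced with a placeholder. The
// input map is not modified.
func redactSecrets(cfg map[string]string) map[string]string {
	out := make(map[string]string, len(cfg))
	for k, v := range cfg {
		lower := strings.ToLower(k)
		out[k] = v
		for _, m := range secretMarkers {
			if strings.Contains(lower, m) {
				out[k] = redacted
				break
			}
		}
	}
	return out
}
```

Matching on key fragments errs toward over-redaction: a key such as `monkey` would also be masked. For a bundle shared with maintainers, that is usually the safer failure.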

## Phase 9 Acceptance Criteria

* Release binaries and checksums are signed through the GitHub Actions release workflow. Done in Phase 9 slice 1.
* Published container image digests are signed through the GitHub Actions release workflow. Done in Phase 9 slice 1.
* A public verification script exists for release signatures and checksums. Done in Phase 9 slice 1.
* Installer supports Sigstore verification and a fail-closed mode. Done in Phase 9 slice 1.
* Upgrade and config migration tests run in CI. Done in Phase 9 slice 2.
* Public docs expose known limitations and exact remediation paths. Done in Phase 9 slice 2.
* Public release sanity builds and checksum verification run in CI. Done in Phase 9 final polish.
* SDK parity smoke tests run in CI without requiring a live runtime. Done in Phase 9 final polish.
* GitHub issue templates request support bundle, config lint, upgrade rehearsal, runtime certification, and release verification evidence. Done in Phase 9 final polish.

## Phase 10 Acceptance Criteria

* Worker registration and heartbeat records are stored durably. Done in Phase 10 slice 1.
* Single-node servers self-register as the `local` worker with provider and capacity metadata. Done in Phase 10 slice 1.
* Single-node servers refresh the `local` worker heartbeat while running. Done in Phase 10 heartbeat slice.
* Read-only worker discovery is available through the normal API. Done in Phase 10 slice 1.
* Worker heartbeat and deletion are protected by the admin namespace. Done in Phase 10 slice 1.
* Diagnostics and Prometheus expose worker registry state. Done in Phase 10 slice 1.
* Sandbox records persist their owning worker ID and diagnostics expose sandbox counts by worker. Done in Phase 10 slice 2.
* Scheduler placement policy is worker-aware. Remote spawn, status, destroy, live exec streaming, files, logs, preview metadata, and conservative drain/offline ownership policy are available for workers that advertise `rpc_url`.
* Sandbox ownership is tied to worker IDs. Remote spawn/status/destroy ownership is enforced through worker RPC and persisted runtime IDs.
* Distributed leases prevent duplicate worker ownership. Remote spawn, renew, and destroy carry lease tokens, and leases persist through SQLite and Postgres store paths, with Postgres lease race coverage.
* Remote worker authentication and RPC contract are implemented for heartbeat, lease renewal, spawn, status, destroy, exec, files, logs, preview metadata, and drain/offline ownership reconciliation. Shared worker tokens remain available for staging, and per-worker token mapping now supports individually rotatable worker credentials.

## Current Release-Candidate Gates

| Gate                | Status        | Notes                                                                                                                                                                                                                                                                                                                                                                                   |
| ------------------- | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Full Go test suite  | Passing       | CI runs `make test`.                                                                                                                                                                                                                                                                                                                                                                    |
| Web build           | Passing       | CI runs `npm run build`.                                                                                                                                                                                                                                                                                                                                                                |
| SDK checks          | Passing       | TypeScript builds, Python imports, and mock-based SDK parity smoke tests run in CI.                                                                                                                                                                                                                                                                                                     |
| Deployment smoke    | Passing       | Mock-provider smoke is in CI. Docker live host certification remains external.                                                                                                                                                                                                                                                                                                          |
| Cluster conformance | Partial       | Always-on CI covers SQLite store contract, live Postgres store contract, Postgres migration rehearsal, Postgres lease concurrency, per-worker and signed worker identity, worker identity certification reporting, production cluster config lint, and Postgres-backed remote worker smoke. See `docs/cluster-conformance.md`.                                                          |
| Runtime conformance | Partial       | Harness and host certification script exist; Firecracker/PRoot remain platform-gated.                                                                                                                                                                                                                                                                                                   |
| Security posture    | Strong        | Admin governance, operation audit, path traversal checks, explicit exec modes, OIDC/JWT RS256+ES256 auth with RBAC, correct SHA-256 digests in RS256 signature verification, admin routes protected in OIDC-only mode, tenant/project model, per-tenant audit, policy enforcement on spawn, policy controls for providers/images/networks, and a hardened centralized worker token issuer are implemented. |
| Release automation  | Passing       | Release workflow signs binaries, checksums, and GHCR image digests; public verifier and installer verification exist.                                                                                                                                                                                                                                                                   |
| Worker registry     | Near-complete | Durable worker registration, heartbeat, diagnostics, metrics, placement, ownership, leases, per-worker token auth, signed worker identity, centralized token issuance, worker RPC routing, and worker RPC mTLS wiring exist. Remaining: target-network mTLS smoke with deployment-issued certificates.                                                                                  |
| Enterprise/OIDC     | Passing       | OIDC/JWT RS256 verification, RBAC roles (viewer/operator/admin/tenant\_admin), OIDC group→role mapping, tenant model, per-tenant audit, and policy enforcement are implemented.                                                                                                                                                                                                         |
| Public API exposure | Passing       | CORS origins are configurable through `server.cors_allowed_origins`; production config lint fails wildcard or empty CORS before public exposure.                                                                                                                                                                                                                                        |

## Required Before Single-Node Production

* Production config uses distinct API and admin keys.
* `server.cors_allowed_origins` contains only exact trusted `https://` origins for public browser clients.
* `auth.admin_fallback_enabled` is `false`.
* `auth.admin_audit_retention` is set to a retention window that meets production audit requirements.
* Docker provider runs with explicit runtime, network mode, dropped caps, pid limit, memory, CPU, and seccomp settings.
* Firecracker hosts pass Linux/KVM conformance before being marked production.
* Backup and restore are tested against the SQLite database.
* `stacyvm config lint --production` passes with the same config and environment variables the service will use.
* `stacyvm upgrade rehearse` passes before binary/image replacement.
* Operators can generate `stacyvm support bundle` output without exposing API keys or provider secrets.
* Runtime certification artifacts are generated on the actual host with `scripts/certify-runtime.sh <runtime> --format markdown --output <runtime>-certification.md`.
* Operators run `stacyvm doctor --production` before go-live.

## Required Before Public Self-Serve

* Release artifacts are signed and checksummed.
* Upgrade and config migration tests run in CI.
* `stacyvm doctor` includes remediation links for every failure.
* Support bundle export exists and redacts secrets.
* Threat model is reviewed for each release candidate.
* Known limitations are visible in README, docs, and release notes.
* Public support expectations are documented in [public-support-matrix](/docs/public-support-matrix).
* Bug and production support issue templates ask for the same evidence required by the public support matrix.
* Public release sanity CI builds release binaries and validates checksums; real GitHub release asset verification must be repeated after each version tag is published.
* Public browser clients use explicit CORS origins; wildcard CORS must fail `stacyvm config lint --production`.
* Final public evidence is generated with `scripts/public-readiness-evidence.sh`; announcement requires a **PUBLIC SELF-SERVE READY** verdict for the release tag, target host runtime, and deployment network.

## Required Before Enterprise/Multi-Worker

* Postgres store implementation. Driver, migrations, contract path, migration rehearsal, lease race coverage, and mock-provider remote worker smoke exist. Backups are available via `stacyvm db pg-backup`, and pre-upgrade schema rehearsal via `stacyvm db pg-rehearse`. Done in Phase 14.
* Worker registration and heartbeat model. Durable registry, per-worker token auth, signed worker tokens, issuer/rotation workflow, centralized token issuance via `/api/v1/admin/worker-tokens`, worker identity certification reporting, and worker RPC mTLS wiring exist. Target-network mTLS smoke with deployment-issued certificates remains pending for specific enterprise networks.
* Scheduler abstraction with placement policy. Done in Phase 10.
* Durable queue/pub-sub for lifecycle events. Covered by the EventBus and the persisted lease model.
* Distributed leases to prevent double ownership. Done in Phase 10.
* OIDC/SSO and RBAC implemented. RS256 JWT Bearer token validation with configurable OIDC issuer, JWKS URL, audience, and group-to-role mapping is implemented. Roles: `viewer`, `operator`, `admin`, `tenant_admin`, `worker`. Done in Phase 14.
* Tenant/project model implemented. Tenants, tenant members with RBAC roles, policy controls (image/provider/network allow-deny lists), per-tenant audit export, and admin UI management are implemented. Done in Phase 14.
* Worker RPC transport enforces [worker-rpc-contract](/docs/worker-rpc-contract). Done in Phases 11-13.

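
Group-to-role mapping, as described for OIDC/SSO above, reduces to a lookup from the token's groups claim into an operator-supplied table. The sketch below uses the role names from this checklist. Several details are assumptions: the function name, deny-by-default handling of unmapped groups, and the sorted output. It also presumes that the JWT signature, issuer, and audience were already validated.

```go
package main

import "sort"

// knownRoles lists the role names given in this checklist.
var knownRoles = map[string]bool{
	"viewer": true, "operator": true, "admin": true, "tenant_admin": true, "worker": true,
}

// rolesFromGroups maps the groups claim of an already-verified token onto
// roles using a group-to-role table. Unmapped groups and mappings to unknown
// roles are ignored, so a token with no mapped group receives no roles.
func rolesFromGroups(groups []string, groupToRole map[string]string) []string {
	seen := map[string]bool{}
	var roles []string
	for _, g := range groups {
		r, ok := groupToRole[g]
		if !ok || !knownRoles[r] || seen[r] {
			continue
		}
		seen[r] = true
		roles = append(roles, r)
	}
	sort.Strings(roles) // deterministic order for logging and comparison
	return roles
}
```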
## Phase 14 Acceptance Criteria

* OIDC/JWT RS256 Bearer token validation with configurable issuer, JWKS URL, audience, groups claim, and group-to-role mapping. Done.
* RBAC roles beyond admin/api: viewer, operator, tenant\_admin with scoped permissions. Done.
* Tenant/project model: tenant CRUD, member RBAC, per-tenant resource scoping. Done.
* Per-tenant audit export: admin and operation audit logs scoped by tenant\_id. Done.
* Policy controls: per-tenant allow/deny policies for image, provider, and network resources. Done.
* Centralized worker token issuer: admin API endpoint mints signed worker tokens so workers do not need direct signing key access. Done.
* Postgres backup: `stacyvm db pg-backup` wraps `pg_dump` for production cluster snapshots. Done.
* Postgres migration rehearsal: `stacyvm db pg-rehearse` verifies schema state before upgrades. Done.
* Admin UI tenant management: Tenants page with member RBAC and policy management. Done.
* Worker RPC mTLS smoke with deployment-issued certificates in target enterprise network: Pending (external to code).
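
Per-tenant allow/deny policies for images, providers, and networks can be evaluated with a small rule: any deny match rejects, and a non-empty allow list must also match. This sketch assumes that precedence and `path.Match`-style patterns. Both, along with the `resourcePolicy` type, are hypothetical and not StacyVM's documented semantics.

```go
package main

import "path"

// resourcePolicy is a hypothetical per-tenant allow/deny list for one resource
// kind (image, provider, or network). Entries are path.Match patterns.
type resourcePolicy struct {
	Allow []string
	Deny  []string
}

// matchAny reports whether value matches any pattern. When failClosed is true,
// a malformed pattern counts as a match, so a broken deny rule still denies.
func matchAny(patterns []string, value string, failClosed bool) bool {
	for _, p := range patterns {
		ok, err := path.Match(p, value)
		if err != nil {
			if failClosed {
				return true
			}
			continue
		}
		if ok {
			return true
		}
	}
	return false
}

// permitted applies deny-over-allow precedence. An empty allow list permits
// everything that is not denied.
func (p resourcePolicy) permitted(value string) bool {
	if matchAny(p.Deny, value, true) {
		return false
	}
	if len(p.Allow) == 0 {
		return true
	}
	return matchAny(p.Allow, value, false)
}
```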
