Firecracker Snapshot/Restore: 1,165ms to 28ms

The Numbers

Metric                Before                After             Improvement
Single spawn          1,165ms               28ms              41x faster
3 concurrent spawns   3,495ms sequential    43ms wall clock   81x faster

For context, here’s how this compares to production sandbox providers:

Provider        Cold Start
AWS Lambda      ~200-500ms
E2B             ~300-600ms
Fly Machines    ~300ms
Modal           ~100-200ms
StacyVM         28ms

Where the Time Was Going

BEFORE (cold boot = 1,165ms)
==============================
resolve image         0.1ms   ░
copy rootfs            10ms   █
start firecracker       6ms   █
configure VM (4 API)    1ms   ░
InstanceStart           8ms   █
wait for agent      1,140ms   ████████████████████████████████████████████████  ← 97.8%
Nearly all of the time went to work inside the guest: the Linux kernel (4.14) booting, the rootfs being mounted, and the guest agent starting up. Starting the Firecracker process and all host-side setup together account for about 25ms. The bottleneck is inside the VM.
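A breakdown like the one above can be collected by timing each spawn phase separately. The following is a minimal sketch, not the StacyVM implementation: `step`, `phase`, and `timeSteps` are hypothetical names, and the sleeps in `main` only stand in for real work.

```go
package main

import (
	"fmt"
	"time"
)

// step is one named unit of work in the spawn path (hypothetical type).
type step struct {
	Name string
	Run  func()
}

// phase records how long a step took.
type phase struct {
	Name string
	Took time.Duration
}

// timeSteps runs the steps in order and returns per-step durations and their sum.
func timeSteps(steps []step) (phases []phase, total time.Duration) {
	for _, s := range steps {
		start := time.Now()
		s.Run()
		took := time.Since(start)
		phases = append(phases, phase{s.Name, took})
		total += took
	}
	return phases, total
}

func main() {
	// Placeholder work; a real spawn would copy the rootfs, start the VMM, etc.
	phases, total := timeSteps([]step{
		{"copy rootfs", func() { time.Sleep(2 * time.Millisecond) }},
		{"wait for agent", func() { time.Sleep(20 * time.Millisecond) }},
	})
	for _, p := range phases {
		share := 100 * float64(p.Took) / float64(total)
		fmt.Printf("%-16s %10s  %5.1f%%\n", p.Name, p.Took.Round(100*time.Microsecond), share)
	}
}
```

Timing phases individually is what makes the imbalance visible: a single end-to-end number would not show that the host-side steps are already cheap.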

How Snapshot/Restore Eliminates It

AFTER (snapshot restore = 28ms)
================================
copy rootfs             7ms   ████████
start firecracker       5ms   ██████
snapshot load           4ms   █████
agent reconnect        12ms   ███████████████
                              ─────────────────
                              28ms total
The kernel never boots. The agent never starts. The VM resumes from a paused state where everything is already initialized.

How It Works

The Golden Image Pattern

Once per image (background, ~1.5s):
  1. Boot a temporary VM from the rootfs
  2. Wait for the guest agent to become ready
  3. Pause the VM (PATCH /vm {"state": "Paused"})
  4. Snapshot: PUT /snapshot/create produces vmstate.bin (CPU/device state, 16KB) + memory.bin (full RAM, 512MB)
  5. Kill the temporary VM, keep the snapshot files
Every spawn after that (28ms):
  1. Sparse-copy the snapshot’s clean rootfs to a new sandbox directory
  2. Start a fresh Firecracker process (no VM config needed)
  3. PUT /snapshot/load with resume_vm: true — VM resumes instantly
  4. Connect to the already-running guest agent over vsock
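The two sequences above map onto a handful of HTTP calls against the Firecracker API socket. Below is a minimal sketch under stated assumptions: the helper names (`apiCall`, `createSnapshotCalls`, `restoreCalls`, `newUnixClient`, `run`) are hypothetical and are not the functions from this change, and the JSON field names follow the public Firecracker API as commonly documented (older releases take `mem_file_path` on load instead of `mem_backend`), so check them against the version you run.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// apiCall is one request against the Firecracker API socket.
type apiCall struct {
	Method string
	Path   string
	Body   map[string]any
}

// createSnapshotCalls pauses an already-booted VM, then snapshots it.
func createSnapshotCalls(statePath, memPath string) []apiCall {
	return []apiCall{
		{http.MethodPatch, "/vm", map[string]any{"state": "Paused"}},
		{http.MethodPut, "/snapshot/create", map[string]any{
			"snapshot_type": "Full",
			"snapshot_path": statePath,
			"mem_file_path": memPath,
		}},
	}
}

// restoreCalls loads a snapshot into a fresh process and resumes it in one call.
func restoreCalls(statePath, memPath string) []apiCall {
	return []apiCall{
		{http.MethodPut, "/snapshot/load", map[string]any{
			"snapshot_path": statePath,
			"mem_backend":   map[string]any{"backend_type": "File", "backend_path": memPath},
			"resume_vm":     true,
		}},
	}
}

// newUnixClient returns an HTTP client that dials the given Unix socket.
func newUnixClient(sock string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", sock)
		},
	}}
}

// run sends each call in order and stops at the first non-2xx response.
func run(c *http.Client, calls []apiCall) error {
	for _, call := range calls {
		payload, err := json.Marshal(call.Body)
		if err != nil {
			return err
		}
		req, err := http.NewRequest(call.Method, "http://localhost"+call.Path, bytes.NewReader(payload))
		if err != nil {
			return err
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := c.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		if resp.StatusCode/100 != 2 {
			return fmt.Errorf("%s %s: %s", call.Method, call.Path, resp.Status)
		}
	}
	return nil
}

func main() {
	// Print the request plan; run(newUnixClient("api.sock"), calls) would send it.
	calls := append(createSnapshotCalls("vmstate.bin", "memory.bin"), restoreCalls("vmstate.bin", "memory.bin")...)
	for _, c := range calls {
		b, _ := json.Marshal(c.Body)
		fmt.Println(c.Method, c.Path, string(b))
	}
}
```

Note that the restore path issues no drive, vsock, or machine-config requests at all: the only call a fresh process receives is the snapshot load.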

The Relative Path Trick

Firecracker bakes drive and vsock paths into the snapshot state. You cannot reconfigure them before or after loading a snapshot. This seems like a dealbreaker for running multiple VMs from one snapshot. The solution: use relative paths during snapshot creation.
// During snapshot creation:
api.put("/drives/rootfs", {"path_on_host": "rootfs.ext4"})   // relative
api.put("/vsock", {"uds_path": "v.sock"})                     // relative
Each Firecracker process runs with cmd.Dir set to its sandbox directory. When a snapshot is loaded, Firecracker resolves rootfs.ext4 and v.sock against its working directory. Each restored VM gets its own rootfs and vsock automatically — no path conflicts, no reconfiguration needed.
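As a rough illustration of the working-directory approach, the sketch below builds one Firecracker command per sandbox with `cmd.Dir` set and shows where the relative snapshot paths end up. The function names and the `/tmp/sandboxes/...` directories are hypothetical; only `cmd.Dir`, `rootfs.ext4`, and `v.sock` come from the text above.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// firecrackerCmd builds (but does not start) a Firecracker process whose
// working directory is the sandbox directory. The API socket path is also
// relative, so it lands inside the same directory.
func firecrackerCmd(bin, sandboxDir string) *exec.Cmd {
	cmd := exec.Command(bin, "--api-sock", "api.sock")
	cmd.Dir = sandboxDir
	return cmd
}

// resolved shows where a relative path from the snapshot lands for a sandbox.
func resolved(sandboxDir, rel string) string {
	return filepath.Join(sandboxDir, rel)
}

func main() {
	for _, dir := range []string{"/tmp/sandboxes/a", "/tmp/sandboxes/b"} {
		cmd := firecrackerCmd("firecracker", dir)
		fmt.Println(resolved(cmd.Dir, "rootfs.ext4"), resolved(cmd.Dir, "v.sock"))
	}
}
```

The same snapshot bytes therefore point at different files in each process, which is what allows one snapshot to back many VMs.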

Snapshot Storage

/var/lib/stacyvm/snapshots/{sha256-of-rootfs-path}/
  vmstate.bin    16 KB    CPU registers, device state, interrupt controllers
  memory.bin    512 MB    Full guest RAM (could use diff snapshots later)
  rootfs.ext4    64 MB    Clean baseline for sparse-copying
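The directory name can be derived from the rootfs path alone. A minimal sketch follows, assuming the full hex-encoded SHA-256 digest of the path string is used as the directory name; the source does not say whether the digest is truncated, and `snapshotFiles` and `snapshotFilesFor` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

const snapshotRoot = "/var/lib/stacyvm/snapshots"

// snapshotFiles lists where the three snapshot artifacts for one image live.
type snapshotFiles struct {
	Dir, State, Memory, Rootfs string
}

// snapshotFilesFor hashes the rootfs path (not its contents) to pick a directory.
func snapshotFilesFor(rootfsPath string) snapshotFiles {
	sum := sha256.Sum256([]byte(rootfsPath))
	dir := filepath.Join(snapshotRoot, hex.EncodeToString(sum[:]))
	return snapshotFiles{
		Dir:    dir,
		State:  filepath.Join(dir, "vmstate.bin"),
		Memory: filepath.Join(dir, "memory.bin"),
		Rootfs: filepath.Join(dir, "rootfs.ext4"),
	}
}

func main() {
	// The image path here is only an example value.
	f := snapshotFilesFor("/var/lib/stacyvm/images/base.ext4")
	fmt.Println(f.Dir)
	fmt.Println(f.State, f.Memory, f.Rootfs)
}
```

Because the key is the path rather than the file contents, replacing an image in place at the same path would map to the same snapshot directory, so a stale snapshot has to be invalidated some other way.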

Concurrency

Each restore is fully independent:
  • Own Firecracker process
  • Own sandbox directory with own rootfs copy
  • Own vsock UDS (relative path resolves to own dir)
  • Own CID from atomic counter
Three concurrent spawns measured at 28ms, 29ms, 33ms. The snapshot files are read-only after creation — no locking needed.
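The atomic CID counter can be sketched as follows. This is an illustration only: `allocCID` and `spawnAll` are hypothetical names, the restore itself is elided, and the starting value assumes the usual vsock convention that CIDs 0 to 2 are reserved, so guests start at 3.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lastCID holds the most recently issued CID; the first one handed out is 3.
var lastCID uint32 = 2

// allocCID returns a CID that no other goroutine will receive.
func allocCID() uint32 {
	return atomic.AddUint32(&lastCID, 1)
}

// spawnAll runs n spawns concurrently and returns the CID each one received.
func spawnAll(n int) []uint32 {
	cids := make([]uint32, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			cids[i] = allocCID() // a real spawn would restore the snapshot here
		}(i)
	}
	wg.Wait()
	return cids
}

func main() {
	fmt.Println(spawnAll(3))
}
```

The counter is the only shared mutable state in this sketch; everything else a spawn touches is either per-sandbox or read-only.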

What We Learned the Hard Way

  1. /vm is PATCH, not PUT. Firecracker returns 400 with a cryptic “Invalid request method” error if you use PUT to pause/resume.
  2. No drive/vsock config with snapshots. Firecracker rejects snapshot load if you configured any “boot-specific resources” beforehand. The error message (“Loading a microVM snapshot not allowed after configuring boot-specific resources”) doesn’t tell you that you also can’t configure them after load.
  3. Relative paths are the key. The snapshot state stores the exact paths used during creation, so any deployment that restores several VMs from one snapshot has to work around this. Relative paths plus per-sandbox working directories let any number of concurrent restores share one snapshot, limited only by host resources.

Files Changed

File                                    What
internal/providers/firecracker.go       snapshotInfo, createBaseSnapshot(), restoreFromSnapshot(), getSnapshot(), modified Spawn()
internal/providers/firecracker_api.go   Added patch() method for PATCH requests
Zero changes to the provider interface, API routes, guest agent, or any other file. Snapshot/restore is a fully transparent optimization inside the Firecracker provider.