Cluster & node upgrades
kcore nodes run NixOS. Host upgrades move the system forward using an immutable flake revision (flake_ref + flake_rev) while the controller orchestrates prepare/activate work on each node. Workloads (VMs/containers) use the normal lifecycle APIs; they are not upgraded by this flow.
Cluster rollout vs single node
Both use the same kind: ClusterUpdate manifest. The selector decides scope:
- Whole cluster: for example, selector.all_nodes: true (subject to label/datacenter filters when you use them).
- Controllers only: selector.controllers_only: true.
- One or a few nodes: selector.node_ids: ["<uuid>", …] for a targeted rollout or one you roll by hand.
There is no separate command for “upgrade this machine only”; narrow the selector instead.
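Because scope is set by the selector, a one-node rollout is just a manifest with selector.node_ids. The shell sketch below writes such a manifest. write_node_rollout is a hypothetical helper (not part of kctl), and it copies the field names and nesting from the manifest outline on this page, so check them against the manifest reference. It emits only the target and selector; add the strategy, activation and approval fields from the full outline as your release requires.

```shell
# write_node_rollout NAME VERSION FLAKE_REV NODE_ID...
# Hypothetical helper: prints a ClusterUpdate manifest that targets only the
# given node IDs. Field names/nesting are copied from the outline on this page.
write_node_rollout() {
  if [ "$#" -lt 4 ]; then
    echo "usage: write_node_rollout NAME VERSION FLAKE_REV NODE_ID..." >&2
    return 2
  fi
  name="$1"; version="$2"; rev="$3"
  shift 3
  printf 'kind: ClusterUpdate\n'
  printf 'metadata:\n  name: %s\n' "$name"
  printf 'spec:\n  target:\n'
  printf '    version: "%s"\n' "$version"
  printf '    flake_ref: github:kcorehypervisor/kcore\n'
  printf '    flake_rev: "%s"\n' "$rev"
  printf '  selector:\n    node_ids:\n'
  # One YAML list item per node ID passed on the command line.
  for id in "$@"; do
    printf '      - "%s"\n' "$id"
  done
}
```

For example, write_node_rollout fix-node-a 0.4.0 "$(git rev-parse HEAD)" "$NODE_UUID" > rollout.yaml, then run kctl update cluster plan -f rollout.yaml before applying.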
CLI (kctl update cluster)
Alias: kctl update os. Requires a working controller context (same TLS setup as other kctl commands).
# Dry-run: resolve targets, viability, reboot hints
kctl update cluster plan -f rollout.yaml
# Submit or update the ClusterUpdate resource
kctl update cluster apply -f rollout.yaml
# Observe cluster + per-node phases
kctl update cluster get release-0-4-0
kctl update cluster list
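# If approval_policy is manual, release the rollout
kctl update cluster approve release-0-4-0
# Stop an in-progress rollout
kctl update cluster cancel release-0-4-0
# Ask the controller to roll the update back (see Host rollback)
kctl update cluster rollback release-0-4-0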
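Since plan is a dry run, a simple guard is to apply only when plan succeeds. This is a minimal sketch that assumes kctl update cluster plan exits non-zero when it finds a problem (confirm that for your release). plan_and_apply is a hypothetical wrapper, and the KCTL variable only exists so the binary can be substituted.

```shell
# Hypothetical wrapper; KCTL can be overridden (e.g. for testing).
KCTL="${KCTL:-kctl}"

# plan_and_apply MANIFEST: run the dry-run plan, and apply only if it succeeds.
plan_and_apply() {
  manifest="$1"
  if [ ! -f "$manifest" ]; then
    echo "no such manifest: $manifest" >&2
    return 2
  fi
  echo "== plan =="
  # Assumption: a failed plan is reported through a non-zero exit status.
  $KCTL update cluster plan -f "$manifest" || {
    echo "plan failed; not applying" >&2
    return 1
  }
  echo "== apply =="
  $KCTL update cluster apply -f "$manifest"
}
```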
YAML manifest outline
The outline below uses snake_case keys (flake_ref, all_nodes, …); if your release expects camelCase fields instead, follow YAML manifest reference (ClusterUpdate).
kind: ClusterUpdate
metadata:
  name: release-0-4-0
spec:
  target:
    version: "0.4.0"
    flake_ref: github:kcorehypervisor/kcore
    flake_rev: "<full-git-sha>"
  selector:
    all_nodes: true
  strategy:
    strategy_type: one-at-a-time
    max_unavailable: 1
    batch_size: 1
    drain_vms: false
  activation:
    mode: auto
  approval_policy: manual
With approval_policy: manual, the controller waits for kctl update cluster approve <name> before rolling. Automatic policies advance without that step.
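flake_rev should be the full commit SHA, not a branch name or abbreviated hash, so the rollout stays pinned to an immutable revision. The sketch below checks that before substituting it into a manifest template containing the <full-git-sha> placeholder. is_full_git_sha and pin_flake_rev are hypothetical helpers, and the check assumes SHA-1 object names (40 lowercase hex characters), which is what git rev-parse prints in SHA-1 repositories.

```shell
# Hypothetical helper. Assumes SHA-1 object names: exactly 40 lowercase
# hex characters, as printed by `git rev-parse HEAD` in SHA-1 repositories.
is_full_git_sha() {
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # reject anything that is not lowercase hex
  esac
  [ "${#1}" -eq 40 ]           # reject abbreviated hashes
}

# pin_flake_rev TEMPLATE REV: print TEMPLATE with <full-git-sha> replaced by REV.
pin_flake_rev() {
  template="$1"; rev="$2"
  if ! is_full_git_sha "$rev"; then
    echo "flake_rev must be a full 40-character commit SHA, got: $rev" >&2
    return 1
  fi
  # Safe to splice into the sed expression: rev has been validated as hex only.
  sed "s/<full-git-sha>/$rev/" "$template"
}
```

For example: pin_flake_rev rollout.template.yaml "$(git rev-parse HEAD)" > rollout.yaml.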
Phases
The cluster resource exposes a phase (pending, rolling_out, succeeded, failed, …). Each node row tracks prepare/activate progress and errors. Use kctl update cluster get for detail.
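If you script around a rollout, you will typically want to wait for a terminal phase. The loop below is a sketch: it treats only succeeded and failed as terminal (the phase list above is not exhaustive), and it assumes you supply a command that prints the bare phase, for example by parsing kctl update cluster get <name>, whose output format is not specified here. wait_for_rollout and is_terminal_phase are hypothetical.

```shell
# Only the terminal phases named on this page; anything else keeps polling.
is_terminal_phase() {
  case "$1" in
    succeeded|failed) return 0 ;;
    *) return 1 ;;
  esac
}

# wait_for_rollout READ_PHASE_CMD [INTERVAL_SECONDS] [MAX_POLLS]
# READ_PHASE_CMD is an assumption: a command you provide that prints the phase.
# Returns 0 on succeeded, 1 on failed, 2 if MAX_POLLS is reached.
wait_for_rollout() {
  read_phase="$1"; interval="${2:-10}"; max_polls="${3:-360}"
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    polls=$((polls + 1))
    phase="$($read_phase)"
    echo "poll $polls: phase=$phase"
    if is_terminal_phase "$phase"; then
      [ "$phase" = succeeded ]   # exit status of this test is the return value
      return
    fi
    sleep "$interval"
  done
  echo "gave up after $max_polls polls" >&2
  return 2
}
```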
Strategy knobs (canary, batch, …) exist in the API, but simpler deployments may still advance nodes one after another; confirm the behaviour for your release.
Host rollback
Controller rollback (kctl update cluster rollback) expresses rollout intent to the controller. If an activation leaves a node broken, you can still recover with normal NixOS generation commands on the machine (for example, booting the previous generation from the bootloader menu); keep that procedure in your physical/datacenter runbook.
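The NixOS side of that runbook might look like the sketch below. It uses standard NixOS tooling rather than kctl, assumes the node's system profile is the default /nix/var/nix/profiles/system, and must run as root on the node. By default it only prints the commands; set DRY_RUN=0 to execute them. run and host_rollback are hypothetical names. If the node cannot boot at all, pick the previous generation from the bootloader menu instead.

```shell
# Print the command (default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

host_rollback() {
  # Show the system generations so you can see what you are returning to.
  run nix-env --list-generations --profile /nix/var/nix/profiles/system
  # Switch back to the previous system generation.
  run nixos-rebuild switch --rollback
}
```

After a manual rollback the node no longer runs the rollout's target revision, so check kctl update cluster get before resuming or re-applying the rollout.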
Further reading
- cluster-and-node-upgrades.md (operator guide in the repo)
- cluster-updates.md (architecture and design history)