Storage day-2 operations
After initial install you may need to add data disks, expand storage, or reconfigure layouts. kcore exposes day-2 disk changes as declarative DiskLayout resources owned by the controller, with a per-node kctl node apply-disk escape hatch for one-off pushes and validation.
Naming: the operator surface uses the plain word disk (DiskLayout, kctl … disk-layout, /etc/kcore/disk-management-mode). The underlying partitioning tool is still disko and is referenced as such in logs and the kcore.disko.* NixOS module — only the user-facing names changed.
Safety contract
The controller never touches running VMs. It does not drain, stop, migrate, or reboot workloads. The operator empties the node (manually today; via live migration once that lands) before submitting a DiskLayout that would touch a disk currently in use.
Every apply runs through a two-stage classifier:
- Controller pre-flight: fast structural check on the submitted layout (extracts target devices from the Nix body, rejects empty/malformed layouts). Used by kctl diff and on kctl apply.
- Node-agent authoritative gate: runs lsblk on the target node and refuses any layout whose target device (or any descendant partition) currently hosts active state. The node-agent always has the final say.
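The controller pre-flight stage can be pictured with a minimal sketch. It assumes target devices appear as literal device = "/dev/..." assignments in the Nix body; the real controller may evaluate the Nix instead, and the function names and return strings here are hypothetical.

```python
import re

# Hypothetical pattern: literal `device = "/dev/..."` assignments in a disko Nix body.
# Computed or interpolated device paths would not be found by this sketch.
DEVICE_RE = re.compile(r'\bdevice\s*=\s*"(/dev/[^"\s]+)"')


def extract_target_devices(nix_body):
    """Return the /dev/* paths assigned to `device`, in order, without duplicates."""
    seen = []
    for path in DEVICE_RE.findall(nix_body):
        if path not in seen:
            seen.append(path)
    return seen


def preflight(nix_body):
    """Return (ok, detail): reject empty bodies and bodies with no target device."""
    if not nix_body.strip():
        return False, "empty layout"
    targets = extract_target_devices(nix_body)
    if not targets:
        return False, "no /dev/* target devices"
    return True, targets
```

A structural check like this is cheap enough to run on every kctl diff, which is why it cannot replace the node-agent gate: it never looks at live disk state.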
When the node-agent refuses, it surfaces a stable, machine-readable code on status.refusalReason so kctl and dashboards can key UX off it instead of parsing prose:
| Refusal code | Meaning |
|---|---|
| target_device_has_active_kcore_volume | A partition under the target device is mounted at /var/lib/kcore/volumes (or /var/lib/kcore/images), i.e. it currently backs a VM volume or image cache. |
| target_device_has_active_system_mount | A partition under the target device hosts /, /boot, /boot/efi, /nix, or /nix/store. |
| target_device_is_active_lvm_pv | The target device is an active LVM physical volume (fstype = LVM2_member). |
| target_device_is_active_zpool_member | The target device is a member of an active ZFS pool (fstype = zfs_member). |
| no_target_devices | The submitted layout did not declare any /dev/* target devices. |
| lsblk_probe_failed | The node-agent could not snapshot live disk state. Fail-closed by design. |
There is no --force override. To clear a refusal, the operator quiesces the affected workloads; the reconciler then retries the same generation on its next tick, so resubmitting the unchanged manifest is not required.
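To make the refusal codes concrete, here is a minimal sketch of a classifier over lsblk JSON output. It is an illustration under assumptions, not the node-agent's code: the check order, the prefix match on kcore mountpoints, and the handling of targets missing from the snapshot are all guesses.

```python
import json

SYSTEM_MOUNTS = {"/", "/boot", "/boot/efi", "/nix", "/nix/store"}
# Assumption: prefix match, so /var/lib/kcore/volumes1 also counts as a kcore volume.
KCORE_PREFIXES = ("/var/lib/kcore/volumes", "/var/lib/kcore/images")


def walk(node):
    """Yield a block device and all of its descendants (lsblk nests them in 'children')."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def classify(lsblk_json, targets):
    """Return a refusal code, or None when no guard fires for the target devices."""
    if not targets:
        return "no_target_devices"
    try:
        roots = json.loads(lsblk_json)["blockdevices"]
    except (ValueError, KeyError, TypeError):
        return "lsblk_probe_failed"  # fail closed when the snapshot is unusable
    by_path = {n.get("path"): n for root in roots for n in walk(root)}
    for target in targets:
        node = by_path.get(target)
        if node is None:
            continue  # assumption: devices absent from the snapshot are skipped here
        for dev in walk(node):
            mounts = [m for m in (dev.get("mountpoints") or []) if m]
            if any(m in SYSTEM_MOUNTS for m in mounts):
                return "target_device_has_active_system_mount"
            if any(m.startswith(KCORE_PREFIXES) for m in mounts):
                return "target_device_has_active_kcore_volume"
            if dev.get("fstype") == "LVM2_member":
                return "target_device_is_active_lvm_pv"
            if dev.get("fstype") == "zfs_member":
                return "target_device_is_active_zpool_member"
    return None
```

Returning a single stable string rather than free-form text is what lets tooling branch on the result without parsing prose.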
Management modes
The file /etc/kcore/disk-management-mode on each node gates day-2 disk apply. The legacy path /etc/kcore/disko-management-mode is still read as a fallback for one release; node install writes the new path and a compatibility symlink so existing setups keep working.
| Mode | Behaviour |
|---|---|
| installer-only (default) | Validation-only flows are allowed; --apply is rejected with a clear error reporting the active mode. This is the default after node install so freshly installed nodes cannot be re-partitioned by accident. |
| controller-managed | The controller's reconciler may dispatch ApplyDiskLayout RPCs to the node-agent. The classifier still has to declare each apply safe; promotion does not lower the safety bar. |
Promote a node to controller-managed mode explicitly when the runbook and maintenance window are in place:
echo controller-managed | sudo tee /etc/kcore/disk-management-mode
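The mode lookup with its legacy fallback could look roughly like the sketch below. The function names are hypothetical, and treating a missing or empty file as installer-only is an assumption based on that mode being the documented default.

```python
from pathlib import Path

MODE_PATH = "etc/kcore/disk-management-mode"
LEGACY_MODE_PATH = "etc/kcore/disko-management-mode"  # read as a fallback only


def read_management_mode(root="/"):
    """Return the active mode, preferring the new path over the legacy one."""
    for relative in (MODE_PATH, LEGACY_MODE_PATH):
        try:
            text = (Path(root) / relative).read_text().strip()
        except OSError:
            continue  # missing or unreadable: try the next candidate
        if text:
            return text
    return "installer-only"  # assumption: no file means the safe default


def apply_allowed(mode):
    """Only controller-managed nodes accept --apply; validation is always allowed."""
    return mode == "controller-managed"
```

The root parameter exists only so the lookup can be exercised against a scratch directory instead of the live /etc.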
Recommended workflow: declarative DiskLayout
Submit a YAML manifest with kind: DiskLayout to the controller. The node-agent still applies disko using a Nix body that defines disko.devices; kctl can build that body for you from structured YAML so you do not hand-author Nix for common data-disk layouts.
Exactly one layout source: set either spec.diskLayout (structured YAML: disks, GPT partitions, partition contents), or inline spec.layoutNix: |, or spec.layoutNixFile: relative/path.nix (path resolved next to the manifest). Mixing more than one is rejected.
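The exactly-one rule is simple to express in code. The sketch below is a hypothetical validator over an already-parsed spec mapping; the error wording is illustrative and not what kctl prints.

```python
LAYOUT_SOURCES = ("diskLayout", "layoutNix", "layoutNixFile")


def pick_layout_source(spec):
    """Return the single layout source key set in spec, or raise ValueError."""
    present = [key for key in LAYOUT_SOURCES if spec.get(key) is not None]
    if len(present) != 1:
        found = ", ".join(present) if present else "none"
        raise ValueError(
            "expected exactly one of " + ", ".join(LAYOUT_SOURCES) + "; found: " + found
        )
    return present[0]
```

For example, a spec carrying both layoutNix and layoutNixFile raises, while a spec carrying only diskLayout returns "diskLayout".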
Preferred — spec.diskLayout: describes whole disks, GPT partitions, and each partition’s role. Supported partition content.type values include filesystem (with format and mountpoint), lvm_pv (with vg), and zfs (with pool). Optional lvmVolumeGroups / zfsPools lists declare empty stubs when needed. kctl expands this to the same disko.devices Nix the controller stores and the reconciler pushes to the node.
# day2-disk-layout.yaml: structured layout (typical)
kind: DiskLayout
metadata:
  name: prod-data-pool
spec:
  nodeId: kvm-node-192-168-40-105 # controller node id for the target machine
  diskLayout:
    disks:
      - name: data1
        device: /dev/nvme1n1 # adjust to match lsblk on the node
        gpt:
          partitions:
            - name: kcore0
              size: "100%"
              content:
                type: filesystem
                format: ext4
                mountpoint: /var/lib/kcore/volumes1
For LVM or ZFS member partitions, use content: { type: lvm_pv, vg: vg_kcore } or content: { type: zfs, pool: tank0 }, and list empty stubs if required:
diskLayout:
  lvmVolumeGroups:
    - name: vg_kcore
  zfsPools:
    - name: tank0
  disks: [ ... ]
Field names use camelCase under diskLayout (for example lvmVolumeGroups). Shape and options follow what kctl accepts; see also the YAML manifest reference for a field summary.
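As a rough picture of the expansion from structured YAML to a disko.devices body, the sketch below renders an already-parsed diskLayout mapping as Nix text. It is not kctl's implementation: it ignores lvmVolumeGroups and zfsPools stubs, assumes disk and partition names are valid Nix identifiers, and does not escape "${" inside strings.

```python
import json


def nix_value(value, indent=0):
    """Render a string or nested dict as a Nix string or attribute set."""
    if isinstance(value, dict):
        pad = "  " * (indent + 1)
        lines = [f"{pad}{key} = {nix_value(val, indent + 1)};" for key, val in value.items()]
        return "{\n" + "\n".join(lines) + "\n" + "  " * indent + "}"
    # json.dumps produces a double-quoted, backslash-escaped string literal.
    return json.dumps(str(value), ensure_ascii=False)


def expand_disk_layout(disk_layout):
    """Build a Nix body defining disko.devices.disk from the structured mapping."""
    disks = {}
    for disk in disk_layout.get("disks", []):
        partitions = {
            part["name"]: {"size": part["size"], "content": part["content"]}
            for part in disk["gpt"]["partitions"]
        }
        disks[disk["name"]] = {
            "type": "disk",
            "device": disk["device"],
            "content": {"type": "gpt", "partitions": partitions},
        }
    return nix_value({"disko.devices.disk": disks})
```

Fed the prod-data-pool layout from the example manifest, this yields an attribute set with disko.devices.disk.data1 pointing at /dev/nvme1n1 and a single kcore0 partition carrying the ext4 filesystem content.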
Advanced — raw disko Nix: use layoutNix: | when you need disko features not covered by the YAML schema yet, or layoutNixFile: when the fragment is large or shared.
# Inline disko Nix (advanced)
kind: DiskLayout
metadata:
  name: prod-data-pool
spec:
  nodeId: kvm-node-192-168-40-105
  layoutNix: |
    {
      disko.devices.disk.data1 = {
        type = "disk";
        device = "/dev/nvme1n1";
        content = {
          type = "gpt";
          partitions.data = {
            size = "100%";
            content = {
              type = "filesystem";
              format = "ext4";
              mountpoint = "/var/lib/kcore/volumes1";
            };
          };
        };
      };
    }
The block under layoutNix must define disko.devices (directly or as in the example above, where attributes merge into the top-level disko device map). Available options follow upstream disko and the kcore.disko.* NixOS module on the node.
# Optional: YAML points at a sibling .nix file
kind: DiskLayout
metadata:
  name: prod-data-pool
spec:
  nodeId: kvm-node-192-168-40-105
  layoutNixFile: ./fragments/nvme-data1.nix
Each manifest targets one node — heterogeneous fleets get one manifest per node, applied with kctl apply -f ./disk-layouts/ or as a multi-document YAML file.
# Pre-flight (no writes): controller extracts target devices and runs the structural classifier
kctl diff -f day2-disk-layout.yaml
# Create / update the DiskLayout in the controller (reconciler picks it up on the next tick)
kctl apply -f day2-disk-layout.yaml
# List all DiskLayouts with their phase and refusalReason
kctl get disk-layouts
# Filter by node
kctl get disk-layouts --node kvm-node-192-168-40-105
# Full body + status
kctl describe disk-layout prod-data-pool
# Remove from the controller (does NOT touch the node — the persisted layout stays in place)
kctl delete disk-layout prod-data-pool
The status block carries the lifecycle:
| Phase | Meaning |
|---|---|
| pending | Created or updated; the reconciler has not yet dispatched it. |
| applied | Node-agent applied the layout, persisted it, and ran nixos-rebuild test + switch. |
| refused | Classifier rejected it. refusalReason tells you which guard fired. The reconciler will retry the same generation on every tick. |
| failed | Classifier accepted it but disko or nixos-rebuild errored. Re-check describe for the message; resubmitting bumps the generation only if the body changed. |
Changing spec.diskLayout, spec.layoutNix, or the file behind layoutNixFile bumps the generation when the resolved Nix body changes; resubmitting identical content does not.
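One way to picture the generation rule is a digest comparison over the resolved Nix body. The sketch below is illustrative only: the use of SHA-256, the byte-exact comparison, and the function names are assumptions, and the controller may normalise bodies before comparing them.

```python
import hashlib


def body_digest(nix_body):
    """Digest of the resolved Nix body (byte-exact; no normalisation in this sketch)."""
    return hashlib.sha256(nix_body.encode("utf-8")).hexdigest()


def next_generation(current_generation, stored_digest, resolved_body):
    """Return (generation, digest), bumping the generation only when the body changed."""
    new_digest = body_digest(resolved_body)
    if new_digest == stored_digest:
        return current_generation, stored_digest
    return current_generation + 1, new_digest
```

Because the comparison is made on the resolved body, editing a file referenced by layoutNixFile changes the digest even though the manifest text itself is unchanged.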
What the node-agent does on a successful apply
- Snapshots lsblk -J -p -o NAME,PATH,FSTYPE,MOUNTPOINTS,PKNAME,TYPE and runs the safe/dangerous classifier.
- Stages the layout under /etc/kcore/disk/ and runs disko --mode format,mount with a bounded timeout.
- Atomically promotes the staged file to /etc/kcore/disk/current.nix on success; this is the path that the shipped modules/kcore-disko.nix imports, so subsequent NixOS evaluations see the realised layout.
- Chains nixos-rebuild test followed by nixos-rebuild switch via a transient kcore-nix-rebuild.service systemd unit. Pass rebuild = false in the RPC (or --no-rebuild on kctl node apply-disk) only for validation flows.
There is no separate manual kctl node apply-nix step in the day-2 disk runbook any more.
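The stage-then-promote pattern from the steps above can be sketched as follows. This is a generic illustration, not the node-agent's code: the staged file name is hypothetical, and format_step stands in for the disko invocation so the sketch stays runnable without touching any disk.

```python
import os
import tempfile
from pathlib import Path


def apply_layout(layout_dir, nix_body, format_step):
    """Stage nix_body, run format_step(staged_path), then promote to current.nix."""
    layout_dir = Path(layout_dir)
    layout_dir.mkdir(parents=True, exist_ok=True)
    # Stage in the same directory so the final rename stays on one filesystem.
    fd, staged = tempfile.mkstemp(dir=layout_dir, prefix="staged-", suffix=".nix")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(nix_body)
            handle.flush()
            os.fsync(handle.fileno())
        format_step(staged)  # expected to raise on failure
        current = layout_dir / "current.nix"
        os.replace(staged, current)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(staged):
            os.unlink(staged)  # a failed apply leaves current.nix untouched
        raise
    return current
```

Promoting only after the format step succeeds means NixOS evaluation never imports a layout that was not actually realised on disk.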
One-off per-node push (kctl node apply-disk)
Use the direct push when you want to validate a layout without going through the controller, when the node is not yet a registered DiskLayout target, or for local install/repair flows.
# Validation only (default — no --apply, no writes to disks)
kctl --node 10.0.0.5:9091 node apply-disk -f day2-disk.nix
# Apply with a bounded timeout; controller-managed mode required
kctl --node 10.0.0.5:9091 node apply-disk \
-f day2-disk.nix \
--apply \
--timeout-seconds 600
# Apply but skip the nixos-rebuild chain (e.g. for tests)
kctl --node 10.0.0.5:9091 node apply-disk \
-f day2-disk.nix \
--apply \
--no-rebuild
Default --timeout-seconds is 300; the server-side hard cap is 3600. Formatting is destructive — the classifier will refuse layouts that target active devices, but always validate first.
The legacy command kctl node apply-disko still works as a deprecation alias for one release; new tooling and runbooks should use apply-disk.
Inventory and health
Inspect the live disk topology on a node (runs lsblk remotely):
kctl --node 10.0.0.5:9091 node disks
Inspect storage backend, LVM/ZFS inventory, and the most recent DiskLayout phase from the controller:
kctl describe node node-ab12cd34
Operational runbook
Adding a data disk
- Attach the new disk to the node (physically or via the hypervisor).
- Edit or create a DiskLayout YAML manifest: prefer spec.diskLayout for a structured description, or use spec.layoutNix / spec.layoutNixFile for full disko Nix. Existing OS disk definitions stay on the node; you are adding a new data disk entry (see the recommended workflow example).
- Pre-flight: kctl diff -f day2-disk-layout.yaml. Confirm the listed target devices match what you intend to format.
- Apply: kctl apply -f day2-disk-layout.yaml.
- Watch kctl describe disk-layout <name> until phase: applied. For filesystem backends the new disk mounts at /var/lib/kcore/volumes1, volumes2, …
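For teams that script the runbook, the diff-then-apply ordering can be wrapped in a small helper. The sketch below is hypothetical: it assumes kctl exits non-zero on failure, and it takes a confirm callback so a person still checks the target devices printed by the pre-flight before anything is submitted.

```python
import subprocess


def add_data_disk(manifest, confirm, run=subprocess.run):
    """Run kctl diff, ask for confirmation, then kctl apply; report how far it got."""
    if run(["kctl", "diff", "-f", manifest]).returncode != 0:
        return "diff failed"
    if not confirm():
        return "aborted"  # operator rejected the listed target devices
    if run(["kctl", "apply", "-f", manifest]).returncode != 0:
        return "apply failed"
    return "submitted"  # the reconciler picks the layout up on its next tick
```

The run parameter defaults to subprocess.run and exists so the flow can be dry-run with a stub; watching for phase: applied remains a separate step because the describe output format is not assumed here.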
Reacting to a refusal
- kctl describe disk-layout <name>: read status.refusalReason.
- Quiesce the workloads using the offending device (stop VMs, evacuate volumes).
- The reconciler retries automatically on the next tick. No need to resubmit unless you actually want to change the layout body.
General guidance
- Always run kctl diff -f before kctl apply -f.
- Keep the YAML manifests in version control alongside the rest of your cluster manifests.
- One DiskLayout per node; for fleet-wide layouts, ship N manifests (or a multi-document YAML).
- Schedule operations affecting disks that currently host workloads inside maintenance windows; the classifier will block them outside, by design.
Out of scope
The following are not supported by day-2 tooling and require dedicated procedures:
- In-place OS-disk repartitioning after install.
- Automatic cross-backend migration (filesystem ↔ LVM ↔ ZFS) without manual data movement.
- nodeSelector/label-driven fleet rollout: disk layouts are deliberately per-node.
- --force overrides for refusals: the operator clears blockers and resubmits.
- Automatic rollback to a previous generation: submit the older layout body explicitly to roll back.