Storage day-2 operations

After the initial install you may need to add data disks, expand storage, or reconfigure disk layouts. kcore exposes day-2 disk changes as declarative `DiskLayout` resources owned by the controller, with a per-node `kctl node apply-disk` escape hatch for one-off pushes and validation.

Naming: the operator surface uses the plain word disk (`DiskLayout`, `kctl … disk-layout`, `/etc/kcore/disk-management-mode`). The underlying partitioning tool is still disko, and it is referenced as such in logs and in the `kcore.disko.*` NixOS module; only the user-facing names changed.

Safety contract

The controller never touches running VMs. It does not drain, stop, migrate, or reboot workloads. The operator empties the node (manually today; via live migration once that lands) before submitting a DiskLayout that would touch a disk currently in use.

Every apply runs through a two-stage classifier:

1. Controller pre-flight: a fast structural check on the submitted layout. It extracts the target devices from the Nix body and rejects empty or malformed layouts. It runs on `kctl diff` and on `kctl apply`.
2. Node-agent authoritative gate: runs `lsblk` on the target node and refuses any layout whose target device (or any descendant partition) currently hosts active state. The node-agent always has the final say.
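To make the pre-flight stage concrete, here is a minimal Python sketch, not kcore's implementation. It assumes the controller finds targets by scanning the resolved Nix body for `device = "/dev/...";` assignments (the form used in the inline Nix example on this page), and that an empty body and a body with no targets both map to the `no_target_devices` code. The names `extract_target_devices` and `preflight` are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical pre-flight: collect `device = "/dev/...";` assignments from a
# disko Nix body. kcore's controller may extract targets differently.
DEVICE_RE = re.compile(r'\bdevice\s*=\s*"(/dev/[^"]+)"\s*;')


def extract_target_devices(nix_body: str) -> list[str]:
    """Return the distinct /dev/* targets in order of first appearance."""
    targets: list[str] = []
    for match in DEVICE_RE.finditer(nix_body):
        if match.group(1) not in targets:
            targets.append(match.group(1))
    return targets


def preflight(nix_body: str) -> str | None:
    """Return a refusal code, or None when the structural check passes."""
    # Assumption: empty bodies and bodies without targets share one code.
    if not extract_target_devices(nix_body):
        return "no_target_devices"
    return None
```

A pure text scan like this is cheap enough to run on every `kctl diff`, which is why the live-state checks are left to the node-agent.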

When the node-agent refuses, it surfaces a stable, machine-readable code on status.refusalReason so kctl and dashboards can key UX off it instead of parsing prose:

| Refusal code | Meaning |
|---|---|
| `target_device_has_active_kcore_volume` | A partition under the target device is mounted at `/var/lib/kcore/volumes` (or `/var/lib/kcore/images`); that is, it currently backs a VM volume or the image cache. |
| `target_device_has_active_system_mount` | A partition under the target device hosts `/`, `/boot`, `/boot/efi`, `/nix`, or `/nix/store`. |
| `target_device_is_active_lvm_pv` | The target device is an active LVM physical volume (`fstype = LVM2_member`). |
| `target_device_is_active_zpool_member` | The target device is a member of an active ZFS pool (`fstype = zfs_member`). |
| `no_target_devices` | The submitted layout did not declare any `/dev/*` target devices. |
| `lsblk_probe_failed` | The node-agent could not snapshot live disk state. Fail-closed by design. |
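The node-agent gate can be sketched as a walk over the `lsblk` JSON tree. The Python below is an illustration under stated assumptions, not the node-agent's code: it takes the parsed output of `lsblk -J -p -o NAME,PATH,FSTYPE,MOUNTPOINTS,PKNAME,TYPE` (or `None` if the probe failed), treats any mountpoint starting with `/var/lib/kcore/volumes` or `/var/lib/kcore/images` as a kcore volume, inspects each target and all of its descendants, and checks the four device conditions in table order. The real precedence between codes, and how a target missing from the snapshot is handled, are not specified here.

```python
from __future__ import annotations

SYSTEM_MOUNTS = {"/", "/boot", "/boot/efi", "/nix", "/nix/store"}
# Assumption: prefix match, so /var/lib/kcore/volumes1, volumes2, ... also count.
KCORE_PREFIXES = ("/var/lib/kcore/volumes", "/var/lib/kcore/images")


def _subtree(node: dict):
    """Yield a block device and every nested child (partitions, LVs, ...)."""
    yield node
    for child in node.get("children") or []:
        yield from _subtree(child)


def _find(devices: list[dict], path: str) -> dict | None:
    for root in devices:
        for node in _subtree(root):
            if path in (node.get("path"), node.get("name")):
                return node
    return None


def classify(snapshot: dict | None, targets: list[str]) -> str | None:
    """Return a refusal code, or None when no target hosts active state."""
    if not targets:
        return "no_target_devices"
    if snapshot is None or "blockdevices" not in snapshot:
        return "lsblk_probe_failed"  # fail closed
    for target in targets:
        node = _find(snapshot["blockdevices"], target)
        if node is None:
            continue  # not modelled in this sketch
        for dev in _subtree(node):
            mounts = [m for m in dev.get("mountpoints") or [] if m]
            if any(m.startswith(KCORE_PREFIXES) for m in mounts):
                return "target_device_has_active_kcore_volume"
            if any(m in SYSTEM_MOUNTS for m in mounts):
                return "target_device_has_active_system_mount"
            if dev.get("fstype") == "LVM2_member":
                return "target_device_is_active_lvm_pv"
            if dev.get("fstype") == "zfs_member":
                return "target_device_is_active_zpool_member"
    return None
```

With `-J`, lsblk nests partitions under `children` and reports `MOUNTPOINTS` as a list that may contain `null`, which is why the sketch recurses and filters out empty entries.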

There is no `--force` override. To clear a refusal, the operator quiesces the affected workloads; the reconciler retries the same generation on its next tick, so resubmitting the unchanged manifest is not required.

Management modes

The file /etc/kcore/disk-management-mode on each node gates day-2 disk apply. The legacy path /etc/kcore/disko-management-mode is still read as a fallback for one release; node install writes the new path and a compatibility symlink so existing setups keep working.

| Mode | Behaviour |
|---|---|
| `installer-only` (default) | Validation-only flows are allowed; `--apply` is rejected with a clear error reporting the active mode. This is the default after `node install`, so freshly installed nodes cannot be re-partitioned by accident. |
| `controller-managed` | The controller's reconciler may dispatch `ApplyDiskLayout` RPCs to the node-agent. The classifier still has to declare each apply safe; promotion does not lower the safety bar. |
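The gate itself is small. As a hypothetical Python sketch (the function names, and raising on an unrecognised mode, are assumptions rather than documented behaviour), the node-agent would read the new path first, fall back to the legacy path, default to `installer-only`, and refuse writes unless the node is `controller-managed`:

```python
from __future__ import annotations

from pathlib import Path

MODE_FILE = Path("/etc/kcore/disk-management-mode")
LEGACY_MODE_FILE = Path("/etc/kcore/disko-management-mode")  # read for one release
KNOWN_MODES = {"installer-only", "controller-managed"}


def read_mode(new: Path = MODE_FILE, legacy: Path = LEGACY_MODE_FILE) -> str:
    """Return the node's disk management mode (hypothetical sketch)."""
    for path in (new, legacy):
        if path.exists():
            mode = path.read_text().strip()  # `echo ... | tee` adds a newline
            if mode not in KNOWN_MODES:
                # Assumption: an unrecognised value is an error, not a silent default.
                raise ValueError(f"unknown disk management mode {mode!r} in {path}")
            return mode
    return "installer-only"  # default after `node install`


def check_apply_allowed(apply: bool, mode: str) -> None:
    """Validation-only requests always pass; --apply needs controller-managed."""
    if apply and mode != "controller-managed":
        raise PermissionError(f"--apply rejected: disk management mode is {mode!r}")
```

Stripping whitespace matters because the promotion command below writes the mode with `echo`, which appends a trailing newline.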

Promote a node to controller-managed mode explicitly when the runbook and maintenance window are in place:

```shell
echo controller-managed | sudo tee /etc/kcore/disk-management-mode
```

Recommended workflow: declarative `DiskLayout`

Submit a YAML manifest with `kind: DiskLayout` to the controller. Under the hood the node-agent still applies disko using a Nix body that defines `disko.devices`; kctl can build that body for you from structured YAML, so you do not have to hand-author Nix for common data-disk layouts.

Exactly one layout source: set one of `spec.diskLayout` (structured YAML: disks, GPT partitions, partition contents), inline `spec.layoutNix: |`, or `spec.layoutNixFile: relative/path.nix` (resolved relative to the manifest's directory). Setting more than one is rejected.
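The exactly-one rule is easy to check client-side. This Python sketch is illustrative only; `pick_layout_source` is a hypothetical helper, and kctl's own validation and error text may differ. It also shows `layoutNixFile` being resolved against the manifest's directory rather than the current working directory.

```python
from __future__ import annotations

from pathlib import Path

LAYOUT_SOURCES = ("diskLayout", "layoutNix", "layoutNixFile")


def pick_layout_source(spec: dict, manifest_path: Path) -> tuple[str, object]:
    """Return (field, value) for the single layout source in a DiskLayout spec."""
    present = [key for key in LAYOUT_SOURCES if spec.get(key) is not None]
    if len(present) != 1:
        raise ValueError(
            f"exactly one of {', '.join(LAYOUT_SOURCES)} must be set; got {present or 'none'}"
        )
    key = present[0]
    value = spec[key]
    if key == "layoutNixFile":
        # Resolve against the manifest's directory, not the current directory.
        value = (manifest_path.parent / value).resolve()
    return key, value
```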

Preferred: `spec.diskLayout` describes whole disks, GPT partitions, and each partition's role. Supported partition `content.type` values include `filesystem` (with `format` and `mountpoint`), `lvm_pv` (with `vg`), and `zfs` (with `pool`). Optional `lvmVolumeGroups` / `zfsPools` lists declare empty volume-group or pool stubs when needed. kctl expands this into the same `disko.devices` Nix that the controller stores and the reconciler pushes to the node.

```yaml
# day2-disk-layout.yaml: structured layout (typical)
kind: DiskLayout
metadata:
  name: prod-data-pool
spec:
  nodeId: kvm-node-192-168-40-105   # controller node id for the target machine
  diskLayout:
    disks:
      - name: data1
        device: /dev/nvme1n1        # adjust to match lsblk on the node
        gpt:
          partitions:
            - name: kcore0
              size: "100%"
              content:
                type: filesystem
                format: ext4
                mountpoint: /var/lib/kcore/volumes1
```

For LVM or ZFS member partitions, use `content: { type: lvm_pv, vg: vg_kcore }` or `content: { type: zfs, pool: tank0 }`, and list the empty stubs if required:

```yaml
  diskLayout:
    lvmVolumeGroups:
      - name: vg_kcore
    zfsPools:
      - name: tank0
    disks: [ ... ]
```

Field names under `diskLayout` use camelCase (for example `lvmVolumeGroups`). The accepted shape and options are defined by what kctl validates; see the YAML manifest reference for a field summary.
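To show roughly what the expansion produces, the sketch below renders a parsed `diskLayout` mapping into a `disko.devices` Nix body. It is a hypothetical illustration, not kctl's code: it assumes the YAML `content` keys (`type`, `format`, `mountpoint`, `vg`, `pool`) map one-to-one onto disko's option names, and that the empty stubs render as `lvm_vg` entries with `lvs = { }` and `zpool` entries with `datasets = { }`. Attribute order and formatting in kctl's real output may differ.

```python
from __future__ import annotations

import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_'-]*$")


def to_nix(value, indent: int = 0) -> str:
    """Render plain Python data (dict/list/str/int/bool) as a Nix expression."""
    pad = "  " * indent
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("${", "\\${")
        return f'"{escaped}"'
    if isinstance(value, list):
        return "[ " + " ".join(to_nix(v, indent) for v in value) + " ]"
    if isinstance(value, dict):
        if not value:
            return "{ }"
        body = [
            f"{pad}  {k if _IDENT.match(k) else to_nix(k)} = {to_nix(v, indent + 1)};"
            for k, v in value.items()
        ]
        return "{\n" + "\n".join(body) + f"\n{pad}}}"
    raise TypeError(f"cannot render {type(value).__name__} as Nix")


def expand_disk_layout(layout: dict) -> str:
    """Expand a structured diskLayout into a disko.devices Nix body."""
    devices: dict = {"disk": {}}
    for disk in layout.get("disks", []):
        partitions = {}
        for part in disk["gpt"]["partitions"]:
            entry = {"content": dict(part["content"])}  # keys copied 1:1 (assumption)
            if "size" in part:
                entry = {"size": part["size"], **entry}
            partitions[part["name"]] = entry
        devices["disk"][disk["name"]] = {
            "type": "disk",
            "device": disk["device"],
            "content": {"type": "gpt", "partitions": partitions},
        }
    for vg in layout.get("lvmVolumeGroups", []):
        devices.setdefault("lvm_vg", {})[vg["name"]] = {"type": "lvm_vg", "lvs": {}}
    for pool in layout.get("zfsPools", []):
        devices.setdefault("zpool", {})[pool["name"]] = {"type": "zpool", "datasets": {}}
    return to_nix({"disko": {"devices": devices}})
```

Fed the structured example above, `expand_disk_layout` yields a body equivalent to the inline `layoutNix` example on this page, apart from the partition name (`kcore0` rather than `data`).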

Advanced: raw disko Nix. Use `layoutNix: |` when you need disko features the YAML schema does not cover yet, or `layoutNixFile:` when the fragment is large or shared.

```yaml
# Inline disko Nix (advanced)
kind: DiskLayout
metadata:
  name: prod-data-pool
spec:
  nodeId: kvm-node-192-168-40-105
  layoutNix: |
    {
      disko.devices.disk.data1 = {
        type = "disk";
        device = "/dev/nvme1n1";
        content = {
          type = "gpt";
          partitions.data = {
            size = "100%";
            content = {
              type = "filesystem";
              format = "ext4";
              mountpoint = "/var/lib/kcore/volumes1";
            };
          };
        };
      };
    }
```

The block under `layoutNix` must define `disko.devices`, either as a single attribute set or, as in the example above, through nested attribute paths such as `disko.devices.disk.data1` that Nix merges into the `disko.devices` map. Options follow disko and your node's `kcore.disko.*` module configuration.

```yaml
# Optional: reference a .nix file resolved relative to the manifest
kind: DiskLayout
metadata:
  name: prod-data-pool
spec:
  nodeId: kvm-node-192-168-40-105
  layoutNixFile: ./fragments/nvme-data1.nix
```

Each manifest targets one node. Heterogeneous fleets get one manifest per node, applied together with `kctl apply -f ./disk-layouts/` or as a multi-document YAML file.
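Since a manifest carries exactly one `spec.nodeId`, a directory or multi-document file can be linted before `kctl apply -f`. The Python below is a hypothetical pre-commit check, not part of kctl: it takes documents that are already parsed (for example with PyYAML's `yaml.safe_load_all`) and reports any node targeted by more than one `DiskLayout`, following the one-layout-per-node guidance.

```python
from __future__ import annotations

from collections import defaultdict


def check_one_layout_per_node(docs: list[dict]) -> dict[str, list[str]]:
    """Return nodeId -> DiskLayout names for every node targeted more than once."""
    by_node: dict[str, list[str]] = defaultdict(list)
    for doc in docs:
        if doc.get("kind") != "DiskLayout":
            continue  # ignore documents of other kinds
        by_node[doc["spec"]["nodeId"]].append(doc["metadata"]["name"])
    return {node: names for node, names in by_node.items() if len(names) > 1}
```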

```shell
# Pre-flight (no writes): controller extracts target devices and runs the structural classifier
kctl diff -f day2-disk-layout.yaml

# Create / update the DiskLayout in the controller (reconciler picks it up on the next tick)
kctl apply -f day2-disk-layout.yaml

# List all DiskLayouts with their phase and refusalReason
kctl get disk-layouts

# Filter by node
kctl get disk-layouts --node kvm-node-192-168-40-105

# Full body + status
kctl describe disk-layout prod-data-pool

# Remove from the controller (does NOT touch the node; the persisted layout stays in place)
kctl delete disk-layout prod-data-pool
```


The status block carries the lifecycle:

| Phase | Meaning |
|---|---|
| `pending` | Created or updated; the reconciler has not yet dispatched it. |
| `applied` | The node-agent applied the layout, persisted it, and ran `nixos-rebuild test` followed by `nixos-rebuild switch`. |
| `refused` | The classifier rejected it; `refusalReason` tells you which guard fired. The reconciler retries the same generation on every tick. |
| `failed` | The classifier accepted it but disko or `nixos-rebuild` errored. Check `kctl describe disk-layout <name>` for the message; resubmitting bumps the generation only if the body changed. |
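The phase table implies a small per-tick state machine. The Python sketch below is one reading of it, with its assumptions spelled out: `dispatch` is a hypothetical stand-in for the `ApplyDiskLayout` round trip that returns an outcome plus a detail string, `refused` is retried on every tick, and `applied` and `failed` are treated as settled for their generation, so only a changed body (and therefore a new generation) triggers another attempt.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LayoutStatus:
    generation: int
    phase: str = "pending"
    refusal_reason: Optional[str] = None
    message: Optional[str] = None


# Hypothetical: dispatch(generation) -> ("applied" | "refused" | "failed", detail)
Dispatch = Callable[[int], tuple]


def reconcile_tick(status: LayoutStatus, dispatch: Dispatch) -> LayoutStatus:
    """One reconciler pass over a single DiskLayout (hypothetical sketch)."""
    if status.phase in ("applied", "failed"):
        return status  # settled for this generation; a new body bumps it
    outcome, detail = dispatch(status.generation)
    status.phase = outcome
    status.refusal_reason = detail if outcome == "refused" else None
    status.message = detail if outcome == "failed" else None
    return status
```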

Changing `spec.diskLayout`, `spec.layoutNix`, or the file behind `layoutNixFile` bumps the generation only when the resolved Nix body changes; resubmitting identical content does not.
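One way to implement "bump only when the resolved body changes" is to compare a content digest. This is a hypothetical Python sketch; the controller may compare bodies differently (for example by exact string), and `next_generation` is an illustrative name. Because the digest is taken over the resolved Nix, a YAML edit that expands to the same Nix would not bump the generation here, while any byte-level change to the Nix (whitespace included) would.

```python
from __future__ import annotations

import hashlib


def next_generation(generation: int, last_digest: str | None, resolved_nix: str) -> tuple[int, str]:
    """Bump the generation only when the resolved Nix body's digest changes."""
    digest = hashlib.sha256(resolved_nix.encode("utf-8")).hexdigest()
    if digest == last_digest:
        return generation, digest  # identical resubmission: no bump
    return generation + 1, digest
```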

What the node-agent does on a successful apply

1. Snapshots `lsblk -J -p -o NAME,PATH,FSTYPE,MOUNTPOINTS,PKNAME,TYPE` and runs the safe/dangerous classifier.
2. Stages the layout under `/etc/kcore/disk/` and runs `disko --mode format,mount` with a bounded timeout.
3. On success, atomically promotes the staged file to `/etc/kcore/disk/current.nix`. This is the path that the shipped `modules/kcore-disko.nix` imports, so subsequent NixOS evaluations see the realised layout.
4. Chains `nixos-rebuild test` followed by `nixos-rebuild switch` via a transient `kcore-nix-rebuild.service` systemd unit. Pass `rebuild = false` in the RPC (or `--no-rebuild` on `kctl node apply-disk`) only for validation flows.
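The staging and promotion steps above follow a stage-then-rename pattern. The Python sketch below illustrates it under assumptions: the staging file name `staged.nix` is hypothetical, disko is invoked with the staged file as its positional argument, and the `nixos-rebuild` chain is left out. If disko fails or times out, the exception propagates before `os.replace`, so `current.nix` keeps its previous content.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path
from typing import Callable

Runner = Callable[[list[str], int], None]


def run_disko(cmd: list[str], timeout_s: int) -> None:
    # Raises CalledProcessError on a non-zero exit, TimeoutExpired on timeout.
    subprocess.run(cmd, check=True, timeout=timeout_s)


def stage_and_promote(body: str, disk_dir: Path, timeout_s: int = 300,
                      runner: Runner = run_disko) -> Path:
    """Stage a layout, run disko on it, and promote it to current.nix on success."""
    disk_dir.mkdir(parents=True, exist_ok=True)
    staged = disk_dir / "staged.nix"  # hypothetical staging name
    with open(staged, "w", encoding="utf-8") as fh:
        fh.write(body)
        fh.flush()
        os.fsync(fh.fileno())  # make the staged bytes durable before formatting
    runner(["disko", "--mode", "format,mount", str(staged)], timeout_s)
    current = disk_dir / "current.nix"
    os.replace(staged, current)  # atomic rename within the same filesystem
    return current
```

Renaming within one directory is what makes the promotion atomic: a concurrent NixOS evaluation sees either the old `current.nix` or the new one, never a partially written file.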

There is no longer a separate manual `kctl node apply-nix` step in the day-2 disk runbook.

One-off per-node push (`kctl node apply-disk`)

Use the direct push when you want to validate a layout without going through the controller, when the node is not yet a registered DiskLayout target, or for local install/repair flows.

```shell
# Validation only (default: no --apply, no writes to disks)
kctl --node 10.0.0.5:9091 node apply-disk -f day2-disk.nix

# Apply with a bounded timeout; controller-managed mode required
kctl --node 10.0.0.5:9091 node apply-disk \
  -f day2-disk.nix \
  --apply \
  --timeout-seconds 600

# Apply but skip the nixos-rebuild chain (e.g. for tests)
kctl --node 10.0.0.5:9091 node apply-disk \
  -f day2-disk.nix \
  --apply \
  --no-rebuild
```

The default `--timeout-seconds` is 300 seconds; the server-side hard cap is 3600 seconds. Formatting is destructive: the classifier refuses layouts that target devices with active state, but it does not protect an idle disk that still holds data you need, so always validate first.
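How the timeout resolves can be sketched in a few lines of Python. This is illustrative only: whether the server clamps an over-limit request to 3600 seconds or rejects it is not stated here, so the sketch assumes clamping, and rejecting non-positive values is likewise an assumption.

```python
from __future__ import annotations

DEFAULT_TIMEOUT_S = 300
HARD_CAP_S = 3600


def effective_timeout(requested: int | None) -> int:
    """Resolve --timeout-seconds: default 300 s, never above the 3600 s cap."""
    if requested is None:
        return DEFAULT_TIMEOUT_S
    if requested <= 0:
        raise ValueError("--timeout-seconds must be positive")
    return min(requested, HARD_CAP_S)  # assumption: clamp rather than reject
```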

The legacy command kctl node apply-disko still works as a deprecation alias for one release; new tooling and runbooks should use apply-disk.

Inventory and health

Inspect the live disk topology on a node (runs lsblk remotely):

```shell
kctl --node 10.0.0.5:9091 node disks
```

Inspect storage backend, LVM/ZFS inventory, and the most recent DiskLayout phase from the controller:

```shell
kctl describe node node-ab12cd34
```

Operational runbook

Adding a data disk

1. Attach the new disk to the node (physically or via the hypervisor).
2. Edit or create a `DiskLayout` YAML manifest: prefer `spec.diskLayout` for a structured description, or use `spec.layoutNix` / `spec.layoutNixFile` for full disko Nix. Existing OS disk definitions stay on the node; you are only adding a new data-disk entry (see the recommended workflow example).
3. Pre-flight: `kctl diff -f day2-disk-layout.yaml`. Confirm that the listed target devices match what you intend to format.
4. Apply: `kctl apply -f day2-disk-layout.yaml`.
5. Watch `kctl describe disk-layout <name>` until `phase: applied`. For filesystem backends the new disk is mounted at the mountpoint declared in the layout, conventionally `/var/lib/kcore/volumes1`, `volumes2`, …
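When adding data disks one at a time, picking the next `volumesN` mountpoint can be automated. This hypothetical Python helper (not a kctl feature) takes the mountpoints currently in use, for example collected from an `lsblk` snapshot of the node, and returns the lowest unused `/var/lib/kcore/volumesN` with N starting at 1; filling the lowest gap rather than using the highest index plus one is an arbitrary choice.

```python
from __future__ import annotations

import re

VOLUME_RE = re.compile(r"^/var/lib/kcore/volumes(\d+)$")


def next_volume_mountpoint(mountpoints: list[str]) -> str:
    """Pick the lowest unused /var/lib/kcore/volumesN (N >= 1)."""
    used = {int(m.group(1)) for p in mountpoints if (m := VOLUME_RE.match(p))}
    n = 1
    while n in used:
        n += 1
    return f"/var/lib/kcore/volumes{n}"
```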

Reacting to a refusal

1. Run `kctl describe disk-layout <name>` and read `status.refusalReason`.
2. Quiesce the workloads using the offending device (stop VMs, evacuate volumes).
3. Let the reconciler retry automatically on its next tick. There is no need to resubmit unless you actually want to change the layout body.
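Because `status.refusalReason` is a stable code, tooling can map it to an operator action without parsing prose. The mapping below is a hypothetical Python sketch whose wording is drawn from the refusal-code table and this runbook; it is not kcore's text, and `remediation_hint` is an illustrative name.

```python
from __future__ import annotations

REMEDIATION = {
    "target_device_has_active_kcore_volume":
        "Stop or evacuate the VMs whose volumes or image cache live on this device.",
    "target_device_has_active_system_mount":
        "Device hosts /, /boot, or /nix; OS-disk repartitioning is out of scope, so target another device.",
    "target_device_is_active_lvm_pv":
        "Quiesce the workloads on the LVM volume group backed by this physical volume.",
    "target_device_is_active_zpool_member":
        "Quiesce the workloads on the ZFS pool this device belongs to.",
    "no_target_devices":
        "Declare at least one /dev/* target device in the layout and resubmit.",
    "lsblk_probe_failed":
        "Find out why the node-agent could not snapshot disk state; the gate fails closed.",
}


def remediation_hint(refusal_reason: str | None) -> str:
    """Map a status.refusalReason code to a short operator hint."""
    if refusal_reason is None:
        return ""
    return REMEDIATION.get(
        refusal_reason,
        f"Unrecognised refusal code {refusal_reason!r}; read kctl describe output.",
    )
```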

General guidance

- Always run `kctl diff -f` before `kctl apply -f`.
- Keep the YAML manifests in version control alongside the rest of your cluster manifests.
- Use one `DiskLayout` per node; for fleet-wide layouts, ship N manifests (or a multi-document YAML file).
- Schedule changes to disks that currently host workloads inside a maintenance window, because those workloads must be quiesced first; until they are, the classifier refuses the layout by design.

Out of scope

The following are not supported by day-2 tooling and require dedicated procedures:

- In-place OS-disk repartitioning after install.
- Automatic cross-backend migration (filesystem ↔ LVM ↔ ZFS) without manual data movement.
- `nodeSelector`/label-driven fleet rollout; disk layouts are deliberately per-node.
- `--force` overrides for refusals; the operator clears the blockers and the reconciler retries.
- Automatic rollback to a previous generation; submit the older layout body explicitly to roll back.