feasibility check should fail when setting sticky on a static host volume

Open · inahga opened this issue 3 months ago · 1 comment

Nomad version

Output from nomad version

Nomad v1.10.4
BuildDate 2025-08-12T20:48:32Z
Revision 62b195aaa535b2159d215eaf89e6f4a455d6f686

Operating system and Environment details

Ubuntu 24.04 on QEMU/KVM.

Issue

When using a static host volume, you can set sticky = true.

This contradicts the documentation, which states:

You may only use the sticky field for dynamic host volumes.

Attempting to do this fails, and the errors appear only in the system logs:

Nov 24 20:59:30 nomad.example.com nomad[621]: worker: error invoking scheduler: worker_id=d8a9bb31-2e6d-8a16-60b8-36576161efc9 error="failed to process evaluation: rpc error: Task group volume claim insert failed: object missing primary i>
Nov 24 20:59:30 nomad.example.com nomad[621]:     2025-11-24T20:59:30.899Z [ERROR] nomad.fsm: ApplyPlan failed: error="Task group volume claim insert failed: object missing primary index"
Nov 24 20:59:30 nomad.example.com nomad[621]: nomad.fsm: ApplyPlan failed: error="Task group volume claim insert failed: object missing primary index"

Reproduction steps

Create a static host volume in the client config

client {
  # ...
  host_volume "data" {
    path      = "/data"
    read_only = false
  }
  # ...
}

Attempt to use it with sticky = true

job "example" {
  type      = "service"
  node_pool = "ifn"

  group "ubuntu" {
    count = 1

    volume "data" {
      type   = "host"
      source = "data"
      sticky = true
    }

    task "ubuntu" {
      driver = "podman"

      config {
        image   = "docker.io/library/ubuntu:noble"
        command = "/usr/bin/sleep"
        args    = ["infinity"]
      }

      volume_mount {
        volume      = "data"
        destination = "${NOMAD_ALLOC_DIR}/data"
      }
    }
  }
}

Expected Result

Nomad should reject this invalid job spec at validation time.

Actual Result

Nomad attempts to deploy this faulty job spec.

inahga, Nov 24 '25 21:11

The jobspec doesn't differentiate between dynamic and static host volumes for volumes of type = "host". This was a deliberate design decision to allow job authors to be ignorant of that detail. But as you say, this causes a problem for sticky volumes for sure. Until we've selected a node for feasibility checking in the scheduler, we don't know whether the volume is static or dynamic, so we can't catch this in the job submission validation. But we absolutely can and should catch this during feasibility checking. I'll re-title this issue and get it queued up for fixing.

tgross, Dec 01 '25 13:12
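
For illustration only, the kind of feasibility check described in the comment above could be sketched as follows. This is not Nomad's scheduler code: the `HostVolume` and `VolumeRequest` types and the `checkStickyFeasible` function are hypothetical, and the sketch assumes that a dynamic host volume carries a non-empty ID on the node while a static one does not.

```go
package main

import "fmt"

// HostVolume is a hypothetical, simplified view of a host volume as seen
// on a candidate node. It is not Nomad's actual struct.
type HostVolume struct {
	Name string
	ID   string // assumption: empty for static volumes, set for dynamic ones
}

// VolumeRequest is a hypothetical, simplified form of a group-level
// volume block from the jobspec.
type VolumeRequest struct {
	Source string
	Sticky bool
}

// checkStickyFeasible reports whether the node can satisfy the request.
// The static/dynamic distinction is only known once a node is being
// considered, which is why the check happens here rather than at job
// submission.
func checkStickyFeasible(nodeVolumes map[string]HostVolume, req VolumeRequest) (bool, string) {
	vol, ok := nodeVolumes[req.Source]
	if !ok {
		return false, fmt.Sprintf("missing host volume %q", req.Source)
	}
	if req.Sticky && vol.ID == "" {
		return false, fmt.Sprintf("sticky is not supported on static host volume %q", req.Source)
	}
	return true, ""
}

func main() {
	node := map[string]HostVolume{
		"data":    {Name: "data"},                                  // static
		"scratch": {Name: "scratch", ID: "hypothetical-volume-id"}, // dynamic
	}
	requests := []VolumeRequest{
		{Source: "data", Sticky: true},
		{Source: "scratch", Sticky: true},
	}
	for _, req := range requests {
		ok, reason := checkStickyFeasible(node, req)
		fmt.Println(req.Source, ok, reason)
	}
}
```

With a check of this shape, a node offering only the static volume would be filtered out with an explicit reason, so the job would fail placement visibly instead of failing later when the plan is applied.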