CSI: validate that single-node mounts aren't used with canaries
Nomad version
1.3.1
Operating system and Environment details
Debian 11.3
Issue
Hi ✋
We're running Nomad 1.3.1 with 3 Nomad servers, 3 Nomad clients, Consul, Traefik and an NFS CSI plugin.
After a job update, the new allocation stays in the pending state until progress_deadline (10m) expires, and the deployment then fails.
I am not sure whether it's related to CSI; we are using the NFS plugin (https://gitlab.com/rocketduck/csi-plugin-nfs). Maybe it's not, because it sometimes happens with CSI and sometimes without.
Reproduction steps
Take a look at the job file below. If I change only the meta value from version = 10 to version = 20 and resubmit, the deployment gets stuck: the new allocation stays pending until progress_deadline.
Switching between a static port and a dynamic port makes no difference. Sometimes it surprisingly works. :-)
Expected Result
The deployment succeeds.
Actual Result
Deployment failed. Allocation is still pending.
Job file (if appropriate)
job "canary" {
  type        = "service"
  datacenters = ["dc1"]

  meta {
    version = 10
  }

  update {
    canary       = 1
    max_parallel = 1
    health_check = "checks"
    auto_revert  = true
    auto_promote = true
  }

  group "server" {
    count = 1

    network {
      port "http" { to = 3001 }
    }

    volume "canary-data" {
      type            = "csi"
      source          = "canary-data-volume"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "echo" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo:latest"
        args = [
          "-listen", ":${NOMAD_PORT_http}",
          "-text", "Hello world! IP ${NOMAD_IP_http} and PORT ${NOMAD_PORT_http}",
        ]
        ports = ["http"]
      }

      resources {
        cpu    = 128
        memory = 128
      }

      volume_mount {
        volume      = "canary-data"
        destination = "/app/data"
      }

      service {
        port = "http"
        tags = [
          "traefik.enable=true",
          "traefik.http.routers.${NOMAD_JOB_ID}.rule=Host(`canary.domain.tld`)"
        ]

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
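
Going by the issue title, one possible explanation is that the canary has to claim a volume that the running allocation still holds in single-node-writer mode, so it can never be placed. This is not confirmed here, since the hang is also reported without CSI. A minimal sketch of two variations of the job file above that could help narrow this down. Both are assumptions rather than a verified fix, and variation B only applies if the volume "canary-data-volume" is registered with a matching multi-node capability and the NFS plugin supports it:

```hcl
# Variation A (assumption): no canary. With count = 1 and max_parallel = 1,
# the old allocation is stopped before its replacement is placed, so only
# one allocation needs the single-node-writer claim at a time.
update {
  max_parallel = 1
  health_check = "checks"
  auto_revert  = true
  # canary and auto_promote are left out on purpose:
  # auto_promote is only meaningful when canaries are used.
}

# Variation B (assumption): keep the canary, but request an access mode that
# lets the old allocation and the canary mount the volume at the same time.
volume "canary-data" {
  type            = "csi"
  source          = "canary-data-volume"
  attachment_mode = "file-system"
  access_mode     = "multi-node-multi-writer"
}
```

If the deployment completes with either variation but still hangs with the original job file, that would point towards the volume claim rather than the port or service configuration.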