
CSI: validate that single-node mounts aren't used with canaries

Open f3l1x opened this issue 3 years ago • 16 comments

Nomad version

1.3.1

Operating system and Environment details

Debian 11.3

Issue

Hi ✋

We're running Nomad 1.3.1 with 3 Nomad servers, 3 Nomad clients, Consul, Traefik, and an NFS CSI plugin.

We are seeing an allocation stay in the pending state indefinitely. It only ends when the progress_deadline (10m) is reached and the deployment fails.

I am not sure if it's related to CSI. We are using the NFS plugin (https://gitlab.com/rocketduck/csi-plugin-nfs). But maybe it's not related: sometimes it happens with CSI and sometimes without.


Reproduction steps

Take a look at the job file below. If I change only the meta version from 10 to 20 and submit the job again, the deployment gets stuck: the new allocation stays pending until progress_deadline.

Whether the port is static or dynamic does not matter. Sometimes it surprisingly works. :-)
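
To make the trigger explicit, the version bump described above amounts to this one-line change in the job's meta block. This is only a sketch of the edited fragment; everything else in the job file stays exactly as posted below.

```hcl
# Second submission of the same job: only the meta value differs.
meta {
  version = 20 # was 10 in the first submission
}
```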

Expected Result

The deployment succeeds.

Actual Result

Deployment failed. Allocation is still pending.

Job file (if appropriate)

job "canary" {
  type        = "service"
  datacenters = ["dc1"]

  meta {
    version = 10
  }

  update {
    canary       = 1
    max_parallel = 1
    health_check = "checks"
    auto_revert  = true
    auto_promote = true
  }

  group "server" {
    count = 1

    network {
      port "http" { to = 3001 }
    }

    volume "canary-data" {
      type            = "csi"
      source          = "canary-data-volume"
      attachment_mode = "file-system"
      access_mode     = "single-node-writer"
    }

    task "echo" {
      driver = "docker"
      config {
        image = "hashicorp/http-echo:latest"
        args  = [
          "-listen", ":${NOMAD_PORT_http}",
          "-text", "Hello world! IP ${NOMAD_IP_http} and PORT ${NOMAD_PORT_http}",
        ]
        ports = ["http"]
      }

      resources {
        cpu    = 128
        memory = 128
      }

      volume_mount {
        volume      = "canary-data"
        destination = "/app/data"
      }

      service {
        port = "http"

        tags = [
          "traefik.enable=true",
          "traefik.http.routers.${NOMAD_JOB_ID}.rule=Host(`canary.domain.tld`)"
        ]

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

f3l1x, Jun 15 '22 13:06