Ceph-CSI connecting to an existing cephfs Fails in Nomad

Open · uvensys-kirchen opened this issue 6 months ago • 8 comments

When using this plugin with Nomad to access an existing subvolume in the default subvolume group (/volumes/_nogroup/), the plugin always produces the same error whenever a job tries to access the volume that was declared via nomad volume register volume.hcl:

Error: GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = missing required field monitors

Since the CSI plugin works on the same Ceph cluster for RBD as well as for CephFS volumes that Nomad creates itself, this might be a bug.

CSI-Controller configuration:

job "ceph-csi-cephfs-plugin-controller" {
    namespace = "system-infrastructure"
    datacenters = ["dc1"]
    priority = 100
    
    update {
        max_parallel     = 1
        min_healthy_time = "10s"
        healthy_deadline = "3m"
        auto_revert      = true
        auto_promote     = false
        canary           = 1
        stagger          = "30s"
    }
    
    group "controller" {
        network {
            port "metrics" {}
        }
        task "ceph-cephfs-controller" {
            template {
                data        = <<EOF
                [{
                    "clusterID": "<ClusterID>",
                    "monitors": [
                        "<MonitorIP1>",
                        "<MonitorIP2>",
                        "<MonitorIP3>",
                        "<MonitorIP4>",
                        "<MonitorIP5>"
                    ]
                }]
                EOF
                destination = "local/config.json"
                change_mode = "restart"
            }
            driver = "docker"
            config {
                image = "quay.io/cephcsi/cephcsi:v3.13.0"
                volumes = [
                    "./local/config.json:/etc/ceph-csi-config/config.json"
                ]
                mounts = [
                    {
                        type     = "tmpfs"
                        target   = "/tmp/csi/keys"
                        readonly = false
                        tmpfs_options = {
                            size = 1000000 # size in bytes
                        }
                    }
                ]
                args = [
                    "--type=cephfs",
                    "--controllerserver=true",
                    "--drivername=cephfs.csi.ceph.com",
                    "--endpoint=unix://csi/csi.sock",
                    "--nodeid=${node.unique.name}",
                    "--instanceid=${node.unique.name}-controller",
                    "--pidlimit=-1",
                    "--logtostderr=true",
                    "--v=5",
                    "--metricsport=$${NOMAD_PORT_metrics}"
                ]
            }
            resources {
                cpu = 50
                memory = 64
                memory_max = 256
            }
            service {
                name = "ceph-csi-cephfs-controller"
                port = "metrics"
                tags = [ "prometheus" ]
            }
            csi_plugin {
                id        = "ceph-csi-cephfs"
                type      = "controller"
                mount_dir = "/csi"
            }
        }
    }
}

CSI-Node configuration:

job "ceph-csi-cephfs-plugin-nodes" {
  namespace = "system-infrastructure"
  datacenters = ["dc1"]
  priority = 100
  type = "system"
  
  update {
    max_parallel     = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    auto_revert      = true
    auto_promote     = false
    canary           = 1
    stagger          = "30s"
  }
  
  group "nodes" {
    network {
      port "metrics" {}
    }
    task "ceph-node" {
      driver = "docker"
      template {
        data        = <<EOF
[{
      "clusterID": "<ClusterID>",
      "monitors": [
          "<MonitorIP1>",
          "<MonitorIP2>",
          "<MonitorIP3>",
          "<MonitorIP4>",
          "<MonitorIP5>"
    ]
}]
EOF
        destination = "local/config.json"
        change_mode = "restart"
      }
      config {
        image = "quay.io/cephcsi/cephcsi:v3.13.0"
        volumes = [
          "./local/config.json:/etc/ceph-csi-config/config.json"
        ]
        mounts = [
          {
            type     = "tmpfs"
            target   = "/tmp/csi/keys"
            readonly = false
            tmpfs_options = {
              size = 1000000 # size in bytes
            }
          }
        ]
        args = [
          "--type=cephfs",
          "--drivername=cephfs.csi.ceph.com",
          "--nodeserver=true",
          "--endpoint=unix://csi/csi.sock",
          "--nodeid=${node.unique.name}",
          "--instanceid=${node.unique.name}-nodes",
          "--pidlimit=-1",
          "--logtostderr=true",
          "--v=5",
          "--metricsport=$${NOMAD_PORT_metrics}"
        ]
        privileged = true
      }
      resources {
        cpu = 50
        memory = 64
        memory_max = 256
      }
      service {
        name = "ceph-csi-cephfs-nodes"
        port = "metrics"
        tags = [ "prometheus" ]
      }
      csi_plugin {
        id        = "ceph-csi-cephfs"
        type      = "node"
        mount_dir = "/csi"
      }
    }
  }
}

Nomad-Volume configuration:

id = "<random Volume ID/Name>"
name = "<random Volume Name>"
namespace = "<namespace>"
type = "csi"
plugin_id = "ceph-csi-cephfs"

external_id = "<Volume ID in ceph>"

capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "file-system"
}

mount_options {
  fs_type     = "ceph"
  mount_flags = ["noatime"]
}

secrets {
  userID  = "<ceph User>" 
  userKey = "<Secret>"
}

parameters {
  clusterID = "<ceph-Cluster ID>"
  staticVolume = "true"
  fsName = "<cephFS Name>"
  rootPath = "/volumes/_nogroup/<SubVolume>"
}

Nomad-Job configuration used for testing:

job "csi-volume-test" {
  datacenters = ["dc1"]
  namespace = "<namespace>"
  type = "batch"

  group "test" {
    task "write-read-volume" {
      driver = "docker"

      config {
        image = "alpine"
        command = "sh"
        args = ["-c", "echo 'Hello from CSI volume!' > /mnt/testvol/hello.txt && cat /mnt/testvol/hello.txt"]
      }

      volume_mount {
        volume      = "testvol"
        destination = "/mnt/testvol"
        read_only   = false
      }

      resources {
        cpu    = 25
        memory = 25
      }
    }

    volume "testvol" {
      type      = "csi"
      read_only = false

      source = "<Volume ID>" 

      attachment_mode = "file-system"  # oder "block" je nach Plugin
      access_mode     = "multi-node-multi-writer"
    }
  }
}

Ceph user rights:

[client.<username>]
	key = <Secret>
	caps mds = "allow rw fsname=<fsName>"
	caps mon = "allow r fsname=<fsName>"
	caps osd = "allow rw tag cephfs data=<fsName>"
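
For reference, caps of this shape are what Ceph's filesystem authorization helper typically generates; a minimal sketch, with <fsName> and <username> kept as placeholders:

ceph fs authorize <fsName> client.<username> / rw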

Ceph Volume Info:

{
    "mon_addrs": [
        "<MonitorIP1>:6789",
        "<MonitorIP2>:6789",
        "<MonitorIP3>:6789",
        "<MonitorIP4>:6789",
        "<MonitorIP5>:6789"
    ],
    "pending_subvolume_deletions": 0,
    "pools": {
        "data": [
            {
                "avail": 424216067309568,
                "name": "cephfs.<Volume Name>.data",
                "used": 12288
            }
        ],
        "metadata": [
            {
                "avail": 424216067309568,
                "name": "cephfs.<Volume Name>.meta",
                "used": 2698643
            }
        ]
    },
    "used_size": 127
}

Ceph Subvolume info:

{
    "atime": "2025-05-26 06:39:20",
    "bytes_pcent": "0.00",
    "bytes_quota": 1099511627776,
    "bytes_used": 0,
    "created_at": "2025-05-26 06:39:20",
    "ctime": "2025-06-03 13:36:05",
    "data_pool": "cephfs.<data pool name>",
    "features": [
        "snapshot-clone",
        "snapshot-autoprotect",
        "snapshot-retention"
    ],
    "flavor": 2,
    "gid": 0,
    "mode": 16895,
    "mon_addrs": [
        "<MonitorIP1>",
        "<MonitorIP2>",
        "<MonitorIP3>",
        "<MonitorIP4>",
        "<MonitorIP5>"
    ],
    "mtime": "2025-06-03 13:36:05",
    "path": "/volumes/_nogroup/<SubVolumeName>/<ID>",
    "pool_namespace": "",
    "state": "complete",
    "type": "subvolume",
    "uid": 0
}
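
The ceph-csi documentation for statically provisioned CephFS volumes uses the subvolume's full path (the "path" field above) as the rootPath. It can be looked up with the command below; a hedged sketch with <fsName> and <SubVolumeName> as placeholders for the values used above:

ceph fs subvolume getpath <fsName> <SubVolumeName>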

uvensys-kirchen avatar Jun 11 '25 11:06 uvensys-kirchen

clusterID = "<ceph-Cluster ID>"

[{ "clusterID": "<ClusterID>", "monitors": [ "<MonitorIP1>", "<MonitorIP2>", "<MonitorIP3>", "<MonitorIP4>", "<MonitorIP5>" ] }]

The clusterID specified when creating the volume should match the clusterID and monitor mapping created above.
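
A minimal illustration of that match, reusing the placeholders from the configurations above; the value has to be identical on both sides, i.e. <ClusterID> in the plugin's config.json and <ceph-Cluster ID> in the volume file must be the same ID:

# entry in the plugin's config.json template:
#   [{ "clusterID": "<ClusterID>", "monitors": [ ... ] }]

# parameters block in the volume registration file:
parameters {
  clusterID = "<ClusterID>"
}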

Madhu-1 avatar Jun 11 '25 12:06 Madhu-1

Yes, the clusterID is the same in all of the configurations, and they all list the same monitors. As stated, the plugin works for another CephFS pool on the same Ceph cluster, where Nomad creates the subvolumes in a dedicated subvolume group, and it also works for another pool used for RBD storage. Only consuming an already existing subvolume from Nomad (via the job configuration above) seems to cause problems.

uvensys-kirchen avatar Jun 12 '25 07:06 uvensys-kirchen

GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = missing required field monitors

@uvensys-kirchen okay, in that case I am not aware of it; from the CSI logs it looks like a configuration issue.

Madhu-1 avatar Jun 12 '25 09:06 Madhu-1

Can you share some of the logs where the error pops up? The Ceph-CSI containers log quite a bit, and that can help point to the place where the monitors are missing or incorrectly configured (or possibly something else entirely).

At a glance, I do not see anything missing, at least compared to the static CephFS volume docs.

nixpanic avatar Jun 12 '25 13:06 nixpanic

I0603 13:40:22.284107       1 utils.go:266] ID: 6782 Req-ID: <nomad-ceph-volume-ID> GRPC call: /csi.v1.Node/NodeStageVolume
I0603 13:40:22.284218       1 utils.go:267] ID: 6782 Req-ID: <nomad-ceph-volume-ID> GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/<Nomad-Namespace>/<nomad-ceph-volume-ID>/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_id":"<nomad-ceph-volume-ID>"}
E0603 13:40:22.284256       1 utils.go:271] ID: 6782 Req-ID: <nomad-ceph-volume-ID> GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = missing required field monitors
I0603 13:40:22.308266       1 utils.go:266] ID: 6783 Req-ID: <nomad-ceph-volume-ID> GRPC call: /csi.v1.Node/NodeUnpublishVolume
I0603 13:40:22.308329       1 utils.go:267] ID: 6783 Req-ID: <nomad-ceph-volume-ID> GRPC request: {"target_path":"/local/csi/per-alloc/20fced5b-42f1-969f-8933-34d72b5ac69f/<nomad-ceph-volume-ID>/rw-file-system-multi-node-multi-writer","volume_id":"<nomad-ceph-volume-ID>"}
E0603 13:40:22.308379       1 nodeserver.go:620] ID: 6783 Req-ID: <nomad-ceph-volume-ID> stat failed: stat /local/csi/per-alloc/20fced5b-42f1-969f-8933-34d72b5ac69f/<nomad-ceph-volume-ID>/rw-file-system-multi-node-multi-writer: no such file or directory
I0603 13:40:22.308387       1 nodeserver.go:624] ID: 6783 Req-ID: <nomad-ceph-volume-ID> targetPath: /local/csi/per-alloc/20fced5b-42f1-969f-8933-34d72b5ac69f/<nomad-ceph-volume-ID>/rw-file-system-multi-node-multi-writer has already been deleted
I0603 13:40:22.308395       1 utils.go:273] ID: 6783 Req-ID: <nomad-ceph-volume-ID> GRPC response: {}
I0603 13:40:22.308683       1 utils.go:266] ID: 6784 Req-ID: <nomad-ceph-volume-ID> GRPC call: /csi.v1.Node/NodeUnstageVolume
I0603 13:40:22.308708       1 utils.go:267] ID: 6784 Req-ID: <nomad-ceph-volume-ID> GRPC request: {"staging_target_path":"/local/csi/staging/<Nomad-Namespace>/<nomad-ceph-volume-ID>/rw-file-system-multi-node-multi-writer","volume_id":"<nomad-ceph-volume-ID>"}
I0603 13:40:22.308752       1 utils.go:273] ID: 6784 Req-ID: <nomad-ceph-volume-ID> GRPC response: {}

uvensys-kirchen avatar Jun 12 '25 16:06 uvensys-kirchen

I0603 13:40:22.284218       1 utils.go:267] ID: 6782 Req-ID: <nomad-ceph-volume-ID> GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/<Nomad-Namespace>/<nomad-ceph-volume-ID>/rw-file-system-multi-node-multi-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_id":"<nomad-ceph-volume-ID>"}

The request does not seem to contain the volume_context that contains the cluster-id and other required parameters (see volumeAttributes in this example). You may need to add that to your volume "testvol" section in the Nomad Job.
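
A hedged sketch of what that could look like in the volume registration file, assuming Nomad passes a context block through to the plugin as the CSI volume_context; the keys simply mirror the parameters block posted earlier, and the working example later in this thread additionally lists the monitors here:

context {
  clusterID    = "<ceph-Cluster ID>"
  fsName       = "<cephFS Name>"
  staticVolume = "true"
  rootPath     = "/volumes/_nogroup/<SubVolume>"
}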

nixpanic avatar Jun 13 '25 11:06 nixpanic

Please double-check against the parameters block defined in the Nomad-Volume configuration I posted. The same file with different parameters is able to create a new subvolume in a dedicated subvolume group; the only major change there is the rootPath. If there are further attributes that need to be specified here, I am not aware of what they are.

PS: I also tested specifying the monitors of the Ceph cluster directly in the parameters section of the Nomad-Volume configuration, but got the same result.

uvensys-kirchen avatar Jun 16 '25 18:06 uvensys-kirchen

A NodeStageVolume procedure does not use parameters. Instead, it should get a volume_context, which is missing.

I do not know how to place that in a Nomad job, but the documentation suggests that there is a context as part of the parameters.

nixpanic avatar Jun 17 '25 13:06 nixpanic

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jul 17 '25 21:07 github-actions[bot]

A NodeStageVolume procedure does not use parameters. Instead, it should get a volume_context, which is missing.

I do not know how to place that in a Nomad job, but the documentation suggests that there is a context as part of the parameters.

This might explain my problems. I have yet to test it; I'll let you know if it works for me.

uvensys-kirchen avatar Jul 18 '25 06:07 uvensys-kirchen

I have lost many hours trying to debug this myself; the documentation is not very good at all, and Nomad not being as popular as Kubernetes definitely does not help.

Here was my issue: I had the same controller/node jobs as yourself, and a very similar volume definition trying to mount an existing CephFS volume (the key word being existing). First of all, unlike the RBD side of Ceph, the volume parameters need to be provided via context instead of parameters, as @nixpanic was so kind to point out. Frustratingly, Ceph has documentation for deploying RBD in Nomad, but nothing like it for CephFS. Nomad has official examples as well, but nothing about CephFS.

After trying the context suggestion, I got a new error: E0722 21:02:50.191707 1 utils.go:270] ID: 506 Req-ID: test-cephfs GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = missing required field provisionVolume, which was beyond confusing, as nowhere on the internet could I find a good match for it. If you look deep enough, however, the static-pvc docs mention staticVolume, and from previous PRs you can see that provisionVolume was actually removed in favor of staticVolume as part of https://github.com/ceph/ceph-csi/pull/390/files#diff-ac32bb87c315551d410bb3d2be14eefb4f84953c90f4a92e70ccf4657ae9d7c3R298-R301. It would be nice if this were documented better somewhere, or the error message updated.

Anyway: if you have an already created volume, you need to add staticVolume = true and a rootPath = "/your/path", along with the monitors, as part of your context. Here is an example of a working volume definition; hope it helps someone out there.

id        = "test-cephfs"
name      = "test-cephfs"
type      = "csi"
plugin_id = "ceph-csi-cephfs"
capacity_max = "10G"
capacity_min = "1G"

capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "file-system"
}

context {
  clusterID = "c3ae25e7-45c6-4acb-8d45-06c71bcb5c9f"
  fsName    = "cephfs_test_volume"
  monitors = "ip_addr:6789,ip_addr:6789,ip_addr:6789"
  staticVolume = true
  rootPath = "/test"
}

secrets {
  userID  = "user"
  userKey = "user_key
}
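
For completeness, a definition like this is applied with the volume registration command (the filename is only illustrative), after which a job's volume block can reference it via source = "test-cephfs":

nomad volume register test-cephfs.hcl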

caquillo07 avatar Jul 22 '25 21:07 caquillo07

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Aug 22 '25 21:08 github-actions[bot]

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

github-actions[bot] avatar Aug 29 '25 21:08 github-actions[bot]

Closing the issue as the cause was misconfiguration.

iPraveenParihar avatar Sep 02 '25 09:09 iPraveenParihar

@iPraveenParihar the issue still exists, in the sense that the system should not fail without a useful error in scenarios like this. Would a new issue help?

caquillo07 avatar Sep 02 '25 11:09 caquillo07