option to manage volume permissions
Hello,
In Kubernetes/OpenShift, when you mount a volume (through CSI or otherwise), you can configure the pod's security context and, among other things, determine which Linux user and group the volume will be mounted as:
By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted.
Is there anything similar in Nomad? As far as I can tell, the only alternative is to run a prestart task like the following to ensure the ownership is what you want:
task "prep-disk" {
driver = "docker"
volume_mount {
volume = "nexus-volume"
destination = "/nexus-data/"
read_only = false
}
config {
image = "busybox:latest"
command = "sh"
args = ["-c", "chown -R 200:200 /nexus-data/"]
}
resources {
cpu = 200
memory = 128
}
lifecycle {
hook = "prestart"
sidecar = false
}
}
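For completeness, the prestart task above assumes the group has already claimed the volume. A minimal sketch of the corresponding group-level volume block (the volume name, type, and access/attachment modes here are illustrative assumptions) would be:

group "nexus" {
  # claim the CSI volume so tasks in this group can mount it
  volume "nexus-volume" {
    type            = "csi"
    source          = "nexus-volume"
    read_only       = false
    attachment_mode = "file-system"
    access_mode     = "single-node-writer"
  }

  task "prep-disk" {
    # ... prestart task from above ...
  }

  task "nexus" {
    # main task mounts the same volume with its own volume_mount
    # and finds the ownership already set by the prestart task
  }
}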
If this is the recommended way, is there any way it could be added as an example in the documentation? It would appear to me to be a relatively common use case. Thanks!
Hi @josemaia! I noticed this same problem with the Sonatype Nexus container in the other issue we're working on with you. What you have here is unfortunately the only way to do this right now. I'm not entirely convinced that K8s is really doing the right thing here in allowing the job operator to recursively change ownership on the volume by default, but we'd need to look into it a bit. I'm going to mark this as an enhancement for future storage work.
Just hit this wall myself. It would be really nice if this would be documented somewhere.
Most stateful workloads whose Docker images run as a non-root user will hit this issue. For example:
- prometheus
- grafana
- loki
- elasticsearch
I'm not entirely convinced that K8s is really doing the right thing here in allowing the job operator to recursively change ownership on the volume by default, but we'd need to look into it a bit.
Me neither; that seems likely to cause more problems than it is supposed to fix. Also, recursively changing ownership might add quite a bit of time to the pre-start operations (especially for large volumes). The better option (where possible) is imo to supply this information to the CSI plugin, like one can do with my nfs plugin for this exact reason: https://gitlab.com/rocketduck/csi-plugin-nfs/-/blob/main/nomad/example.volume -- this way it will create the volume with the proper modes and uid/gid.
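As a rough illustration of that approach, a Nomad CSI volume specification passed to `nomad volume create` can carry the desired ownership via plugin parameters. The parameter keys below (uid, gid, mode) are assumptions for illustration only; the exact keys a plugin accepts are plugin-specific, so check the linked example.volume for the real ones:

id        = "nexus-volume"
name      = "nexus-volume"
type      = "csi"
plugin_id = "nfs"

capability {
  access_mode     = "multi-node-multi-writer"
  attachment_mode = "file-system"
}

# plugin-specific parameters; the keys shown here are illustrative
parameters {
  uid  = "200"
  gid  = "200"
  mode = "0770"
}

With something like this, the plugin can create the volume root with the right ownership up front, and no recursive chown is needed at mount time.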
The better option (where possible) is imo to supply this information to the CSI plugin, like one can do with my nfs plugin for this exact reason: https://gitlab.com/rocketduck/csi-plugin-nfs/-/blob/main/nomad/example.volume -- this way it will create the volume with the proper modes and uid/gid.
Agreed.
The other thing that comes up with this feature request, which has been on my mind of late, is user namespace remapping. Who "owns" the uid/gid here? The plugin is what does the mounting and any chown, but the plugin has either the host's uid/gid (if configured as we currently require) or its own uid/gid map (if configured with userns remapping itself), neither of which are the uid/gid map in the task that mounts the volume, much less some other task in another job entirely!
True, namespace remapping is becoming more and more common and there is no easy solution to that. Sure, recursive chmod/chown is an option, but if anything it should be opt-in, since doing it by default is often unnecessary or simply wrong. And it still leaves the question of who should be doing the chown, because, as you said, different tasks in a group might have different uids, etc.
Supplying a uid/gid during volume creation is not perfect either (I guess most CSI plugins don't support it, but then again many CSI plugins simply do not work easily with Nomad either ;)).
More information on what k8s does: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods as well as their support for pushing that down into the CSI layer: https://kubernetes-csi.github.io/docs/support-fsgroup.html
I'm not entirely convinced that K8s is really doing the right thing here in allowing the job operator to recursively change ownership on the volume by default, but we'd need to look into it a bit. I'm going to mark this as an enhancement for future storage work.
I agree, this behavior doesn't seem entirely correct. At the same time, as a task author it is highly desirable not to have to think about how uid/gids map from the host that bootstrapped the volume's permissions to the container/system that needs access.
My very rough idea would be to have an ACL that controls whether tasks claiming a volume can designate the owning uid/gid of the files contained within. This may open up a cross-platform can of worms not worth opening.
And there are performance considerations as well. Recursive chowning is not particularly efficient.
Any progress with this? Currently it makes the ceph-csi driver impossible to use with MS SQL Server, as it runs as user mssql:10001 with no permissions on the CSI folder.
Edit: Got it to work with @josemaia's prep-disk task workaround, thanks!
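For anyone hitting the same combination, here is a sketch of that same prestart pattern adapted for the SQL Server uid mentioned above; the volume name and the /var/opt/mssql data path are assumptions to adjust for your own setup:

task "prep-disk" {
  driver = "docker"

  volume_mount {
    volume      = "mssql-volume"
    destination = "/var/opt/mssql"
    read_only   = false
  }

  config {
    image   = "busybox:latest"
    command = "sh"
    # hand the volume to uid 10001 (the mssql user); adjust the group if your image needs it
    args    = ["-c", "chown -R 10001 /var/opt/mssql"]
  }

  lifecycle {
    hook    = "prestart"
    sidecar = false
  }
}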
Hopefully this will be fixed or properly documented somewhere.
@116davinder this is a feature request. That a feature doesn't exist isn't something to document (an infinite number of non-existent features aren't documented either). In any event, the problem to solve is non-trivial and isn't currently on the roadmap. The issue will be updated if we decide to work on it.
@tgross, if users like me have to implement hacky solutions like this, it becomes a problem when considering Nomad. I completely understand that the solution is non-trivial, as you mentioned, but not picking it up within 2-3 years of development doesn't seem right, to me at least.
I just ran into this as well. It would be really great if there was an officially supported way to do this. It's probably the only downside I have right now to using Nomad instead of k8s. And I really would prefer to use Nomad.
Hello from 2024 :) I just hit this too!
same here
Folks, please just do an emoji reaction on the issue description (ex. :+1:) if you'd just like to +1 the feature without additional context. That way you're not sending a notification to a few dozen people.
I slammed right into this earlier on my personal stack while moving a few things over to CSI with GCP's PD -- the workaround above is a "get the job done" fix, but it doesn't feel great, and I was a bit surprised more folks using CSI drivers haven't encountered this, especially when running well-designed off-the-shelf containers that aren't just using UID=0/GID=0.
In looking into the k8s implementation, here's what I understand about how it works:
- There's now support for delegating this to the CSI drivers themselves, if they expose the capability: https://github.com/kubernetes/kubernetes/blob/8d450ef773127374148abad4daaf28dac6cb2625/pkg/volume/csi/csi_mounter.go#L257 -- you of course still need the metadata to be passed in
- It falls back to the old way otherwise: https://github.com/kubernetes/kubernetes/blob/8d450ef773127374148abad4daaf28dac6cb2625/pkg/volume/csi/csi_mounter.go#L333 (the recursive chmod we all love)
While I know it's not desirable, when you dig into how it works (https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/volume_linux.go#L149), I think it's rather pragmatic -- there's a toggle where the recursive chown only applies if the ownership of the volume's root directory does not match. In practice that means it usually only applies the first time you mount a volume, and it is generally very quick: unless you're using a snapshot as a volume, or you have existing data on the volume whose root-directory UID/GID doesn't match, it just changes one or two things and gets on with it.
I'm not really aware of any other way you would fix this in providers that essentially mount block devices onto the host.
@tgross I'm willing to sink a bit of time into a first pass at implementing something akin to the Kubernetes approach, although it sounds like there may be some hesitancy to accept it. If you have a spare moment or two in the next couple of weeks, would you and your team consider whether this is a path forward? My take is that it would greatly improve the UX.
Given the above, the implementation I'd target is a less configurable, stricter one based on the ownership at the volume root, possibly configured in the volume_mount stanza:
volume_mount {
  volume      = "example"
  destination = "/mnt"

  ownership {
    group = 4000
  }
  # or just `ownership_group = 4000`
}
The presence of this block would trigger an initial check against the mount point, and if there is a mismatch it would recursively adjust the group and file permissions (the file permissions to ensure g+rw on writable volumes, g+r on read-only).
How do we feel about something like this?
Bringing some context into this issue on what the spec says about this flag:
// If SP has VOLUME_MOUNT_GROUP node capability and CO provides
// this field then SP MUST ensure that the volume_mount_group
// parameter is passed as the group identifier to the underlying
// operating system mount system call, with the understanding
// that the set of available mount call parameters and/or
// mount implementations may vary across operating systems.
// Additionally, new file and/or directory entries written to
// the underlying filesystem SHOULD be permission-labeled in such a
// manner, unless otherwise modified by a workload, that they are
// both readable and writable by said mount group identifier.
// This is an OPTIONAL field.
mount(2) doesn't have such an identifier, so I guess that's expected as a -o option and is filesystem-dependent? Or is this the mount(8) group option (i.e. for the CLI command), which doesn't set permissions recursively the way the recursive chmod fallback would, so that's all being implemented in the plugin anyway?
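For what it's worth, for filesystems that do honor uid/gid mount options (cifs, tmpfs, vfat, and similar; ext4 and xfs do not), Nomad's existing volume specification can already pass such flags through to the plugin via mount_options. This is only a sketch of that narrow case; the plugin_id and flag values are assumptions:

# volume specification excerpt -- plugin_id and flag values are illustrative
type      = "csi"
plugin_id = "smb"

mount_options {
  fs_type     = "cifs"
  # cifs honors uid=/gid=/file_mode=/dir_mode= at mount time
  mount_flags = ["uid=200", "gid=200", "file_mode=0660", "dir_mode=0770"]
}

That obviously doesn't help with block-backed filesystems like ext4, which is where the recursive chown (or the VOLUME_MOUNT_GROUP capability above) comes in.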
That seems roughly reasonable, especially if we can punt to the plugins in common cases (we'd need to update our CSI client for that). But there are some open questions here, I think:
- How does this interact with user namespace remapping?
- How does this interact with access_mode, where multiple allocations can mount the same volume on the same host, potentially with different groups (which gets even weirder with user namespace remapping)?
- In the recursive chmod case, how do we need to handle concurrency with the plugin RPCs? Right now we have per-volume serialization of batches of operations; does the recursive chmod have to happen inside that critical section?