feat: support NFS storage pools
WIP draft PR for the feature. So far it works, a bit slow on my machine due to the id remapping which should probably be investigated.
- [x] Investigate the ID remapping
- [x] I'm unsure about the different migration options
- [x] Only support
vers=4.2, should be fine? - [x] Error out on missing
source - [ ] Any specific options we should include?
- [x]
Infoarray needs some QA. Not sure I understand all of it. - [ ]
nfscan't supportxattr, do we need to handle this in other places than migrate?
Fixes: https://github.com/lxc/incus/issues/1311
@stgraber if you have any opinions or pointers to the checkboxes feel free to look over and I can investigate a bit.
I suspect some research needs to be done to figure out how we should interact with the uid/gid and squashing behavior of NFS mounts.
Hi! Is this PR stalled? Do you need help?
@bensmrs I think I need a bit of help to make sure that the Info struct is correct and that I'm not missing any details from the migration steps. I was hoping @stgraber had time to point in the correct direction on this part.
Ah, I just left a few comments in the Info function now.
I was hoping @stgraber had time to point in the correct direction on this part.
Well now you’re served :þ
I can help review, fix stuff, and even write tests, don’t hesitate to ping me. I’d actually be happy to see it working pretty soon.
@bensmrs thanks!
I'll probably not touch this in July, and there is a hackercamp and work stuff happening in August on my end that might not allow me a lot of energy to pick this up after hours.
I'd appreciate some pointers on the id remapping incus does and how that should play with the ID squashing NFS does. I think there should be some guidance and testing there to make sure it behaves as expected. It also takes a bunch of time which might not be needed if nfs reassings UID/GID anyway.
If you have time to write tests and stuff I'd be happy to give you access to my fork.
I don't believe that VFS idmap works on top of NFS at this point, so we'd be dealing with traditional shifting where Incus rewrites all the uid/gid as needed.
What you want to ensure is that NFS is mounted the same way on all machines and that no uid/gid squashing is performed, then that should work fine.
Should we call the driver nfs4 to make sure we don't end up in a weird situation down the line where we don't want to support a v5 along with the v4 code? Is that realistic?
I think we should stick to nfs as nfs4 may give the impression that we don't support nfs3 whereas the featureset we really need is perfectly fine for NFS3, if anything some of the bits in NFS4 may be problematic (built-in uid/gid mapping and such).
Exactly what kind of server version and configuration we can handle is probably something that's best addressed through the documentation for the driver.
NFS would be SO nice. Just an interested party!
@bensmrs
I did some changes so we can pass ipv6 source= paths. It's not great but couldn't come up with a more reliable way to split up on ::1:/somepath or similar paths.
What sort of testing do we want on this? Locally creating pools and creating contianers and VMs work. Also attaching additional volumes.
What sort of testing do we want on this? Locally creating pools and creating contianers and VMs work. Also attaching additional volumes.
We’d basically extend the available filesystem drivers in the test matrix, to test everything we’re already testing: https://github.com/lxc/incus/blob/f614b4c4a724845f901c84b482757b5959e0b665/.github/workflows/tests.yml#L137-L144
The tests initialization routines allow you to define how your driver should be initialized in test/backends, although it’s not enough. I can have a quick look at it as soon as I’m done with my other PR.
I started looking at the tests and am a bit confused by the following error:
> incus storage create incustest-KAg nfs source=10.1.0.227:/media
Error: NFS driver requires "nfs.host" to be specified or included in "source": [<remote host>:]<remote path>
Did I do something wrong here?
IMO, if you have an nfs.host option, I don’t think you should look for a host in source. Just make source be the path on nfs.host, or on localhost if no nfs.host is provided.
Did I do something wrong here?
Uhh, so it turns out that the Makefile has been moving things from being built into build/ to running go install. So there is a slight chance I have missed testing an iteration while working on this.
IMO, if you have an nfs.host option, I don’t think you should look for a host in source. Just make source be the path on nfs.host, or on localhost if no nfs.host is provided.
I was thinking mimicking the behavior of truenas.host and source from the truenas driver. Where you can just specify nfs.host globally and pass a path to source. But I haven't nailed the entire behavior there I suspect. I'm a little bit unsure what the most ergonomic way for the options to behave, that also harmonizes with the existing drivers.
Uhh, so it turns out that the Makefile has been moving things from being built into build/ to running go install. So there is a slight chance I have missed testing an iteration while working on this.
No problem, I ended up using nfs.host for driver initialization.
I was thinking mimicking the behavior of truenas.host and source from the truenas driver. Where you can just specify nfs.host globally and pass a path to source. But I haven't nailed the entire behavior there I suspect. I'm a little bit unsure what the most ergonomic way for the options to behave, that also harmonizes with the existing drivers.
Ok I can’t beat this argument :)
I’m getting some permission denied errors in my tests, I’m investigating.
So, maybe I’m missing something when initializing the pool, but I can’t get past permission errors. Container creation and launch work well, but that’s not enough, see https://github.com/bensmrs/incus/actions/runs/18648669885/job/53161380407?pr=4
Am I missing something in my setup? See https://github.com/lxc/incus/blob/5733ce8ba08e118b68dd6900546bc0de8f4722d9/.github/workflows/tests.yml#L394-L395
Hmmm, I've only tried incus exec container bash and manually created files on NFS volumes on both ends. And it worked. Are we sure that /media doesn't have any weird permissions in ubuntu? Does it exist etc?
Container launch works, so regular operations don’t seem to cause any problem. I’ll debug a bit more later, I was just wondering if your setup had anything specific other than the squash options.
Well I’m currently out of ideas. user_is_instance_user fails with a permission error. We can’t list /root from within the container, it appears owned by 0:0, whoami fails because user 0 is not found, and from the host, the mount appears owned by 1000000:1000000. I don’t know where to go from there; maybe I’m missing something obvious happening in the OpenFGA tests.
I'm honestly not sure.
My export mount has this
/srv/nfs/incus 192.168.1.105/32(rw,sync,no_subtree_check,no_root_squash,no_all_squash)
The top directory is owned by a backup user with UID 1001.
$ ls /srv/nfs | grep incus
drwxr-xr-x 1 backup backup 188 Oct 20 21:11 incus
Adding the storage
λ nfs morten/nfs Ɇ » sudo incus storage create test-pool nfs source='192.168.1.8:/srv/nfs/incus'
Storage pool test-pool created
λ nfs morten/nfs Ɇ » sudo incus launch images:archlinux/current/amd64 --storage test-pool
The container works as expected.
[root@moving-jennet ~]# touch lol
[root@moving-jennet ~]# ls -lah
total 0
drwxr-x--- 1 root root 26 Oct 20 19:25 .
drwxr-xr-x 1 root root 178 Oct 20 19:25 ..
drwx------ 1 root root 12 Oct 1 16:11 .gnupg
drwx------ 1 root root 0 Oct 1 16:11 .ssh
-rw-r--r-- 1 root root 0 Oct 20 19:28 lol
[root@moving-jennet ~]# whoami
root
With the actual rootfs on the NAS looks like.
[root@nassen root]# pwd
/srv/nfs/incus/containers/moving-jennet/rootfs/root
[root@nassen root]# ls -lah
total 0
drwxr-x--- 1 1000000 1000000 26 Oct 20 21:25 .
drwxr-xr-x 1 1000000 1000000 178 Oct 20 21:25 ..
drwx------ 1 1000000 1000000 12 Oct 1 18:11 .gnupg
-rw-r--r-- 1 1000000 1000000 0 Oct 20 21:28 lol
drwx------ 1 1000000 1000000 0 Oct 1 18:11 .ssh
It works on my machine too, but the testing environment is a bit more involved unfortunately, and it tests plenty of non-trivial stuff… It gets very tricky very fast :)
It works on my machine too, but the testing environment is a bit more involved unfortunately, and it tests plenty of non-trivial stuff… It gets very tricky very fast :)
Ahh, right! Is there a good way to replicate the test setup?
Several ways! Either on the CI, like I’ve done on my own branch (I’ve done plenty of changes just for debugging purposes, but most of what I did in test/ should be kept), or on your local machine. From my branch, I just have to run INCUS_CONCURRENT=1 INCUS_VERBOSE=1 INCUS_OFFLINE=1 INCUS_TMPFS=1 INCUS_INSPECT=1 INCUS_BACKEND=nfs INCUS_NFS_HOST=127.0.0.1 ./main.sh standalone in the test/ directory (note that it won’t work in your branch).
I have a bit of time to continue looking at the tests, so I think you can just focus on the other tests failing for now :)
Ok I managed to reproduce the bug. The BusyBox image that the Incus test suite uses does indeed not work with the NFS driver… Smells like an idmap problem, but a Trixie image works well.
Minimal (non-)working example:
> root@incus-dev:~/incus# test/deps/import-busybox
Image imported as: bd06ce5e307a730411aad70ee4583a07cf62ad38c466a45a9d1e431a7eccb054
> root@incus-dev:~/incus# incus launch bd06ce5e307a730411aad70ee4583a07cf62ad38c466a45a9d1e431a7eccb054 busybox-test --storage nfs
Launching busybox-test
> root@incus-dev:~/incus# incus exec busybox-test -- ls -al
ls: .: Permission denied
If you can confirm the behavior on your side @Foxboron, then I guess you’ll have to investigate :) Else, we’ll have to spot differences between Arch and Debian environments :)
> root@incus-dev:~/incus# incus exec busybox-test -- ls -al
ls: .: Permission denied
This specific error is because the permissions on /root lacks executable bit. So setting chmod +x /root solves that. So I think that is an actual bug in the image being created.
I'll try and reproduce the error reported earlier.
Okay, I think the error reported above I the same thing.
timeout --foreground 120 /home/runner/go/bin/incus file push /home/runner/work/incus/incus/test/tmp.gTh/tmp oidc-openfga:user-foo/root/tmpfile.txt --verbose
You are pushing to /root without the executable bit.
Except… it works on every other driver.
I don't disagree it's weird, but the permissions are obviously wrong it seems.
diff --git a/test/deps/import-busybox b/test/deps/import-busybox
index 3c72e87cec67..7dab4e42e747 100755
--- a/test/deps/import-busybox
+++ b/test/deps/import-busybox
@@ -236,6 +236,7 @@ class BusyBox(object):
for path in ("dev", "mnt", "proc", "root", "sys", "tmp"):
directory_file = tarfile.TarInfo()
directory_file.type = tarfile.DIRTYPE
+ directory_file.mode = 0o755
if split:
directory_file.name = "%s" % path
target_tarball_rootfs.addfile(directory_file)
or similar is what we want for /root at least.
On ZFS,
> root@incus-dev:~# incus launch bd06ce5e307a busybox-test-2 --storage zfs
Launching busybox-test-2
> root@incus-dev:~# incus exec busybox-test-2 -- ls -al
total 2
drw-r--r-- 2 0 0 2 Jan 1 1970 .
drwxr-xr-x 12 0 0 13 Oct 20 21:27 ..
From the host,
> root@incus-dev:~# ls -ld /var/lib/incus/storage-pools/nfs/containers/busybox-test/rootfs/root
drw-r--r-- 2 1000000 1000000 4096 Jan 1 1970 /var/lib/incus/storage-pools/nfs/containers/busybox-test/rootfs/root
> root@incus-dev:~# ls -ld /var/lib/incus/storage-pools/zfs/containers/busybox-test-2/rootfs/root
drw-r--r-- 2 root root 2 Jan 1 1970 /var/lib/incus/storage-pools/zfs/containers/busybox-test-2/rootfs/root