Add nvidia-tegra-drivers-support interface
This interface gives snaps access to the hardware on NVIDIA Tegra platforms. It includes the udev rules contained in the nvidia-tegra-drivers-36 .deb package from the ubuntu-tegra/updates PPA.
The AppArmor profiles and seccomp rules were determined iteratively with snappy-debug, by testing three snaps created for this purpose: one for the nvidia-tegra-drivers-36 package (which also ships the nvidia-smi tool), one for cuda-samples, and one for libnvinfer-samples from NVIDIA's jetson/common package repository, which contains the TensorRT samples.
The overall structure of the interface as well as the tests (and the name) were mostly inspired by the nvidia-drivers-support interface.
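For context on the overall shape: the interface follows the same commonInterface pattern as the other builtin interfaces. The sketch below is only meant to show that shape; the file name and constant names follow the usual builtin convention, and the snippet contents are placeholders rather than the actual rules in this PR.

```go
// interfaces/builtin/nvidia_tegra_drivers_support.go (sketch only)
package builtin

const nvidiaTegraDriversSupportSummary = `allows access to NVIDIA Tegra hardware`

const nvidiaTegraDriversSupportBaseDeclarationSlots = `
  nvidia-tegra-drivers-support:
    allow-installation:
      slot-snap-type:
        - core
    deny-auto-connection: true
`

// Placeholder AppArmor snippet (illustrative path only).
const nvidiaTegraDriversSupportConnectedPlugAppArmor = `
/sys/devices/platform/bus@0/3810000.fuse/fuse/nvmem r,
`

// Placeholder udev match; snapd appends the per-snap tag to entries like this.
var nvidiaTegraDriversSupportConnectedPlugUDev = []string{
	`KERNEL=="nvmap"`,
}

func init() {
	registerIface(&commonInterface{
		name:                  "nvidia-tegra-drivers-support",
		summary:               nvidiaTegraDriversSupportSummary,
		implicitOnCore:        true,
		implicitOnClassic:     true,
		baseDeclarationSlots:  nvidiaTegraDriversSupportBaseDeclarationSlots,
		connectedPlugAppArmor: nvidiaTegraDriversSupportConnectedPlugAppArmor,
		connectedPlugUDev:     nvidiaTegraDriversSupportConnectedPlugUDev,
	})
}
```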
Hello
Is there any existing design for the new interface? Thanks
Hey, thanks for your comments.
This is the first time I'm writing an interface. Do you have an example of what is expected for this? Then I can work on it.
As for the spec: none exists yet but I can work on creating one too if needed. I guess the spec should also address the comments you made already? I think I'll need to ask nvidia about documentation for some of the things you mentioned.
@DocSepp Thank you for writing your first interface :)
I think having a spec that goes over the udev part would be very useful, especially for the reason I've mentioned. Usually udev rules are only tagging devices to snaps. Here there are non-trivial actions that have global consequences.
I'm happy to help if you have further questions. If you have a spec - even a draft - please ask me for review.
Finding references for the major:minor pairs and other locations would be great, even if we start by linking to existing rules elsewhere.
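To make the distinction concrete, here is roughly what the two kinds of rules look like, written the way builtin interfaces carry their udev snippets. The device name, mode and script path below are purely illustrative, not taken from this PR.

```go
// Tagging-only snippet, as most builtin interfaces use: it merely matches a
// device node, and snapd appends the per-snap TAG so only the connected
// snap's apps get the device in their device cgroup.
var taggingOnlyUDevSnippet = []string{
	`KERNEL=="nvmap"`, // illustrative device name
}

// A rule with global consequences: it changes permissions for every user of
// the device and runs a program whenever the device appears, independently of
// which snap is connected. Rules like this should not live in an interface
// that arbitrary snaps can plug.
const globalEffectUDevRule = `KERNEL=="nvmap", MODE="0666", RUN+="/usr/local/bin/tegra-setup"`
```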
Hello,
Thank you for your other comments. I just came back from PTO and therefore couldn't continue working on this earlier.
I'd be happy to have a call with you to discuss how this should be done and what some of the best practices are.
I just updated the interface with a set of rules that is as minimal as possible while still making all my test snaps work in strict confinement.
I was able to greatly reduce the number of udev rules, as we only need to tag a couple of device nodes.
With the udev rules in place, the snaps also no longer try to access as many paths under @{PROC}, so I could remove those from the AppArmor profiles.
While testing I discovered that snappy-debug reports nvidia-smi trying to access /sys/devices/platform/bus@0/3810000.fuse/fuse/nvmem, whereas strace shows it actually opens /sys/bus/nvmem/devices/fuse/nvmem, which is a symlink to the former. However, as far as I can tell, I still need to explicitly grant read access to the location the link points to. Is there something I'm missing here?
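Concretely, what works for me is a rule on the link target, i.e. something along these lines (the constant name is made up):

```go
// /sys/bus/nvmem/devices/fuse/nvmem (the path strace shows) is a symlink to
// the platform path below; presumably AppArmor checks the resolved path, so
// the target has to be granted read access explicitly.
const tegraFuseNvmemAppArmorSnippet = `
/sys/devices/platform/bus@0/3810000.fuse/fuse/nvmem r,
`
```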
Finally, what about the name of the interface? For now, every snap that accesses the iGPU will need to plug into this interface. But if I understood correctly during a meeting with @zyga, the "support" naming scheme is well defined and does not apply to an interface that many snaps need to plug into.
Looking forward to hearing your feedback.
As written, we cannot plug an interface with non-tagging udev rules into more than one snap/app.
To state this clearly: a support interface cannot be plugged into arbitrary snaps. I think you want to distribute those permissions across existing interfaces and move the udev initialization rules to a gadget snap with system files.
It is possible that after this operation the support interface still has a reason to exist, but it must be very well scoped. It might be there to allow some NVIDIA/Tegra-specific services to work, for example as part of gadget or kernel snaps and the applications defined there.
Thanks for your comments. I will set up a call with you.
I think we don't want a support interface then, but rather an interface that a snap can plug whenever it wants to access the iGPU. Let's discuss this in a call :)
We discussed a few topics related to this pull request in a meeting today; a few of my takeaways:
- "-support" interfaces are only to be used for rare things, perhaps special apps doing special initialisation as part of a gadget snap; perhaps nvidia-smi would be a good example of that
- we should try to use the "opengl" snapd interface as a starting point for CUDA/TensorRT/... other GPU runtime things rather than a new one
- the corresponding cuda-samples snap should be shared alongside this interface to demonstrate what CUDA apps would look like
- the CUDA samples do a number of things that not all CUDA-using apps would necessarily do, for instance /dev/shm usage or hardware-observer related things; the cuda-samples snap should use existing interfaces or snap packaging approaches rather than add support for these to a new snapd interface; for instance, this approach could be used to avoid the shm usage (alternate approach)
- udev rules that are just tagging devices to be exposed into snaps should just be added to existing interfaces if needed; other udev rules should go into a custom gadget snap (or possibly into udev / the core snap themselves)
- nvidia-smi is a tool to query/manage GPUs; it might need a specific interface, which would notably be useful for cert tests, but that's likely a different scenario from CUDA workloads; it's possible we'll have to defer the development of an interface for nvidia-smi
- there is an existing pattern for the snap store and snapd to use test-snapd-xxx snaps as reference to test key snapd functionality
- it's possible we discover new functionality that requires a tegra specific interface, e.g. nvidia-tegra
Changed the milestone to not target any release coming soon. @DocSepp, feel free to let me know the urgency/expectations from your side so we can see what we can do.