
nvproxy: Support GPU capability segmentation

Open EtiennePerot opened this issue 5 months ago • 0 comments

Description

Currently, gVisor's NVIDIA GPU support feature (nvproxy) only supports CUDA-related commands (ioctls, allocation classes, etc.). There have been multiple requests to expand this set to support non-CUDA GPU workloads, such as video transcoding (NVENC, NVDEC) in #9452. Vulkan has also come up.

One aspect of nvproxy's design is that it inherently limits the exposed NVIDIA kernel driver ABI to the set of commands that nvproxy understands. Like all attack-surface-reduction measures, doing so offers some security benefits.

If we continue to add commands to the single big bag of commands nvproxy currently knows about, this benefit will weaken over time. This has been fine so far because the only workloads nvproxy has aimed to support were all of the same type (compute/CUDA-type workloads), and can thus reasonably be expected to require largely overlapping sets of commands. However, adding support for e.g. video transcoding workloads to this existing set would expose video-transcoding ABI commands to CUDA workloads that do not need them. This feature request is about avoiding that.

Is this feature related to a specific bug?

#9452 and other discussions.

Do you have a specific solution in mind?

This feature request is about implementing a capability segmentation scheme for nvproxy commands. This way, commands that are not required by CUDA workloads would not be exposed unless explicitly requested.
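To make the idea concrete, here is a minimal sketch of what tagging commands with capabilities could look like. Everything in it is hypothetical: the `capability` type, the `commandCaps` table, and the command numbers are placeholders for illustration, not existing nvproxy code or real NVIDIA ioctl numbers.

```go
package main

// capability is a bitmask of workload types.
// Hypothetical type; nvproxy does not define this today.
type capability uint8

const (
	capCompute capability = 1 << iota
	capVideo
)

// commandCaps maps a command number to the capabilities that need it.
// The numbers are placeholders, not real NVIDIA command numbers.
var commandCaps = map[uint32]capability{
	0x01: capCompute | capVideo, // needed by both workload types
	0x02: capCompute,            // compute-only
	0x03: capVideo,              // video-only
}

// allowed reports whether cmd may be forwarded to the driver for a
// sandbox with the given enabled capabilities. Commands that are not
// in the table are always rejected.
func allowed(cmd uint32, enabled capability) bool {
	need, ok := commandCaps[cmd]
	return ok && need&enabled != 0
}
```

With this shape, a sandbox started with only `capCompute` would reject command `0x03`, while one started with `capCompute|capVideo` would allow it. A command shared by several capabilities is listed once with all of them.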

NVIDIA has the concept of "driver capabilities", which map to shared libraries (.so files) that roughly correspond to the set of high-level functions that users of each capability would need. They are:

  • Compute: Hardware-accelerated number-crunching. CUDA and OpenCL applications.
  • Graphics: Hardware-accelerated 3D and 2D rendering. OpenGL and Vulkan applications.
  • Video: Hardware-accelerated video encoding and decoding. NVENC and NVDEC respectively.
  • Display: Rendering to physical monitors. Used by X11 and Wayland applications.
  • Utility: GPU hardware info and management. Used by nvidia-smi and NVML.

NVIDIA exposes the choice of these GPU capabilities using the NVIDIA_DRIVER_CAPABILITIES environment variable, similar to the NVIDIA_VISIBLE_DEVICES environment variable.

We can reuse this scheme: it is already established and easy for users to understand and specify, while still letting us keep large parts of the kernel driver ABI unexposed.
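As a rough sketch of the user-facing side, the value of NVIDIA_DRIVER_CAPABILITIES could be parsed into a set of capability names along these lines. The function name `ParseDriverCapabilities` is made up for illustration, and this sketch only recognizes the five capabilities listed above plus `all`. How an empty or unset value should be treated is left to the caller here, which is an assumption rather than a decision made in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// knownCaps lists the capability names from this issue.
// Hypothetical table; not an existing nvproxy definition.
var knownCaps = []string{"compute", "graphics", "video", "display", "utility"}

// ParseDriverCapabilities parses a comma-separated capability list,
// e.g. "compute,utility", into a set of names. "all" enables every
// known capability. Unknown names are reported as errors so that
// typos do not silently widen or narrow the exposed ABI.
func ParseDriverCapabilities(value string) (map[string]bool, error) {
	known := make(map[string]bool, len(knownCaps))
	for _, c := range knownCaps {
		known[c] = true
	}
	caps := make(map[string]bool)
	for _, name := range strings.Split(value, ",") {
		name = strings.TrimSpace(name)
		switch {
		case name == "":
			continue // tolerate empty entries such as a trailing comma
		case name == "all":
			for _, c := range knownCaps {
				caps[c] = true
			}
		case known[name]:
			caps[name] = true
		default:
			return nil, fmt.Errorf("unknown driver capability %q", name)
		}
	}
	return caps, nil
}
```

For example, `ParseDriverCapabilities("compute,utility")` yields a set containing only `compute` and `utility`, so video and graphics commands would stay unexposed for that sandbox.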

EtiennePerot, Sep 04 '24 00:09