gvisor
nvproxy: Support GPU capability segmentation
Description
Currently, gVisor's NVIDIA GPU support feature (`nvproxy`) only supports CUDA-related commands (`ioctl`s, allocation classes, etc.). There have been multiple requests to expand this set to support non-CUDA GPU workloads, such as video transcoding (NVENC, NVDEC) in #9452. Vulkan has also come up.
One aspect of `nvproxy`'s design is that it inherently limits the exposed NVIDIA kernel driver ABI to the set of commands that `nvproxy` understands. Like all attack-surface-reduction measures, doing so offers some security benefits.
If we keep adding commands to `nvproxy` under the same big bag of commands it currently knows about, this benefit will weaken over time. This has been fine so far because the only workloads `nvproxy` has aimed to support were all of the same type (compute/CUDA-type workloads), and can thus reasonably be expected to require largely overlapping sets of commands. However, adding support for e.g. video transcoding workloads to this existing set would expose video-transcoding ABI commands to CUDA workloads that do not need them. This feature request is about avoiding that.
Is this feature related to a specific bug?
#9452 and other discussions.
Do you have a specific solution in mind?
This feature request is about implementing a capability segmentation scheme for `nvproxy` commands. This way, commands that are not required by CUDA workloads would not be exposed unless explicitly requested.
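As a rough illustration of the idea, the sketch below tags each command with the capabilities that need it and filters on the enabled set. Everything in it is hypothetical: the `Capability` type, the `registry` table, and the command names are made up for this example and do not correspond to real `nvproxy` code or real driver commands.

```go
package main

import "fmt"

// Capability is a bitmask of driver capabilities.
// Hypothetical type, not part of nvproxy.
type Capability uint8

const (
	CapCompute Capability = 1 << iota
	CapGraphics
	CapVideo
	CapDisplay
	CapUtility
)

// command is a hypothetical registry entry: a command name plus the
// set of capabilities whose workloads are assumed to need it.
type command struct {
	name string
	caps Capability
}

// registry uses invented command names purely for illustration.
var registry = []command{
	{"ALLOC_COMPUTE_CHANNEL", CapCompute},
	{"QUERY_DEVICE_INFO", CapCompute | CapVideo | CapUtility},
	{"ALLOC_VIDEO_ENCODER", CapVideo},
}

// allowed returns the names of the commands needed by at least one of
// the enabled capabilities. All other commands stay unexposed.
func allowed(enabled Capability) []string {
	var out []string
	for _, c := range registry {
		if c.caps&enabled != 0 {
			out = append(out, c.name)
		}
	}
	return out
}

func main() {
	// A CUDA-only sandbox does not see the video encoder command.
	fmt.Println(allowed(CapCompute))
	// Video commands appear only when explicitly requested.
	fmt.Println(allowed(CapCompute | CapVideo))
}
```

A bitmask keeps the per-command check to a single AND, and lets a command shared by several capabilities be listed once.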
NVIDIA has the concept of "driver capabilities", which map to shared libraries (`.so` files) that roughly correspond to the set of high-level functions that users of each capability would need. They are:
- Compute: Hardware-accelerated number-crunching. CUDA and OpenCL applications.
- Graphics: Hardware-accelerated 3D and 2D rendering. OpenGL and Vulkan applications.
- Video: Hardware-accelerated video encoding and decoding. NVENC and NVDEC respectively.
- Display: Rendering to physical monitors. Used by X11 and Wayland applications.
- Utility: GPU hardware info and management. Used by `nvidia-smi` and NVML.
NVIDIA exposes the choice of these GPU capabilities using the `NVIDIA_DRIVER_CAPABILITIES` environment variable, similar to the `NVIDIA_VISIBLE_DEVICES` environment variable.
We can reuse this scheme, as it is already out there and fairly easy to understand (i.e. easy for users to specify) while still providing significant ability to keep large amounts of the kernel driver ABI unexposed.
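To make the user-facing side concrete, here is a minimal sketch of parsing a comma-separated `NVIDIA_DRIVER_CAPABILITIES` value into the capability names listed above. The function name `parseCapabilities` and its behavior are assumptions for illustration: it accepts only the five capabilities from this issue plus `all`, treats an empty value as "nothing enabled", and rejects unknown names. A real implementation would need to decide on its own defaults and on any further values NVIDIA's tooling accepts.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// known lists the capability names discussed in this issue.
var known = map[string]bool{
	"compute":  true,
	"graphics": true,
	"video":    true,
	"display":  true,
	"utility":  true,
}

// parseCapabilities is a hypothetical helper that turns a value such as
// "compute,utility" into a sorted, de-duplicated list of capability names.
// Assumption: "all" expands to every known capability, and an empty
// value yields an empty list rather than a default set.
func parseCapabilities(value string) ([]string, error) {
	set := map[string]bool{}
	for _, tok := range strings.Split(value, ",") {
		tok = strings.TrimSpace(tok)
		switch {
		case tok == "":
			continue
		case tok == "all":
			for name := range known {
				set[name] = true
			}
		case known[tok]:
			set[tok] = true
		default:
			return nil, fmt.Errorf("unknown driver capability %q", tok)
		}
	}
	out := make([]string, 0, len(set))
	for name := range set {
		out = append(out, name)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	caps, err := parseCapabilities(os.Getenv("NVIDIA_DRIVER_CAPABILITIES"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("enabled capabilities:", caps)
}
```

Rejecting unknown names, rather than ignoring them, keeps a typo from silently narrowing or widening the exposed command set.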