ffmpeg with `h264_nvenc` fails to run on gVisor with `-nvproxy`

Open luiscape opened this issue 2 years ago • 14 comments

Description

ffmpeg supports video encoding and decoding using NVIDIA GPUs. Here's an example command:

wget -q -O /neoncat.mp4 https://media.giphy.com/media/sIIhZliB2McAo/giphy.mp4 && \
    ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i /neoncat.mp4 -c:a copy -c:v h264_nvenc -b:v 5M /neoncat_out.mp4

Running that command fails in a container started with -nvproxy -nvproxy-docker, with the following ffmpeg error:

...
[AVHWDeviceContext @ 0x55d500277300] cu->cuInit(0) failed -> CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS
Device creation failed: -1313558101.
[h264 @ 0x55d500251900] No device available for decoder: device type cuda needed for codec h264.
...

This suggests that the call to cuInit(0) fails.

The same command succeeds in runc, encoding video correctly.

We pass NVIDIA_DRIVER_CAPABILITIES=all to expose the video capability.

Steps to reproduce

Build an OCI image, for example from this Dockerfile:

FROM nvidia/cuda:12.2.0-devel-ubuntu20.04
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install wget ffmpeg -y
RUN wget -q -O /neoncat.mp4 https://media.giphy.com/media/sIIhZliB2McAo/giphy.mp4

docker build -t ffmpeg-test -f Dockerfile .

Then run it on a system with a GPU available:

docker run --rm --runtime=runsc --gpus=all ffmpeg-test ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i /neoncat.mp4 -c:a copy -c:v h264_nvenc -b:v 5M /neoncat_out.mp4

runsc version

runsc version release-20230920.0-21-ge81e0c72a70b
spec: 1.1.0-rc.1

luiscape avatar Oct 04 '23 02:10 luiscape

We don't support graphics/video capabilities yet.

ayushr2 avatar Oct 04 '23 03:10 ayushr2

Sounds good. Thank you for letting me know.

luiscape avatar Oct 04 '23 03:10 luiscape

A friendly reminder that this issue had no activity for 120 days.

github-actions[bot] avatar Feb 02 '24 00:02 github-actions[bot]

@ayushr2 we may take on the work to add the video capability to NVProxy. Many of our customers are running into this limitation when seeking to do GPU-accelerated ffmpeg stuff. Do you have any thoughts or objections before we do?

thundergolfer avatar Aug 31 '24 16:08 thundergolfer

@thundergolfer We are aligning internally around how to proceed with adding non-CUDA support. Let me get back to you once we have fleshed out the details.

ayushr2 avatar Sep 03 '24 18:09 ayushr2

> how to proceed with adding non-CUDA support

It'd be the NVIDIA Video Codec SDK that we'd need to support, right?

Please do keep us in the loop :) We'd slotted in this work for mid-September but will of course adjust if it doesn't fit with your plans.

thundergolfer avatar Sep 03 '24 18:09 thundergolfer

Please see #10856 which needs to happen before non-CUDA ioctls can be added to nvproxy.

EtiennePerot avatar Sep 04 '24 00:09 EtiennePerot

Hi,

As per #10856, nvproxy cannot currently accept patches for nvenc/nvdec commands until it supports NVIDIA capability segmentation. @ayushr2 and others have started to work on this and we expect this to be done (at least structurally done, i.e. the nvproxy ABI definitions will support being tagged by driver capabilities) by early October.

This is a bit later than your planned date for starting this. So in the meantime, as part of this work, it would be great if you could contribute some NVENC/NVDEC regression tests, even if they are broken in gVisor at PR merge time. This is necessary not just for correctness, but also to ensure long-term maintainability as the NVIDIA driver and userspace libraries change. ffmpeg's h264_nvenc can take care of exercising nvenc, so that should definitely be one such test. Is there something similarly simple we can use for nvdec?

EtiennePerot avatar Sep 07 '24 00:09 EtiennePerot

Thanks for the reply @EtiennePerot. I've made regression testing the first task under our internal project 👍

thundergolfer avatar Sep 09 '24 22:09 thundergolfer

We may be able to reuse gVisor's existing ffmpeg image to avoid creating yet another Dockerfile for this. A regression test using it can be as simple as this.

EtiennePerot avatar Sep 09 '24 22:09 EtiennePerot

Are there any plans to support GPU workloads in general, such as Vulkan, and potentially to implement virtio-gpu cross-domain Wayland? We are interested because we aim to mostly replace crosvm with gVisor.

voidastro4 avatar Sep 21 '24 00:09 voidastro4

Yeah Vulkan support is on the roadmap. No ETA yet.

ayushr2 avatar Sep 21 '24 01:09 ayushr2

Once capability segmentation is in, patches welcome :)

EtiennePerot avatar Sep 21 '24 01:09 EtiennePerot

👋 @EtiennePerot just touching base here. Should we push expectations to early November?

thundergolfer avatar Oct 18 '24 19:10 thundergolfer

Making progress here, but realistically yes.

EtiennePerot avatar Oct 22 '24 16:10 EtiennePerot

Thanks for the update, no worries on our side 👌

thundergolfer avatar Oct 22 '24 17:10 thundergolfer

An update here: While "early November" turned into "late November", nvproxy is now ready to accept contributions for non-CUDA ioctls. Please take a look at pkg/sentry/devices/nvproxy/version.go. All of the handler types in driverABI now take in a capability bitfield as their second argument. For example, nvgpu.NV_ESC_REGISTER_FD: feHandler(frontendRegisterFD, compUtil), means "for frontend ioctl number nvgpu.NV_ESC_REGISTER_FD, call handler function frontendRegisterFD if at least one capability in compUtil (which is defined as compute,utility) is enabled".

All ioctls are currently tagged as compUtil. This may not be correct; it is done this way only because it matches existing behavior.
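
For readers unfamiliar with the pattern, here is a minimal standalone sketch of the "at least one capability enabled" check described above. It is not the nvproxy code: the DriverCaps type, the Cap* constants, the handler struct and the enabledFor method are hypothetical names made up for illustration, and only compUtil and frontendRegisterFD echo names from version.go.

```go
package main

import "fmt"

// DriverCaps is a bitfield of NVIDIA driver capabilities.
// Hypothetical type name; the real nvproxy definitions may differ.
type DriverCaps uint8

const (
	CapCompute DriverCaps = 1 << iota
	CapUtility
	CapGraphics
	CapVideo
)

// compUtil mirrors the "compute,utility" set described in the comment above.
const compUtil = CapCompute | CapUtility

// handler pairs an ioctl handler function with the capabilities that enable it.
// Hypothetical struct, standing in for handler types such as feHandler.
type handler struct {
	fn   func() string
	caps DriverCaps
}

// enabledFor reports whether at least one of the handler's capabilities
// is present in the set of enabled capabilities.
func (h handler) enabledFor(enabled DriverCaps) bool {
	return h.caps&enabled != 0
}

func main() {
	registerFD := handler{
		fn:   func() string { return "frontendRegisterFD" },
		caps: compUtil,
	}

	// Only "utility" enabled: the handler is reachable.
	fmt.Println(registerFD.enabledFor(CapUtility)) // true

	// Only "video" enabled: a compUtil-tagged handler is not reachable.
	fmt.Println(registerFD.enabledFor(CapVideo)) // false
}
```

Under this model, an ioctl needed only by nvenc/nvdec would carry a tag that includes the video bit, so it would stay unreachable for containers that enable only compute and utility.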

There still remains work to be done on the ioctl sniffer tool in order to assist with better capability tagging. But you can already start with the sniffer tool in order to add nvenc/nvdec ioctls now. The workflow is as follows:

  • Run ffmpeg unsandboxed with all capabilities just to make sure it works at all.
  • Re-run ffmpeg unsandboxed. Remove as many capabilities as possible to determine the minimal set of capabilities actually needed by ffmpeg.
  • Run ffmpeg unsandboxed with the ioctl sniffer. Look for anything the sniffer reports as unsupported.
  • Implement these ioctls, tag them with the minimal capability set identified earlier.
  • Verify that ffmpeg works in gVisor. You will need to pass in the set of capabilities to runsc's --nvproxy-allowed-driver-capabilities flag.

This isn't very precise but should unblock the addition of new ioctls for now until the sniffer can be made smarter about capabilities.
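
The second step of the workflow (removing capabilities until ffmpeg stops working) can be sketched as a greedy search. This is only an illustration, under the assumption that removing a capability never makes a failing run succeed. The minimalCaps function and the works callback are hypothetical; in practice works would rerun ffmpeg unsandboxed with NVIDIA_DRIVER_CAPABILITIES set to the candidate list and report whether it succeeded. Here it is replaced by a fake predicate.

```go
package main

import "fmt"

// minimalCaps greedily tries to drop each capability in turn, keeping the
// removal only when the workload still succeeds without it.
// works is a hypothetical callback standing in for "rerun ffmpeg with
// exactly these capabilities and check the exit status".
func minimalCaps(all []string, works func(caps []string) bool) []string {
	current := append([]string(nil), all...)
	for _, c := range all {
		// Build the candidate set: current minus capability c.
		candidate := make([]string, 0, len(current))
		for _, k := range current {
			if k != c {
				candidate = append(candidate, k)
			}
		}
		if works(candidate) {
			current = candidate
		}
	}
	return current
}

// contains reports whether caps includes want.
func contains(caps []string, want string) bool {
	for _, c := range caps {
		if c == want {
			return true
		}
	}
	return false
}

func main() {
	all := []string{"compute", "utility", "graphics", "video", "display"}

	// Fake predicate for illustration only: pretend the workload needs
	// exactly "compute" and "video". The real requirement is unknown here.
	fakeWorks := func(caps []string) bool {
		return contains(caps, "compute") && contains(caps, "video")
	}

	fmt.Println(minimalCaps(all, fakeWorks)) // [compute video]
}
```

The resulting list is the kind of value the last step would pass to runsc's --nvproxy-allowed-driver-capabilities flag.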

EtiennePerot avatar Nov 26 '24 23:11 EtiennePerot

In #11234, I am adding graphics support to nvproxy. You can use it as a reference to add video capability support.

ayushr2 avatar Dec 02 '24 17:12 ayushr2

Closing because issue has been solved for a while now.

luiscape avatar Jun 07 '25 16:06 luiscape