Integrating Intel GPU Decoder and Encoder Support into Torio/Torchaudio
🚀 The feature
@mthrok, I'd like to propose integrating Intel GPU decoder and encoder support into Torio/Torchaudio's FFmpeg integration. This would provide native Intel GPU support for users running PyTorch on Intel GPU systems. Additionally, it could lay the groundwork for a more plugin-based system or an extension mechanism that accommodates future vendor GPU implementations.
Motivation, pitch
Recently, my team and I enabled Intel GPU support in Torchvision's VideoReader. We observed a performance boost of approximately 1.8x to 2x for ResNet50 inference on an Intel GPU. Given the recent discussions about consolidating this functionality in Torio, we're keen to contribute the enhancement to Torio instead.
Alternatives
No response
Additional context
Would the Torio/Torchaudio maintainers be open to this contribution? We're eager to collaborate and support the project's goals.
Hi @leopck
Thanks for the suggestion. We are open to taking the contribution, but need to understand a bit more about the approach. How are you enabling the Intel GPU decoder? Does it involve bundling a library?
Currently, the CUDA decoder/encoder is integrated through the abstraction provided by FFmpeg, so that we can build it with a bare-minimum FFmpeg and distribute it without redistributing any hardware library. Users who want to use the CUDA decoder need to build/install FFmpeg with cuvid integration themselves. If your approach is similar, I think we are good to go.
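For illustration, selecting the CUDA decoder through that FFmpeg abstraction looks roughly like this (a minimal sketch, assuming torchaudio 2.x, an FFmpeg build with cuvid/NVDEC enabled, and a hypothetical local file input.mp4):

```python
# Minimal sketch of the existing CUDA decoding path (assumes torchaudio 2.x,
# an FFmpeg build with cuvid/NVDEC enabled, and a hypothetical "input.mp4").
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")
reader.add_video_stream(
    frames_per_chunk=16,
    decoder="h264_cuvid",  # NVDEC decoder name exposed by FFmpeg
)
reader.fill_buffer()
(video,) = reader.pop_chunks()
print(video.shape, video.dtype)  # decoded frames, copied back to CPU memory
```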
> How are you enabling the Intel GPU decoder? Does it involve bundling a library?
We are using FFmpeg + VAAPI, so no library bundling is required. As long as VAAPI is installed on the system, that is sufficient.
> Currently, the CUDA decoder/encoder is integrated through the abstraction provided by FFmpeg, so that we can build it with a bare-minimum FFmpeg and distribute it without redistributing any hardware library. Users who want to use the CUDA decoder need to build/install FFmpeg with cuvid integration themselves.
Right now, we are using the default FFmpeg that we pull from the Ubuntu APT repo. Which FFmpeg version and package are you using?
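For reference, one way to check which hardware-related codecs the FFmpeg libraries loaded by torchaudio expose at runtime is the ffmpeg_utils helpers (a sketch, assuming torchaudio 2.x; the exact decoder/encoder names available depend on how the FFmpeg packages were built):

```python
# Sketch: list VAAPI/QSV-related video codecs exposed by the FFmpeg libraries
# that torchaudio loaded at runtime (assumes torchaudio 2.x).
from torchaudio.utils import ffmpeg_utils

decoders = ffmpeg_utils.get_video_decoders()  # {name: description}
encoders = ffmpeg_utils.get_video_encoders()
print({k: v for k, v in decoders.items() if "vaapi" in k or "qsv" in k})
print({k: v for k, v in encoders.items() if "vaapi" in k or "qsv" in k})
```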
> How are you enabling the Intel GPU decoder? Does it involve bundling a library?
> We are using FFmpeg + VAAPI, so no library bundling is required. As long as VAAPI is installed on the system, that is sufficient.
Sounds good.
> Currently, the CUDA decoder/encoder is integrated through the abstraction provided by FFmpeg, so that we can build it with a bare-minimum FFmpeg and distribute it without redistributing any hardware library. Users who want to use the CUDA decoder need to build/install FFmpeg with cuvid integration themselves.
> Right now, we are using the default FFmpeg that we pull from the Ubuntu APT repo. Which FFmpeg version and package are you using?
Right now, torchaudio is compiled against custom FFmpeg binaries to ensure that we only use FFmpeg's public API (functions whose names start with av_), which is licensed under the LGPL. We compile the same source code against v4.4, 5, and 6 so that it works with whichever version of FFmpeg is found in the user environment. FFmpeg installed with apt should work just fine.
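To confirm which of those FFmpeg versions the bindings picked up in a given environment, something like this should work (a sketch, assuming torchaudio 2.x where torchaudio.utils.ffmpeg_utils is available):

```python
# Sketch: report the versions of the FFmpeg libraries (libavcodec, libavformat,
# etc.) that torchaudio is using at runtime (assumes torchaudio 2.x).
from torchaudio.utils import ffmpeg_utils

print(ffmpeg_utils.get_versions())
```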
Can you check whether the current code already works with VAAPI? IIUC, using a VAAPI decoder is just a matter of selecting the decoder; for example, passing h264_vaapi to add_video_stream could just work. I looked at the FFmpeg C VAAPI examples, and those require a bit more work because they do hardware encoding. I got the impression that nothing special is needed just to use a hardware decoder.
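A sketch of that experiment might look like the following (assuming torchaudio 2.x and a hypothetical local file input.mp4; whether the local FFmpeg build exposes a VAAPI decoder under that name can be checked with the decoder listing shown earlier):

```python
# Sketch of the experiment suggested above: select a VAAPI decoder purely by
# name, with no other changes (assumes torchaudio 2.x; the decoder name below
# is the one suggested above and depends on what the FFmpeg build exposes).
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")  # hypothetical test clip
reader.add_video_stream(
    frames_per_chunk=16,
    decoder="h264_vaapi",  # decoder name from the suggestion above
)
reader.fill_buffer()
(video,) = reader.pop_chunks()
print(video.shape, video.dtype)
```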
> Right now, torchaudio is compiled against custom FFmpeg binaries to ensure that we only use FFmpeg's public API (functions whose names start with av_), which is licensed under the LGPL. We compile the same source code against v4.4, 5, and 6 so that it works with whichever version of FFmpeg is found in the user environment. FFmpeg installed with apt should work just fine.
Yes, agreed, this would work for us as well.
> Can you check whether the current code already works with VAAPI? IIUC, using a VAAPI decoder is just a matter of selecting the decoder; for example, passing h264_vaapi to add_video_stream could just work. I looked at the FFmpeg C VAAPI examples, and those require a bit more work because they do hardware encoding. I got the impression that nothing special is needed just to use a hardware decoder.
Actually, there are some additional optimizations on our side that we would like to contribute to this project, such as zero-copy mechanisms and the use of other media acceleration capabilities of Intel GPUs. You are right that if the intention is solely decoding and encoding with VAAPI through FFmpeg, then h264_vaapi is sufficient. In our case, however, we would like to add these optimizations for decoding as well, since we want higher performance for workloads like inference. We saw roughly an 80% performance gain from these optimizations.
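For comparison, the existing CUDA path already offers a zero-copy handoff into PyTorch via hw_accel, which is the kind of pattern an Intel GPU path could mirror (a sketch, assuming torchaudio 2.x, an FFmpeg build with cuvid enabled, and a hypothetical input.mp4; the "xpu" variant in the trailing comment is purely hypothetical, not an existing API):

```python
# Sketch of the zero-copy pattern: with hw_accel, decoded frames are handed to
# PyTorch as GPU tensors without a round-trip through host memory.
from torchaudio.io import StreamReader

reader = StreamReader("input.mp4")  # hypothetical test clip
reader.add_video_stream(
    frames_per_chunk=16,
    decoder="h264_cuvid",
    hw_accel="cuda:0",  # keep decoded frames on the GPU
)
reader.fill_buffer()
(frames,) = reader.pop_chunks()
print(frames.device)  # cuda:0 -- ready for inference without a host copy

# Hypothetical Intel GPU equivalent (not an existing API):
#   reader.add_video_stream(frames_per_chunk=16, decoder=..., hw_accel="xpu:0")
```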