[RFC] Support non-GPU hardware-based video decoding and encoding
🚀 The feature
Let users access the hardware encoding and decoding capabilities of non-GPU devices (including devices that are out-of-tree for torch) through the familiar torchaudio/torio.io APIs.
Proposed Solution
First, introduce an abstract base class for device backends, with one subclass per device backend. The class exposes device-specific parameters and functionality, such as AV_PIX_FMT_CUDA, AV_HWDEVICE_TYPE_CUDA, and D2D (device-to-device) copying. This lets us separate device-specific logic from device-independent logic. Out-of-tree devices can implement their own device backend subclasses in a torchaudio Python extension package. Importing that extension package after importing torchaudio enables the backend:
import torchaudio
import torchaudio_npu # torchaudio Python extension
Moreover, we can support autoloading of device extensions, as discussed in https://github.com/pytorch/pytorch/issues/122468.
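To make the proposed structure more concrete, here is a minimal Python sketch of the base class and a registry that an extension package could populate at import time. Everything in it is an assumption for illustration: `DeviceBackend`, `register_backend`, `get_backend`, and `NpuBackend` are hypothetical names, not existing torchaudio/torio APIs, and the `*_NPU` strings are placeholders rather than real FFmpeg constants. The actual implementation would most likely live on the C++ side in libtorio.

```python
from abc import ABC, abstractmethod
from typing import Dict


class DeviceBackend(ABC):
    """Hypothetical base class for a hardware codec backend (not an existing API)."""

    device_type: str     # torch device type string, e.g. "cuda"
    pix_fmt: str         # name of the FFmpeg hardware pixel format
    hw_device_type: str  # name of the FFmpeg AVHWDeviceType value

    @abstractmethod
    def copy_d2d(self, src: bytearray, dst: bytearray, nbytes: int) -> None:
        """Copy nbytes from one device buffer to another."""


# Registry keyed by device type; device-independent code only talks to this.
_BACKENDS: Dict[str, DeviceBackend] = {}


def register_backend(backend: DeviceBackend) -> None:
    _BACKENDS[backend.device_type] = backend


def get_backend(device_type: str) -> DeviceBackend:
    try:
        return _BACKENDS[device_type]
    except KeyError:
        raise RuntimeError(
            f"No hardware codec backend registered for '{device_type}'"
        ) from None


class CudaBackend(DeviceBackend):
    """In-tree backend carrying the constants named in this proposal."""

    device_type = "cuda"
    pix_fmt = "AV_PIX_FMT_CUDA"
    hw_device_type = "AV_HWDEVICE_TYPE_CUDA"

    def copy_d2d(self, src: bytearray, dst: bytearray, nbytes: int) -> None:
        # Stand-in for a real device-to-device copy; host buffers keep the sketch runnable.
        dst[:nbytes] = src[:nbytes]


class NpuBackend(DeviceBackend):
    """What an out-of-tree package such as torchaudio_npu might define (hypothetical)."""

    device_type = "npu"
    pix_fmt = "AV_PIX_FMT_NPU"                # placeholder, not a real FFmpeg constant
    hw_device_type = "AV_HWDEVICE_TYPE_NPU"   # placeholder, not a real FFmpeg constant

    def copy_d2d(self, src: bytearray, dst: bytearray, nbytes: int) -> None:
        # Stand-in for the vendor runtime's device-to-device copy.
        dst[:nbytes] = src[:nbytes]


register_backend(CudaBackend())  # would happen inside torchaudio
register_backend(NpuBackend())   # would happen when the extension package is imported
```

With this shape, the device-independent decoding and encoding code would call something like `get_backend(device.type)` and never refer to CUDA-specific constants directly.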
Motivation, pitch
I'm working on using the video decoding and encoding capabilities of MLU, an out-of-tree device that uses the PrivateUse1 dispatch key and is supported by ffmpeg-mlu. I found that the current FFmpeg-related code is tightly coupled to the GPU, so I have to make extensive modifications to the code in https://github.com/pytorch/audio/tree/main/src/libtorio/ffmpeg to get torchaudio/torio.io to run on ffmpeg-mlu.
Alternatives
No response
Additional context
We're happy to collaborate and support this goal. If the community is open to considering this feature, we can further refine the specific implementation plan.
cc @mthrok