List of feature requests received so far for StreamReader/Writer
Here is the list of feature requests for StreamReader/Writer I have received so far. Feel free to add to it.
- [x] PTS support in StreamWriter (#3135): When processing video/audio with StreamReader/Writer, the timestamp information (PTS) is lost. We need a way to provide frame-level PTS to the Writer.
- [x] Support encoding options in StreamWriter (#3179): bitrate, GOP size, etc.
- Audio passthrough: When performing batch video processing (such as super resolution), the audio can be left untouched. If StreamReader could return packet data without decoding it, and StreamWriter could re-mux that data, video processing would become more efficient.
- Custom YUV to RGB conversion
Currently, when using HW acceleration, only YUV outputs are supported. Filters like `scale_cuda` and `scale_npp` are supported via #3183, but they don't provide YUV->RGB conversion either. We can implement a custom CUDA kernel like the one in Nvidia's CUDA examples.
- filter complex support in StreamWriter (#3063)
- Prerequisites:
- [x] https://github.com/pytorch/audio/issues/3159 -> https://github.com/pytorch/audio/pull/3183
- [x] filter support in StreamWriter #3194
- Prerequisites:
- ~Reduce memory usage~ See https://github.com/pytorch/audio/issues/3165
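
For the custom YUV to RGB conversion item above, the per-pixel arithmetic that such a kernel would perform can be sketched in plain Python. This is only an illustration, not the proposed implementation: the function names are hypothetical, the coefficients assume 8-bit limited-range BT.601 (a real stream may use BT.709 or full range instead), and the planar 4:2:0 layout is an assumption (HW decoders often output NV12, which interleaves U and V).

```python
def yuv_to_rgb_bt601(y, u, v):
    """Convert one 8-bit limited-range YUV pixel to 8-bit RGB.

    Assumes BT.601 coefficients; other color matrices need other constants.
    """
    c = y - 16   # remove the luma offset of limited ("TV") range
    d = u - 128  # center Cb on zero
    e = v - 128  # center Cr on zero

    def clamp(x):
        # Round to the nearest integer and clip to the 8-bit range.
        return max(0, min(255, int(round(x))))

    r = clamp(1.164 * c + 1.596 * e)
    g = clamp(1.164 * c - 0.392 * d - 0.813 * e)
    b = clamp(1.164 * c + 2.017 * d)
    return r, g, b


def yuv420p_to_rgb(y_plane, u_plane, v_plane):
    """Convert planar YUV 4:2:0 (lists of rows) to rows of (R, G, B) tuples.

    Each chroma sample is shared by a 2x2 block of luma samples, so the
    chroma planes are indexed at half resolution. In a CUDA kernel, each
    thread would do the work of one iteration of the inner loop.
    """
    rgb = []
    for row, y_row in enumerate(y_plane):
        out_row = []
        for col, y in enumerate(y_row):
            u = u_plane[row // 2][col // 2]
            v = v_plane[row // 2][col // 2]
            out_row.append(yuv_to_rgb_bt601(y, u, v))
        rgb.append(out_row)
    return rgb
```

Nearest-neighbor chroma lookup is used here for brevity; a production kernel might interpolate chroma instead.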
Other ideas
- [ ] Decoder/Encoder caching
Currently, StreamReader/Writer creates decoder/encoder objects for each input file. In large-scale video decoding, if the input formats are known to be the same, we might be able to reuse decoders/encoders.
- [x] https://github.com/pytorch/audio/issues/3160
- [x] Apply filter function: FFmpeg has a lot of filtering functions. Similar to sox_effects, we should be able to apply these filters to Tensors. See the stub for https://github.com/pytorch/audio/issues/3161
- [x] Apply codecs function: This should be doable in the Python layer, but a function to apply codecs would be handy. We should replace the existing sox-based apply_codec function and extend it to video/images.
- Packet loss emulation: By dropping some packets in the encoder/decoder, one can degrade the media. This could be used as a form of augmentation. The following is a PoC from my prototype using the Gilbert-Elliott packet loss model:
https://user-images.githubusercontent.com/855818/222531930-34d18b15-2471-45fc-99fa-4594ea3e1bea.mp4
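
To make the packet loss idea concrete, here is a minimal sketch of a Gilbert-Elliott loss pattern in plain Python. It is not the prototype shown in the video. The function names, parameter names, and default probabilities are all hypothetical, and it works on a generic list of packets rather than on any StreamReader/Writer API. The model is a two-state Markov chain: a "good" state with a low loss probability and a "bad" state with a high one, which produces bursty losses.

```python
import random


def gilbert_elliott_mask(num_packets, p_good_to_bad=0.05, p_bad_to_good=0.3,
                         loss_good=0.0, loss_bad=1.0, seed=None):
    """Return a list of booleans; True marks a packet to drop.

    All default values are illustrative assumptions, not tuned settings.
    """
    rng = random.Random(seed)  # private RNG so results are reproducible
    bad = False                # the chain starts in the good state
    dropped = []
    for _ in range(num_packets):
        # Decide the loss of this packet from the current state.
        loss_prob = loss_bad if bad else loss_good
        dropped.append(rng.random() < loss_prob)
        # Then move to the next state of the Markov chain.
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
    return dropped


def drop_packets(packets, dropped):
    """Keep only the packets whose mask entry is False."""
    return [pkt for pkt, lost in zip(packets, dropped) if not lost]
```

A caller would generate a mask per stream, for example `drop_packets(packets, gilbert_elliott_mask(len(packets), seed=0))`, and hand the surviving packets to the decoder or muxer.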