audio List of feature requests received so far for StreamReader/Writer

Open mthrok opened this issue 2 years ago • 0 comments

Here is the list of feature requests for StreamReader/Writer that I have received so far. Feel free to add more.

[x] PTS support in StreamWriter (#3135): When processing video or audio with StreamReader/Writer, the timestamp information (PTS) is lost. We need a way to provide frame-level PTS to the Writer.
[x] Support encoding options in StreamWriter (#3179): bitrate, GOP size, etc.
Audio passthrough: When performing batch video processing (such as super resolution), the audio can be left untouched. If StreamReader could return packet data without decoding it, and StreamWriter could re-mux that data, video processing would become more efficient.
Custom YUV to RGB conversion: Currently, when using HW acceleration, only YUV outputs are supported. Filters like scale_cuda and scale_npp are supported via #3183, but they don't provide YUV->RGB conversion either. We could implement a custom CUDA kernel like the one in Nvidia's CUDA examples.
filter complex support in StreamWriter (#3063)
- Prerequisites:
  - [x] https://github.com/pytorch/audio/issues/3159 -> https://github.com/pytorch/audio/pull/3183
  - [x] filter support in StreamWriter #3194
~Reduce memory usage~ See https://github.com/pytorch/audio/issues/3165
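
For the custom YUV to RGB conversion item above, the per-pixel arithmetic that such a kernel would perform can be sketched on the CPU. This is only a reference sketch, not the proposed CUDA kernel and not torchaudio code: the function names are hypothetical, and it assumes 8-bit full-range BT.601 (JFIF) coefficients. The correct matrix depends on the color space and range of the actual stream.

```python
def _clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y, u, v):
    """Convert one 8-bit YUV pixel to RGB.

    Assumption: full-range BT.601 (JFIF) coefficients, with the chroma
    components U and V centered at 128.
    """
    d = u - 128  # blue-difference chroma offset
    e = v - 128  # red-difference chroma offset
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return _clamp8(r), _clamp8(g), _clamp8(b)


# Neutral chroma (U = V = 128) yields a gray pixel whose level equals Y.
gray = yuv_to_rgb(128, 128, 128)
```

A CUDA kernel would apply the same arithmetic to every pixel in parallel, after upsampling the chroma planes if the input is subsampled (for example NV12). A CPU reference like this could serve to check the kernel's output on a few pixels.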

Other ideas

[ ] Decoder/Encoder caching: Currently, StreamReader/Writer creates decoder/encoder objects for each input file. In large-scale video decoding, if the input formats are known to be the same, we might be able to reuse decoders/encoders.
- [x] https://github.com/pytorch/audio/issues/3160
[x] Apply filter function: FFmpeg has a lot of filtering functions. Similar to sox_effects, we should be able to apply these filters to Tensors. See the stub for https://github.com/pytorch/audio/issues/3161
[x] Apply codecs function: This should be doable in the Python layer, but a dedicated function to apply codecs would be handy. We should replace the existing sox-based apply_codec function and extend it to video/images.
Packet loss emulation: By dropping some packets in the encoder/decoder, one can degrade the media. This could be used as a form of augmentation. The following is a PoC from my prototype, using the Gilbert-Elliott packet loss model:

https://user-images.githubusercontent.com/855818/222531930-34d18b15-2471-45fc-99fa-4594ea3e1bea.mp4
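
For readers unfamiliar with the Gilbert-Elliott model mentioned above: it is a two-state Markov chain (a "good" and a "bad" state) in which each state has its own packet loss probability, which produces bursty losses. The sketch below is not the prototype's code. It is a minimal pure-Python illustration, and all function and parameter names are hypothetical.

```python
import random


def gilbert_elliott_losses(n_packets, p_good_to_bad, p_bad_to_good,
                           loss_in_good=0.0, loss_in_bad=1.0, seed=None):
    """Return a list of booleans, True where a packet is considered lost.

    The chain starts in the good state. For each packet, a loss is drawn
    using the current state's loss probability, then the state may flip.
    """
    rng = random.Random(seed)
    bad = False
    lost = []
    for _ in range(n_packets):
        loss_prob = loss_in_bad if bad else loss_in_good
        lost.append(rng.random() < loss_prob)
        # State transition for the next packet.
        if bad:
            if rng.random() < p_bad_to_good:
                bad = False
        elif rng.random() < p_good_to_bad:
            bad = True
    return lost


def drop_packets(packets, lost):
    """Keep only the packets that were not marked as lost."""
    return [p for p, is_lost in zip(packets, lost) if not is_lost]


# Example: mark losses for 100 packets and drop them from a dummy sequence.
mask = gilbert_elliott_losses(100, p_good_to_bad=0.05, p_bad_to_good=0.3, seed=0)
kept = drop_packets(list(range(100)), mask)
```

In the long run the chain spends a fraction `p_good_to_bad / (p_good_to_bad + p_bad_to_good)` of the time in the bad state, so these two parameters control how often bursts occur and how long they last. Applying this to real media would require access to the encoded packets, which is an assumption about where such a hook would live.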

Mar 02 '23 19:03 mthrok