audio
Add filter and filter_complex to StreamWriter
🚀 The feature
Add support for ffmpeg filters (-filter, -filter_complex) to StreamWriter and StreamReader, as documented in the ffmpeg filters reference: https://ffmpeg.org/ffmpeg-filters.html
It would be good to add an argument to add_video_stream and add_audio_stream for setting the ffmpeg filters, so that the full ffmpeg functionality is available together with hardware acceleration.
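For reference, this is roughly what the request corresponds to on the ffmpeg command line. The sketch below only builds the argument list for an overlay with -filter_complex; the file names and the overlay position are placeholders, and hardware acceleration flags are left out.

```python
import shlex


def overlay_command(main_src, overlay_src, dst, x=10, y=10):
    """Build an ffmpeg command that overlays `overlay_src` on top of `main_src`."""
    # [0:v] and [1:v] refer to the video streams of the first and second input.
    # The overlay filter takes two inputs and produces one output, labelled [out].
    graph = f"[0:v][1:v]overlay=x={x}:y={y}[out]"
    return [
        "ffmpeg",
        "-i", main_src,
        "-i", overlay_src,
        "-filter_complex", graph,
        "-map", "[out]",
        dst,
    ]


# Placeholder file names; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
cmd = overlay_command("main.mp4", "logo.png", "out.mp4")
print(shlex.join(cmd))
```

The point of the request is to express the same filter graph through StreamWriter instead of a separate ffmpeg invocation.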
Motivation, pitch
ffmpeg filters are used for a lot of basic utilities, like adding effects via an overlay, changing the background color, etc. Using the StreamWriter and then applying the filter separately loses all the acceleration gains. Adding the ability to set filters in the StreamWriter will solve this issue and will let users create advanced ffmpeg pipelines while benefiting from hardware acceleration.
Thanks
Alternatives
No response
Additional context
No response
This is an interesting one.
During the development of StreamReader/StreamWriter, support for filter_complex complicated the interface, so I excluded it, thinking that, theoretically, one can manually perform pixel-level transformations in PyTorch.
This is technically challenging.
At the low-level implementation, the question is how to integrate the filter graph, which is a mapping from multiple AVFrame*s to one AVFrame*.
At the surface level, the question is what a good interface for specifying multiple input tensors looks like (i.e. what is a good API?).
StreamReader
          ┌─► AVFrame ──► Tensor
 source ──┤
          └─► AVFrame ──► Tensor
StreamWriter
 Tensor ──► AVFrame ─┐
                     ├─► destination
 Tensor ──► AVFrame ─┘
FilterComplex
 AVFrame ─┐   ┌──────┐
          ├─► │filter│ ──► AVFrame
 AVFrame ─┘   └──────┘
Let's say we want to achieve the following pattern, where we pass two tensors, perform an overlay, and encode the resulting frame.
StreamWriter
 Tensor ──► AVFrame ─┐   ┌──────┐
                     ├─► │filter│ ──► AVFrame
 Tensor ──► AVFrame ─┘   └──────┘
and we want to do something like
s = StreamWriter(...)
s.add_video_stream(...) # stream0
s.add_video_stream(...) # stream1
s.DEFINE_OVERLAY(stream0, stream1)
What we are missing is:
- a way to tell StreamReader that the calls to add_video_stream should not be connected to decoder
- a way to tell StreamReader to define a new stream from already defined streams.
Another idea is to have the filtering op as a separate class, like sox_effects. This is already doable for simple filters, but support for complex filters has to be added at the C++ level. Also, this approach will incur more data copies than necessary at the boundaries between AVFrames and Tensors.
FilterComplex
 Tensor ──► AVFrame ─┐   ┌──────┐
                     ├─► │filter│ ──► AVFrame ──► Tensor
 Tensor ──► AVFrame ─┘   └──────┘
There is also a feature request to allow audio pass-through from StreamReader to StreamWriter without decoding/encoding, which would avoid unnecessary Tensor/AVFrame conversion. That could be applied here, but it is still in the exploration phase.
Another question is whether overlay and other filters support CUDA frames.
Update: They seem to do 😮 https://github.com/FFmpeg/FFmpeg/blob/aeceefa6220ccb8eac625f78c6fa90d048ccd2de/libavfilter/vf_overlay_cuda.c#L568
If the scope is limited to overlay, the main operation seems to be the two lines here, so it should be easy to achieve the same effect in PyTorch.
https://github.com/FFmpeg/FFmpeg/blob/aeceefa6220ccb8eac625f78c6fa90d048ccd2de/libavfilter/vf_overlay_cuda.cu#L45-L50
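As a rough illustration of doing the overlay on the PyTorch side, here is a minimal sketch of alpha blending one tensor onto another. It is not a port of the ffmpeg kernel; the function name, the argument layout, and the assumption that the overlay fits entirely inside the main frame are all mine.

```python
import torch


def overlay(main, over, alpha, x, y):
    """Alpha-blend `over` onto `main` with its top-left corner at (x, y).

    main:  float tensor of shape (..., C, H, W)
    over:  float tensor of shape (..., C, h, w); assumed to fit inside `main`
    alpha: tensor broadcastable to `over`, with values in [0, 1]
    """
    h, w = over.shape[-2:]
    out = main.clone()  # leave the input frame untouched
    region = out[..., y:y + h, x:x + w]
    # Standard alpha blending: alpha * overlay + (1 - alpha) * background
    out[..., y:y + h, x:x + w] = alpha * over + (1.0 - alpha) * region
    return out


# Toy example: a 2x2 white patch blended at 50% onto a 4x4 black frame.
main = torch.zeros(1, 3, 4, 4)
over = torch.ones(1, 3, 2, 2)
alpha = torch.full((1, 1, 2, 2), 0.5)
blended = overlay(main, over, alpha, x=1, y=1)
```

Since this only uses indexing and elementwise arithmetic, the same function should work unchanged on CUDA tensors, as long as all inputs are on the same device.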
Also, could any of you help design and implement this? My bandwidth is very limited, so even though I find this interesting work, I don't know if I can work on it.
Hi @mthrok, thanks for your help and quick answer, I really appreciate it. I don't have the budget to work on this for now, and it's not the main focus of our task, but I really hope it will be implemented in the future. I also wanted to ask whether you know when the nightly changes to StreamWriter will be promoted to the stable version. Thanks
Update: We have added filter support to StreamWriter https://github.com/pytorch/audio/pull/3194. The two remaining items to complete this feature are
1. Support CUDA filters in StreamWriter
2. Design multiple input streams
Item 1 should be possible by attaching HWFramesContext to the filter graph in StreamWriter. Item 2 still needs design.
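A minimal sketch of how the single-input filter support might be used, assuming the filter_desc argument of add_video_stream that came with the PR above. The helper name, the output path, and the hflip filter are placeholders, and this covers CPU frames only.

```python
def write_filtered_video(dst, frames, frame_rate, filter_desc):
    """Encode `frames` to `dst`, applying an ffmpeg filter before encoding.

    frames: uint8 tensor of shape (time, channel, height, width) in RGB order.
    filter_desc: ffmpeg filter description string, e.g. "hflip".
    """
    # Imported lazily so that the helper can be defined without torchaudio installed.
    from torchaudio.io import StreamWriter

    _, _, height, width = frames.shape
    writer = StreamWriter(dst)
    writer.add_video_stream(
        frame_rate=frame_rate,
        width=width,
        height=height,
        format="rgb24",
        filter_desc=filter_desc,  # assumed argument; check the docs of your build
    )
    with writer.open():
        writer.write_video_chunk(0, frames)


# Example call (requires torch and a torchaudio build with FFmpeg support):
#   frames = torch.randint(0, 256, (30, 3, 240, 320), dtype=torch.uint8)
#   write_filtered_video("flipped.mp4", frames, 30.0, "hflip")
```

A filter that changes the frame size or pixel format may need additional encoder settings; hflip is used here because it keeps both unchanged.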
Great, thanks. In which version will it be released? @mthrok
It'll be in the next release (in around 3 months or so). It's available in the nightly build right now.