audio Add filter and filter_complex to StreamWriter

🚀 The feature

Add support for ffmpeg filters (-filter, -filter_complex) to StreamWriter and StreamReader, following the ffmpeg filter documentation: https://ffmpeg.org/ffmpeg-filters.html

It would be good to add an argument to add_video_stream and add_audio_stream for setting ffmpeg filters, so that the full ffmpeg functionality is available together with hardware acceleration.

Motivation, pitch

ffmpeg filters are used for many basic tasks, such as adding effects via an overlay or changing the background color. Using StreamWriter and then applying the filter separately loses all the acceleration gains. Adding the ability to specify filters in StreamWriter would solve this issue and let users build advanced ffmpeg pipelines while benefiting from hardware acceleration.

Thanks

Alternatives

No response

Additional context

No response

Feb 15 '23 07:02 maysteinfeld

This is an interesting one.

During the development of StreamReader/StreamWriter, support for filter_complex complicated the interface, so I excluded it, thinking that, theoretically, one can manually perform pixel-level transformations in PyTorch.

This is technically challenging.

On the low-level implementation side, the question is how to integrate the filter graph, which is a mapping from multiple AVFrame*s to one AVFrame*.

On the surface level, the question is what a good interface for specifying multiple input tensors would look like (i.e. what is a good API?).

StreamReader

        ┌► AVFrame ──► Tensor
source ─┤
        └► AVFrame ──► Tensor

StreamWriter

Tensor ──► AVFrame ─┐
                    ├─► destination
Tensor ──► AVFrame ─┘

FilterComplex

AVFrame ─┐   ┌──────┐
         ├─► │filter│ ──► AVFrame
AVFrame ─┘   └──────┘

Let's say we want to achieve the following pattern, where we pass two tensors, overlay them, and encode the resulting frame.

StreamWriter

Tensor ──► AVFrame ─┐   ┌──────┐
                    ├─► │filter│ ──► AVFrame
Tensor ──► AVFrame ─┘   └──────┘

and we want to do something like

s = StreamWriter(...)
s.add_video_stream(...)  # stream0
s.add_video_stream(...)  # stream1
s.DEFINE_OVERLAY(stream0, stream1)

What we are missing is:

- a way to tell StreamReader that the calls to add_video_stream should not be connected to decoder
- a way to tell StreamReader to define a new stream from already defined streams.
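
To make the two missing pieces concrete, here is a minimal bookkeeping sketch in plain Python. It is not part of torchaudio: the class FilterGraphSpec and its methods add_input and add_filtered_stream are hypothetical names invented for illustration. It only records which streams feed the filter graph and builds the matching ffmpeg filter_complex string; it does no encoding.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FilterGraphSpec:
    """Hypothetical bookkeeping object; NOT a torchaudio API."""

    labels: List[str] = field(default_factory=list)  # every stream defined so far
    chains: List[str] = field(default_factory=list)  # one filter chain per derived stream

    def add_input(self, label: str) -> str:
        # Piece 1: a stream that only feeds the filter graph and is not
        # written to the destination by itself.
        self.labels.append(label)
        return label

    def add_filtered_stream(self, filter_name: str, *sources: str, out: str) -> str:
        # Piece 2: a new stream defined from already defined streams.
        missing = [s for s in sources if s not in self.labels]
        if missing:
            raise ValueError(f"undefined streams: {missing}")
        pads = "".join(f"[{s}]" for s in sources)
        self.chains.append(f"{pads}{filter_name}[{out}]")
        self.labels.append(out)
        return out

    def filter_complex(self) -> str:
        # ffmpeg separates the chains of a filtergraph with ";".
        return ";".join(self.chains)


spec = FilterGraphSpec()
s0 = spec.add_input("stream0")
s1 = spec.add_input("stream1")
spec.add_filtered_stream("overlay", s0, s1, out="out")
print(spec.filter_complex())  # [stream0][stream1]overlay[out]
```

The open design question is how an object like this would be wired into the existing add_video_stream calls; the sketch only shows the information the API would have to carry.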

Another idea is to have the filtering op as a separate class, like sox_effects. This is already doable for simple filters, but support for complex filters has to be added at the C++ level. This approach would also incur more data copies than necessary at the boundaries between AVFrame and Tensor.

FilterComplex

Tensor ──► AVFrame ─┐   ┌──────┐
                    ├─► │filter│ ──► AVFrame ──► Tensor
Tensor ──► AVFrame ─┘   └──────┘

There is also a feature request to allow audio pass-through from StreamReader to StreamWriter without decoding/encoding, which would avoid unnecessary Tensor/AVFrame conversion. That could be applied here, but it is still in the exploration phase.

Feb 16 '23 14:02 mthrok

Another question is whether overlay and other filters support CUDA frames.

Update: They seem to do 😮 https://github.com/FFmpeg/FFmpeg/blob/aeceefa6220ccb8eac625f78c6fa90d048ccd2de/libavfilter/vf_overlay_cuda.c#L568

If the scope is limited to overlay, the main operation seems to be just two lines, linked below, so it should be easy to achieve the same effect in PyTorch.

https://github.com/FFmpeg/FFmpeg/blob/aeceefa6220ccb8eac625f78c6fa90d048ccd2de/libavfilter/vf_overlay_cuda.cu#L45-L50
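
As a rough illustration of doing the overlay in PyTorch, the sketch below alpha-blends an RGBA overlay onto an RGB frame. It is not a port of the linked CUDA kernel: the function name overlay, the uint8 channel-first layout (..., C, H, W), and the RGB/RGBA formats are assumptions made for this example, and the overlay is assumed to fit entirely inside the main frame.

```python
import torch


def overlay(main: torch.Tensor, over: torch.Tensor, x: int, y: int) -> torch.Tensor:
    """Alpha-blend `over` onto `main` with its top-left corner at (x, y).

    main: uint8 tensor of shape (..., 3, H, W), RGB.
    over: uint8 tensor of shape (..., 4, h, w), RGBA.
    Assumes the overlay fits inside `main`; no clipping is performed.
    """
    h, w = over.shape[-2:]
    out = main.clone()
    rgb = over[..., :3, :, :].float()
    alpha = over[..., 3:4, :, :].float() / 255.0  # broadcasts over the color channels
    region = out[..., y:y + h, x:x + w].float()
    blended = rgb * alpha + region * (1.0 - alpha)
    out[..., y:y + h, x:x + w] = blended.round().to(torch.uint8)
    return out


# Usage: paste an opaque 2x2 patch into a black 4x4 frame.
frame = torch.zeros(3, 4, 4, dtype=torch.uint8)
patch = torch.full((4, 2, 2), 200, dtype=torch.uint8)
patch[3] = 255  # fully opaque alpha channel
result = overlay(frame, patch, x=1, y=1)
```

Because the blend is plain tensor arithmetic, it runs on whatever device the input tensors live on, so frames already on a CUDA device would stay there.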

Feb 16 '23 14:02 mthrok

Also, could any of you help design and implement this? My bandwidth is very limited, so even though I find this interesting work, I don't know if I can work on it.

Feb 16 '23 14:02 mthrok

Hi @mthrok, thanks for your help and fast answer; I really appreciate it. I don't have the budget to work on that for now, and it's not the main focus of our task, but I really hope it will be implemented in the future. I also wanted to ask whether you know when the nightly StreamWriter changes will be promoted to the stable version. Thanks

Feb 19 '23 08:02 maysteinfeld

Update: We have added filter support to StreamWriter https://github.com/pytorch/audio/pull/3194. Two remaining items to complete this feature are

1. Support CUDA filters in StreamWriter
2. Design multiple input streams

1 should be possible by attaching HWFramesContext to the filter graph in StreamWriter. 2 still needs design.

Apr 04 '23 01:04 mthrok

Great, thanks. In which version will it be released? @mthrok

Apr 30 '23 06:04 maysteinfeld

It'll be in the next release (in around 3 months or so). It's available in the nightly build right now.

May 02 '23 19:05 xiaohui-zhang
