Av1an icon indicating copy to clipboard operation
Av1an copied to clipboard

Zero-copy scene detection

Open redzic opened this issue 2 years ago • 13 comments

Scene detection, especially at higher resolutions, is currently bottlenecked by many allocations and copying of uncompressed frames. This PR uses the ffmpeg API directly to decode frames and removes copies frame data during scene detection, which allows for large speedups compared to the previous code (which generally seem to increase with resolution and higher bit depth).

Considerations before merging

  • Implement --sc-downscale-height and --sc-pix-format, this now needs to be manually done with the ffmpeg API as well
  • Upstream the changes to rav1e, av-scenechange, and possibly av-metrics-decoders (the code changes from av-metrics-decoders could be included in av1an or av-scenechange instead).

Other possible improvements

  • Decode next few frames in parallel of the scene detection analysis itself, instead of decoding frames as needed and then performing analysis in serial. This seems difficult, so this could possibly be done in a future PR
  • If possible, remove uses of Arc<T> in analyze_next_frame and remove temporary vectors holding Arc<Frame<T>>s. This would need to be done in addition to some work on rav1e to allow creating Plane<T> that are not owned
  • Run callback function (incrementing the progress bar) on another thread

redzic avatar May 06 '22 08:05 redzic

@redzic Amazing work :1st_place_medal: :100:

master-of-zen avatar May 06 '22 08:05 master-of-zen

I haven't seen the av-scenechange code yet but here are some of my thoughts:

Fix standard mode of scene detection. Currently, only the fast mode of scene detection actually works because the function estimate_inter_costs seems to require the raw frame data to be padded to a power of 2.

I also think the ideal solution is to not require the padding. I looked into this recently because I realized there's actually a lot of padding on each frame in rav1e and I wanted to reduce memory usage. I got as far as seeing that the motion estimation requires padding because it looks outside of the visible portion of the video when searching. Removing this requirement would probably speed up motion estimation in rav1e (and by extension estimate_inter_costs) as well. But I don't think it's a small change.

Failing that, I think number 2 is the solution we are stuck with. I don't like 1 because as you mentioned it changes the results to be less accurate. And 3, unfortunately, I don't think is possible, or at the very least would be pretty messy to implement (a lot of raw pointers and indirection).

Upstream the changes to rav1e, av-scenechange, and possibly av-metrics-decoders (the code changes from av-metrics-decoders could be included in av1an or av-scenechange instead).

One of the major challenges here I think will be that rav1e does not currently depend on ffmpeg, and I'm not sure that they want to. It might be necessary to implement something similar in rav1e's y4m implementation but without relying on the ffmpeg API (which is cursed).

Decode next few frames in parallel of the scene detection analysis itself, instead of decoding frames as needed and then performing analysis in serial. This seems difficult, so this could possibly be done in a future PR

I don't think this is as difficult as it seems. But I agree putting it in a future PR makes sense, since it is a separate optimization from the zero-copy changes.

If possible, remove uses of Arc<T> in analyze_next_frame and remove temporary vectors holding Arc<Frame<T>>s. This would need to be done in addition to some work on rav1e to allow creating Plane<T> that are not owned

Not sure how feasible this is... It might be worth investigating in a follow-up PR. But I also think the gains from it would be small, if any.

shssoichiro avatar May 06 '22 17:05 shssoichiro

It seems like estimate_inter_costs actually needed more vertical padding (not horizontal padding which is what I thought originally), which is pretty easy to add without having to do a bunch of extra copying. I've added it and now the standard scene detect works (and is also much faster) on many videos I've tried, but it seems to segfault (apparently in a call to copy_nonoverlapping) on HBD videos with standard scene detect, and apparently an assertion fails sometimes too according to CI. I'll look into this further.

redzic avatar May 18 '22 02:05 redzic

I've fixed the standard scene detection mode, now I just need to address the other issues (mainly vapoursynth input support and downscaling for scene detection).

redzic avatar May 18 '22 20:05 redzic

Incredible.

BlueSwordM avatar May 18 '22 20:05 BlueSwordM

Surprisingly ffmpeg isn't actually the problem here, it's mostly vapoursynth which has a limited (at least, for our specific use case) decoding API. Vapoursynth seems to be based on the idea of returning "frame refs", which is counter to the idea of decoding into a specific buffer that av1an allocates with specific dimensions to avoid copying altogether.

I think maybe it's still possible to support zerocopy fast scene detect with vapoursynth input, but in the case of standard scene detect and vapoursynth input I think we are forced to copy into a new buffer. There might be some additional optimizations though, like avoiding copying when combined with --sc-downscale-height, but I'll have to see later what can be done about that.

redzic avatar Jun 03 '22 18:06 redzic

Update: I've added support in this PR for vapoursynth input. It seems to work on the scripts I've tried so far (even with HBD and --sc-method standard), but it might not be 100% correct yet. I'll have to double check why it's even working though to be honest, as I'm legitimately not even sure how it's producing the same results as before when I didn't even add any extra special handling for padding.

redzic avatar Jul 13 '22 09:07 redzic

It looks like the standard scene detection mode with VS only happens to work with certain dimensions (but not with others, for example 930x734), so some kind of check will be needed to add the padding when needed (and unfortunately we have no choice to copy the buffer in this case unless there's a way to configure the padding on the returned frames in VS).

redzic avatar Jul 13 '22 09:07 redzic

unless there's a way to configure the padding on the returned frames in VS

Definitely possible, that's how we did it in AvsP. I have to go back and refresh my memory on exactly how that was done, but vital for colorspace conversion for the UI and such. On the other hand, not sure how useful it would be here, since it would be a wash whether VS or Av1an does the bitblt onto a padded region. Maybe there's a fix for the scene detection? I'll take a look at it.

silverbacknet avatar Jul 13 '22 11:07 silverbacknet

Definitely possible, that's how we did it in AvsP. I have to go back and refresh my memory on exactly how that was done, but vital for colorspace conversion for the UI and such.

Definitely let me know if you figure it out/remember, it would simplify a lot of code if it's possible

redzic avatar Aug 30 '22 19:08 redzic