
[Meta] Parity with aomenc

Open shssoichiro opened this issue 3 years ago • 14 comments

I wanted to create a meta issue to track features or changes we can implement to reach quality parity with aomenc. Right now, our speed 6 still trails aomenc's cpu-used 6 by about 30% BD-rate, while also being slower (assuming one tile and no parallel encoding) (AWCY).

We come closer if we up rav1e to s0 (AWCY), where some clips even win over aomenc, but at the cost of rav1e being 2700% slower.

There's also the notable outlier of dark720, which is 200% worse MSSSIM BD Rate even at speed 0.

Here are the ideas so far:

  • [ ] #845

  • [ ] Implement search pruning

    • [ ] Motion estimation
    • [ ] Partition search
    • [ ] Transform size
    • [ ] CDEF selection
  • [ ] Implement wiener filtering

  • [ ] Quantization matrices (#2973)

  • [ ] Delta-Q?

  • [x] #2710

  • [x] #1308

  • [ ] Alt-ref frame denoising?

  • [ ] #1734

  • [ ] #1729

  • [ ] #1722

  • [ ] #1726

  • [ ] #1731

  • [ ] #1730

  • [ ] #1732

  • TODO: Try to fill out this list with more suggestions. I'm not extremely knowledgeable on the aomenc code base, and it's massive, so I'd welcome discussion from people who may be more knowledgeable.

shssoichiro avatar Jul 09 '21 01:07 shssoichiro

Refs #845

tmatth avatar Jul 09 '21 02:07 tmatth

IIRC, the biggest reason that rav1e is slower than aomenc is that aomenc does a massive amount of search space pruning at the higher speed presets, particularly when it comes to motion estimation.

You can see it in low motion clips vs high motion clips: aomenc and rav1e at speed 6 are similar in high motion clips in terms of speed and visual quality, but once a low motion scene comes in, aomenc speeds up a lot more than rav1e. rav1e has no search pruning of any kind.

That pruning also applies to block size selection and transform size partitioning, especially with rectangular partitions: at speed 6, aomenc restricts partition selection to 8x8-32x32 transforms.
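To make the pruning point concrete, here is a minimal Rust sketch of one common trick: test the zero motion vector first and skip the rest of the search when it already matches well. It is not taken from aomenc or rav1e; `Plane`, `SearchResult`, `pruned_search` and the `early_exit_sad` threshold are hypothetical names chosen for illustration, and a real encoder would use predicted motion vectors and a smarter search pattern.

```rust
/// A single 8-bit luma plane stored row by row (hypothetical helper type).
pub struct Plane<'a> {
    pub data: &'a [u8],
    pub width: usize,
    pub height: usize,
}

/// Outcome of a motion search for one block.
pub struct SearchResult {
    pub mv: (isize, isize),
    pub sad: u32,
    pub candidates_tested: usize,
}

/// Sum of absolute differences between the block at (bx, by) in `cur` and the
/// block displaced by (dx, dy) in `reference`. Returns None when either block
/// falls outside its plane.
fn sad(
    cur: &Plane<'_>,
    reference: &Plane<'_>,
    bx: usize,
    by: usize,
    bsize: usize,
    dx: isize,
    dy: isize,
) -> Option<u32> {
    if bx + bsize > cur.width || by + bsize > cur.height {
        return None;
    }
    let rx = bx as isize + dx;
    let ry = by as isize + dy;
    if rx < 0 || ry < 0 {
        return None;
    }
    let (rx, ry) = (rx as usize, ry as usize);
    if rx + bsize > reference.width || ry + bsize > reference.height {
        return None;
    }
    let mut total = 0u32;
    for y in 0..bsize {
        for x in 0..bsize {
            let a = cur.data[(by + y) * cur.width + bx + x] as i32;
            let b = reference.data[(ry + y) * reference.width + rx + x] as i32;
            total += (a - b).abs() as u32;
        }
    }
    Some(total)
}

/// Exhaustive search in a +/- `range` window, pruned by an early exit:
/// if the zero motion vector is already good enough, nothing else is tested.
pub fn pruned_search(
    cur: &Plane<'_>,
    reference: &Plane<'_>,
    bx: usize,
    by: usize,
    bsize: usize,
    range: isize,
    early_exit_sad: u32, // assumed tuning knob, not a real encoder option
) -> SearchResult {
    let zero = sad(cur, reference, bx, by, bsize, 0, 0)
        .expect("block must lie inside both planes");
    let mut best = SearchResult { mv: (0, 0), sad: zero, candidates_tested: 1 };
    if zero <= early_exit_sad {
        return best; // static content: the whole window is skipped
    }
    for dy in -range..=range {
        for dx in -range..=range {
            if dx == 0 && dy == 0 {
                continue;
            }
            if let Some(s) = sad(cur, reference, bx, by, bsize, dx, dy) {
                best.candidates_tested += 1;
                if s < best.sad {
                    best.sad = s;
                    best.mv = (dx, dy);
                }
            }
        }
    }
    best
}
```

On a static block this tests a single candidate, while a moving block still pays for the full window, which matches the low motion vs high motion behaviour described above.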

Another factor is that rav1e's scene-detection and frame type selection are fully done during the encoding process. aomenc does this as well, but not as heavily as it can rely on its default 1st pass to do a lot of the heavy lifting. That's why using the no-scene-detection flag with a very fast external scene-detection program or with master-of-zen's work (which should be merged IMO) nicely speeds up the encoder.

On top of that, >CPU-5 disables all loop restoration in aomenc. That alone gives it an absolutely massive speed boost, around 50-70% in average encoding framerate, at a cost to metrics.

Finally, default aomenc parameters in video coding tend to favor artifact prevention over raw detail and psycho-visual optimizations, which means most metrics usually prefer aomenc over rav1e's performance.

All in all, some suggestions:

  1. Starting to implement search pruning: motion estimation, partition search, and transform size.
  2. Higher speed scene-detection via merging of master-of-zen's patch.
  3. Implementing a stronger CDEF implementation, and perhaps low aggressivity wiener filtering if someone has the time to do it.
  4. Improving luma coding performance is a good goal, but it obviously must not come at the cost of rav1e's strengths. For example, while not using grain synthesis and with default parameters, rav1e's low light performance is better than both SVT-AV1 and aomenc. Same thing with color: it handily trounces anything but CJXL intra coding.

That is all from me, for now.

BlueSwordM avatar Jul 09 '21 04:07 BlueSwordM

I think we'll need some sort of solution to solve the vastly different luma/chroma balance if we want to benchmark ourselves against libaom. I'd rather not just tune the balance to win the benchmark, but rather change our benchmark for this particular case, e.g. run on grayscale, or have a special tune option specifically choosing quantizers similar to libaom.

tdaede avatar Jul 09 '21 13:07 tdaede

From what I can see on the case of dark720, it looks like the source is very noisy and aomenc smooths out the noise more than rav1e, resulting in a significantly smaller file. So in this case, rav1e is producing a file that is closer to the original, but at a much higher filesize, which is not good for BD Rate. Not immediately sure what the solution for that case is.

shssoichiro avatar Jul 12 '21 14:07 shssoichiro

In my tests partition_range had huge impact on speed, so I think fast heuristics for block split will be very helpful.
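One cheap heuristic of this kind is to skip the split search for flat blocks. The sketch below is only an illustration under that assumption: `block_variance`, `should_try_split` and `flat_threshold` are made-up names, not rav1e APIs, and a real heuristic would likely also look at the quantizer and neighbouring decisions.

```rust
/// Population variance of the pixels in a block.
pub fn block_variance(pixels: &[u8]) -> f64 {
    if pixels.is_empty() {
        return 0.0;
    }
    let n = pixels.len() as f64;
    let mean = pixels.iter().map(|&p| p as f64).sum::<f64>() / n;
    pixels
        .iter()
        .map(|&p| {
            let d = p as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n
}

/// Decide whether the encoder should spend time evaluating a split of this
/// block. Flat blocks (variance at or below `flat_threshold`) and blocks that
/// are already at the minimum size are never split.
pub fn should_try_split(
    pixels: &[u8],
    block_size: usize,
    min_size: usize,
    flat_threshold: f64, // assumed tuning constant
) -> bool {
    block_size > min_size && block_variance(pixels) > flat_threshold
}
```

The point of the design is that the variance pass is linear in the block size, while each skipped split avoids a full rate-distortion evaluation of four sub-blocks.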

I suggest being careful with luma/chroma balance, because visual metrics usually handle color badly. I'm not entirely sure, but I think libvmaf ignores color entirely for SSIM. The SSIM algorithm has a luminance component, so it would be absurd if applied to Cb/Cr channels.

If you're going to change color balance, verify with butteraugli at very high bitrates. My DSSIM should be OK too, especially at lower bitrates (it does SSIM without the luminance component when comparing color).
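For reference, this is roughly what "SSIM without the luminance component" means. The sketch computes the two factors of the standard single-window SSIM formula separately, using the usual constants K1 = 0.01 and K2 = 0.03; `SsimTerms` and `ssim_terms` are names invented here, and this is not the code used by libvmaf or DSSIM.

```rust
/// The two factors of single-window SSIM: SSIM = luminance * contrast_structure.
pub struct SsimTerms {
    pub luminance: f64,
    pub contrast_structure: f64,
}

/// Compute the SSIM factors for two equally sized patches.
/// `dynamic_range` is 255.0 for 8-bit samples.
pub fn ssim_terms(x: &[f64], y: &[f64], dynamic_range: f64) -> SsimTerms {
    assert_eq!(x.len(), y.len(), "patches must have the same size");
    assert!(!x.is_empty(), "patches must not be empty");
    let n = x.len() as f64;
    let c1 = (0.01 * dynamic_range).powi(2);
    let c2 = (0.03 * dynamic_range).powi(2);

    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    let (mut var_x, mut var_y, mut cov) = (0.0, 0.0, 0.0);
    for (&a, &b) in x.iter().zip(y) {
        let (da, db) = (a - mean_x, b - mean_y);
        var_x += da * da;
        var_y += db * db;
        cov += da * db;
    }
    var_x /= n;
    var_y /= n;
    cov /= n;

    SsimTerms {
        // Compares the patch means only.
        luminance: (2.0 * mean_x * mean_y + c1)
            / (mean_x * mean_x + mean_y * mean_y + c1),
        // Compares variation around the means; insensitive to a constant offset.
        contrast_structure: (2.0 * cov + c2) / (var_x + var_y + c2),
    }
}
```

A patch and a copy of it shifted by a constant offset get a contrast/structure term of 1 but a luminance term well below 1, so a metric that drops the luminance factor only scores how the variation around the mean is preserved.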

kornelski avatar Jul 12 '21 17:07 kornelski

@BlueSwordM took the words right out of my mouth

ghost avatar Jul 12 '21 22:07 ghost

There's also another very important factor to take into account when comparing aomenc and SVT-AV1 against rav1e: unless I missed something while parsing the code, rav1e never voluntarily denoises the input.

aomenc and SVT-AV1 use temporal denoising on the input over some types of frames, with aomenc giving specific control over it with arnr-strength=X, with a range of 0-6, with 5 being the default.

I've yet to do an AWCY run detailing what happens when you disable ARNR denoising entirely, but from my subjective and anecdotal tests, it can have a large impact on quality, speed and metrics, especially in some hard content like video games.

Just a tip.

BlueSwordM avatar Sep 29 '21 18:09 BlueSwordM

So basically, one of the 1st steps we should take to improve quality is to implement the full set of CDEF search strengths.

The current method, which picks the CDEF strength from the current quantizer (so CDEF Pick from Q), is good for higher fidelity encoding, but certainly not optimal for keeping clean edges at lower bitrates.

However, the full set of CDEF search strengths is a bit problematic for fidelity, as it can result in slight blurring in high frequency AC blocks (hair, skin, grass, noise, etc).

Therefore, my idea would be to separate the CDEF tuning into 2 categories:

  1. With the PSNR tune, the full set of CDEF search strengths per speed level will always be available.
  2. With the psychovisual tune, the full set of CDEF search strengths per speed level will still be available, but as you decrease the quantizer/block, a curve could be used to limit what strengths the implemented CDEF algorithm can choose in the 1st place.

Furthermore, since CDEF can actually hurt fidelity when a lot of noise is present, a simple noise estimation algorithm could be used to disable CDEF filtering once the estimated noise reaches a threshold (also based on quantizer somewhat).
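A rough sketch of how the two tunes and the noise gate could fit together is below. Everything in it is an assumption made for illustration: the `Tune` enum, the linear cap curve, and the `noise_estimate`/`noise_threshold` parameters are not existing rav1e code. The only fixed facts used are that the quantizer index runs from 0 to 255 and that the CDEF primary strength runs from 0 to 15.

```rust
/// Hypothetical tuning modes, mirroring the two categories described above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Tune {
    Psnr,
    Psychovisual,
}

/// Largest CDEF primary strength the search may pick.
/// PSNR tune: always the full range. Psychovisual tune: the cap shrinks
/// as the quantizer index drops (a linear curve, chosen only as an example).
pub fn max_primary_strength(tune: Tune, qindex: u8) -> u8 {
    match tune {
        Tune::Psnr => 15,
        Tune::Psychovisual => ((qindex as u32 * 15 + 127) / 255) as u8,
    }
}

/// Candidate primary strengths for the CDEF search on one frame or block.
/// When the estimated noise reaches the threshold, only strength 0 is
/// offered, which effectively disables the filter.
pub fn allowed_strengths(
    tune: Tune,
    qindex: u8,
    noise_estimate: f64,  // output of some noise estimator (assumed)
    noise_threshold: f64, // assumed tuning constant
) -> Vec<u8> {
    if noise_estimate >= noise_threshold {
        return vec![0];
    }
    (0..=max_primary_strength(tune, qindex)).collect()
}
```

With this shape, the search itself stays unchanged and only the candidate list differs between tunes, so the per-speed-level strength sets can be layered on top.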

BlueSwordM avatar Mar 17 '22 00:03 BlueSwordM

A couple of items that came up today:

  • Quantization matrices. In aomenc, these provide a substantial compression improvement basically for free.
  • Delta-q coding. Haven't looked in depth much, just curious what we can do with it.
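For anyone unfamiliar with quantization matrices, the idea is that each coefficient position gets its own effective step size instead of one flat step for the whole block. The sketch below shows only that principle; `quantize` is a made-up helper, and the floating-point weights are illustrative, not the AV1 tables or the way aomenc applies them.

```rust
/// Quantize transform coefficients with an optional per-position weight.
/// With `weights == None` every coefficient uses the same step (flat
/// quantization); otherwise the step for coefficient i is `step * weights[i]`.
pub fn quantize(coeffs: &[i32], step: i32, weights: Option<&[f64]>) -> Vec<i32> {
    if let Some(w) = weights {
        assert_eq!(w.len(), coeffs.len(), "one weight per coefficient");
    }
    coeffs
        .iter()
        .enumerate()
        .map(|(i, &c)| {
            let weight = weights.map_or(1.0, |w| w[i]);
            let effective_step = step as f64 * weight;
            (c as f64 / effective_step).round() as i32
        })
        .collect()
}
```

Giving high-frequency positions a larger weight zeroes out more of the small coefficients there, which saves bits where the loss is assumed to be least visible.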

shssoichiro avatar May 20 '22 01:05 shssoichiro

I went through the task list and found several items that are tagged compression performance that seem to be referencing tools that aren't implemented yet. This might account for some of the delta, too. It might be valuable to triage these based on their potential.

  • [ ] https://github.com/xiph/rav1e/issues/1734
  • [ ] https://github.com/xiph/rav1e/issues/1729
  • [ ] https://github.com/xiph/rav1e/issues/1722
  • [ ] https://github.com/xiph/rav1e/issues/1726
  • [ ] https://github.com/xiph/rav1e/issues/1731
  • [ ] https://github.com/xiph/rav1e/issues/1730
  • [ ] https://github.com/xiph/rav1e/issues/1732

doctortheemh avatar May 20 '22 01:05 doctortheemh

@shssoichiro Do you think it would be a good idea to pin this issue? It seems pretty important, IMO.

CartoonFan avatar Sep 24 '22 19:09 CartoonFan

Good idea considering this is a meta issue gathering basically "the most important" features we need to add. tbh I didn't even know that pinning issues was a thing in Github, unless it's something they added recently.

shssoichiro avatar Sep 25 '22 02:09 shssoichiro

> Good idea considering this is a meta issue gathering basically "the most important" features we need to add. tbh I didn't even know that pinning issues was a thing in Github, unless it's something they added recently.

Maybe? I've seen it on some other repos, but I can't really say when they started popping up.

Thanks for pinning and replying!

CartoonFan avatar Sep 25 '22 05:09 CartoonFan

Sorry for unpinning 😅 I accidentally clicked the button, I pinned it back

redzic avatar Nov 09 '22 10:11 redzic