Draft on complete next-gen CDEF implementation (early WIP)

Open BlueSwordM opened this issue 1 year ago • 2 comments

Hello again. It has been a while since I've made any relevant PRs to rav1e, so here goes.

Ever since a fateful meeting we had back in 2021, I've known that rav1e lacks a higher-quality full CDEF implementation, and I've always wanted rav1e to get one.

Following this issue made me think of it again today, which is why I started the work on it: https://github.com/xiph/rav1e/issues/2759

My plan for the full CDEF implementation has multiple steps:

1. Full CDEF strength selection implementation

First, add the full CDEF strength search based on distortion optimization, with the full set of filter strengths available (0-15 for primary, 0-4 for secondary).
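A minimal sketch of what an exhaustive strength search could look like, not the actual rav1e code. `search_cdef_strength` and its `dist` callback are hypothetical names; `dist` stands in for whatever distortion metric is evaluated after filtering a block with a given strength pair. One detail from the AV1 bitstream: the secondary strength is signaled with two bits, so the values that can actually be coded are 0, 1, 2 and 4.

```rust
/// Secondary strengths AV1 can signal (2-bit index; index 3 means strength 4).
const SEC_STRENGTHS: [u8; 4] = [0, 1, 2, 4];

/// Exhaustively tries every (primary, secondary) pair and returns the pair
/// with the lowest distortion, together with that distortion.
///
/// `dist` is a hypothetical callback: it is assumed to apply CDEF with the
/// given strengths and return the resulting distortion for the block.
pub fn search_cdef_strength<F>(mut dist: F) -> (u8, u8, u64)
where
    F: FnMut(u8, u8) -> u64,
{
    // Start from "CDEF off" so ties favour the cheapest option.
    let mut best = (0u8, 0u8, dist(0, 0));
    for pri in 0..16u8 {
        for &sec in SEC_STRENGTHS.iter() {
            if pri == 0 && sec == 0 {
                continue; // already evaluated as the starting point
            }
            let d = dist(pri, sec);
            if d < best.2 {
                best = (pri, sec, d);
            }
        }
    }
    best
}
```

This evaluates all 64 combinations per block, which is exactly why the pruning in the next step matters.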

2. Implement curve-based speed pruning

As quality increases (quantizer decreases), the CDEF strength search space shrinks following a power curve. As speed increases (s0 all the way to s10), the search space shrinks faster, and at the highest speeds it is restricted from the start.

This would mainly be for the psychovisual tune. The PSNR tune would only get static search spaces based on speed features.
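To make the pruning idea concrete, here is a rough sketch that caps the largest primary strength worth searching. Everything numeric in it is an assumption for illustration: the exponent formula, the speed cutoff of 9, and the static PSNR table are placeholders, and `Tune` is a local stand-in rather than rav1e's own type.

```rust
/// Local stand-in for the encoder tune (hypothetical, for illustration).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Tune {
    Psnr,
    Psychovisual,
}

const MAX_PRIMARY: u8 = 15;

/// Returns the largest primary strength to include in the search.
/// `qidx` is the quantizer index (0 = best quality, 255 = worst).
pub fn max_primary_strength(tune: Tune, qidx: u8, speed: u8) -> u8 {
    match tune {
        // PSNR tune: static search space that depends on speed only.
        // The table values are placeholders.
        Tune::Psnr => match speed {
            0..=3 => MAX_PRIMARY,
            4..=6 => 7,
            7..=8 => 3,
            _ => 1,
        },
        // Psychovisual tune: power curve over the normalized quantizer.
        Tune::Psychovisual => {
            let q = f64::from(qidx) / 255.0;
            // Assumed shape: a larger exponent at higher speeds makes the
            // search space shrink faster as the quantizer decreases.
            let exponent = 1.0 + 0.25 * f64::from(speed);
            let cap = (f64::from(MAX_PRIMARY) * q.powf(exponent)).round() as u8;
            // Assumed cutoff: the fastest presets are restricted up front.
            if speed >= 9 {
                cap.min(3)
            } else {
                cap
            }
        }
    }
}
```

The real curve parameters would have to come from tuning against actual encodes; the point is only that a single function of quantizer and speed can drive the search range.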

3. Implement an entirely different, superior CDEF_dist metric (long-term)

The current CDEF_dist metric, while still considerably better than what aomenc and SVT-AV1 use (MSE as the dist metric), isn't the best that can currently be used. My future plan is to add a subset of ssimulacra2 as a slower but superior dist metric for the psychovisual tune, especially as it would penalize some classic encoder faults like excessive blurring and detail wiping. The repository can be found here: https://github.com/cloudinary/ssimulacra2

Plumbing work for that:

  1. Implementing a YUV > XYB and XYB > YUV crate for internal color conversion (could perhaps be used to replace YCbCr internally in rav1e, but that's a completely different scope...)

  2. Implementing everything ssimulacra2-related in Rust (biggest difficulty).

  3. Adding some SIMD to make it as abominably fast as possible.
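For a sense of scale, the core of the forward conversion is small. The sketch below covers only the linear RGB > XYB step; getting from YUV to linear RGB (matrix coefficients, range, transfer function) is left out. The function name is hypothetical, and the constants are written from memory of libjxl's opsin absorbance matrix, so they should be checked against the libjxl or ssimulacra2 source before being relied on.

```rust
/// Bias added before the cube root (assumed to match libjxl's opsin bias).
const BIAS: f64 = 0.0037930732552754493;

/// Converts one linear-light RGB pixel (components in 0.0..=1.0) to XYB.
/// Returns `[x, y, b]`.
pub fn linear_rgb_to_xyb(r: f64, g: f64, b: f64) -> [f64; 3] {
    // Mix RGB into LMS-like cone responses. Each row sums to 1.0.
    let l = 0.30 * r + 0.622 * g + 0.078 * b + BIAS;
    let m = 0.23 * r + 0.692 * g + 0.078 * b + BIAS;
    let s = 0.24342268924547819 * r
        + 0.20476744424496821 * g
        + 0.55180986650955360 * b
        + BIAS;

    // Cube-root nonlinearity, shifted so that black maps to zero.
    let offset = BIAS.cbrt();
    let l = l.max(0.0).cbrt() - offset;
    let m = m.max(0.0).cbrt() - offset;
    let s = s.max(0.0).cbrt() - offset;

    // X is the L-M opponent channel, Y the L+M luma-like channel, B is S.
    [0.5 * (l - m), 0.5 * (l + m), s]
}
```

Because the first two matrix rows both sum to 1.0, any neutral gray lands on X = 0, which makes a handy sanity check for an implementation.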

4. Making CDEF faster

Simple as that: lower complexity, more SIMD for more architectures, etc.

That'll be all from me today.

BlueSwordM, Sep 08 '22 16:09

Also resurfacing this idea you brought up:

Furthermore, since CDEF can actually hurt fidelity when a lot of noise is present, a simple noise estimation algorithm could be used to disable CDEF filtering once enough noise reaches the threshold (also based on quantizer somewhat).

which definitely seems achievable if we can find a decent heuristic for "noise in source + quantizer will make CDEF harmful"
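One possible starting point for such a heuristic, sketched under assumptions: estimate the noise standard deviation with Immerkær's Laplacian-difference method, then compare it against a quantizer-dependent threshold. The estimator is a known technique, but the gate is entirely made up here: the function names, the threshold constants, and even the direction of the quantizer dependence (tolerating more noise at coarser quantizers) are placeholders that would need tuning.

```rust
/// Estimates the noise standard deviation of an 8-bit plane using
/// Immerkær's method (mean absolute response of a 3x3 Laplacian-difference
/// kernel, scaled by sqrt(pi/2) / 6).
pub fn estimate_noise_sigma(plane: &[u8], width: usize, height: usize) -> f64 {
    assert_eq!(plane.len(), width * height);
    if width < 3 || height < 3 {
        return 0.0; // too small for a 3x3 kernel
    }
    let px = |x: usize, y: usize| i32::from(plane[y * width + x]);
    let mut acc: u64 = 0;
    for y in 1..height - 1 {
        for x in 1..width - 1 {
            // Kernel: [1 -2 1; -2 4 -2; 1 -2 1]
            let v = px(x - 1, y - 1) - 2 * px(x, y - 1) + px(x + 1, y - 1)
                - 2 * px(x - 1, y) + 4 * px(x, y) - 2 * px(x + 1, y)
                + px(x - 1, y + 1) - 2 * px(x, y + 1) + px(x + 1, y + 1);
            acc += u64::from(v.unsigned_abs());
        }
    }
    let n = ((width - 2) * (height - 2)) as f64;
    (std::f64::consts::PI / 2.0).sqrt() * acc as f64 / (6.0 * n)
}

/// Hypothetical gate: keep CDEF on unless the estimated noise exceeds a
/// threshold that grows with the quantizer index. Constants are placeholders.
pub fn cdef_enabled(sigma: f64, qidx: u8) -> bool {
    let threshold = 4.0 + 8.0 * f64::from(qidx) / 255.0;
    sigma <= threshold
}
```

A flat plane yields a sigma of exactly zero, while strong high-frequency content pushes it up quickly; note that fine texture also raises the estimate, so a real heuristic would likely need to separate texture from noise.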

tmatth, Sep 09 '22 19:09

Yes, and you just gave me an idea as well for further speedups and threading benefits.

If we could do noise estimation at a transparent tile level (no actual tiles being made, just for the analysis), we could get great threading on that end, since the noise estimation could easily be run in parallel. This would also allow for finer CDEF control than the frame-level method.

Now, that would require my implementation to go through tiles, so let's leave that for the end :)
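A rough sketch of the analysis-only tiling, assuming the "tiles" are simply horizontal bands of the plane and that any per-band estimator can be plugged in. `per_tile_noise` and its parameters are hypothetical names. It uses `std::thread::scope` from the standard library to stay dependency-free; inside the encoder the existing thread pool would be the more natural fit than one thread per band.

```rust
use std::thread;

/// Splits `plane` into horizontal bands of `tile_rows` rows and runs
/// `estimate(band, width, band_height)` on every band in parallel.
/// Returns one result per band, top to bottom.
pub fn per_tile_noise<F>(
    plane: &[u8],
    width: usize,
    tile_rows: usize,
    estimate: F,
) -> Vec<f64>
where
    F: Fn(&[u8], usize, usize) -> f64 + Sync,
{
    assert!(width > 0 && tile_rows > 0 && plane.len() % width == 0);
    let estimate = &estimate;
    thread::scope(|s| {
        // Spawn all workers first so the bands are processed concurrently.
        let handles: Vec<_> = plane
            .chunks(width * tile_rows)
            .map(|band| s.spawn(move || estimate(band, width, band.len() / width)))
            .collect();
        // Then collect results in band order.
        handles
            .into_iter()
            .map(|h| h.join().expect("tile worker panicked"))
            .collect()
    })
}
```

The per-band results could then drive a per-region CDEF on/off decision instead of a single frame-level one.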

BlueSwordM, Sep 10 '22 03:09