vvc_deblock.c: fix RANDCLIP

Open stone-d-chen opened this issue 11 months ago • 4 comments

Previously, RANDCLIP(x, diff) computed x - diff and then clipped it to the range (0, max_pixel_val + rnd() % 2 * diff). The random term only affected the upper clip bound, so we were not really generating a random value in the range.

Instead, compute (x - diff) + rnd() % 2 * diff. This returns a value such that abs(value - x) <= diff.

This greatly improves the generation of strong deblocking data.
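A minimal sketch of the corrected behaviour follows; it is not the actual macro from vvc_deblock.c. It assumes that `rnd() % 2 * diff` in the description means `rnd() % (2 * diff)`, and that the result is still clamped to the valid pixel range afterwards. The names `clip_int` and `rand_near` are hypothetical, and the random value is passed in as `r` so the behaviour is easy to check.

```c
#include <assert.h>

/* Clamp v into [lo, hi]; a stand-in for FFmpeg's av_clip(). */
static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Hypothetical helper: pick a pixel value near x from a random number r.
 * Assumption: the offset is r % (2 * diff), so before clamping the
 * result lies in [x - diff, x + diff - 1].
 */
static int rand_near(int x, int diff, int bit_depth, unsigned r)
{
    assert(diff > 0);
    const int max_pixel = (1 << bit_depth) - 1;
    const int v = (x - diff) + (int)(r % (unsigned)(2 * diff));
    /* Assumed final clamp so the value stays a valid pixel. */
    return clip_int(v, 0, max_pixel);
}
```

With x = 100, diff = 4 and 8-bit pixels, this yields values from 96 to 103 depending on `r`, whereas the old form always started from the fixed value x - diff.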

stone-d-chen avatar Feb 12 '25 20:02 stone-d-chen

@nuomi2021 I've been looking into further improvements to the luma generation; it seems fairly non-trivial.

One of the main issues is that the (d0 << 1) < beta_2 condition occasionally fails in the filter template.

Where d0 = abs(P2 - 2 * P1 + P0) + abs(Q2 - 2 * Q1 + Q0)

The current code does actually try to compensate for it, since (d0 << 1) < beta_2 == d0 < (beta_2 >> 1) which is beta_3.

It becomes difficult to solve both constraints while also satisfying (d0 + d1 < beta).

I attempted to put it into a computer algebra solver (wxMaxima) but it's quite messy.
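To make the two constraints concrete, here is a rough sketch in C. It assumes `beta_2` is `beta >> 2` (consistent with `beta_2 >> 1` being `beta_3` above) and that `d1` is the same activity measure taken on a second line of the edge. The function names are hypothetical and are not taken from the filter template.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Activity for one line across the edge, as quoted in the thread:
 * d = |P2 - 2*P1 + P0| + |Q2 - 2*Q1 + Q0|
 */
static int line_activity(int p2, int p1, int p0, int q0, int q1, int q2)
{
    return abs(p2 - 2 * p1 + p0) + abs(q2 - 2 * q1 + q0);
}

/*
 * Hypothetical check: returns 1 only when both constraints hold.
 * Assumption: beta_2 is beta >> 2.
 */
static int constraints_hold(int d0, int d1, int beta)
{
    assert(beta >= 0);
    const int beta_2 = beta >> 2;
    return (d0 + d1 < beta) && ((d0 << 1) < beta_2);
}
```

For example, with beta = 64, the pair d0 = 8, d1 = 0 satisfies d0 + d1 < beta but not (d0 << 1) < beta_2, which is the kind of case where randomly generated data falls out of the strong-filter path.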

stone-d-chen avatar Feb 21 '25 17:02 stone-d-chen

If it's difficult, perhaps we should approach it as it is. I'll rebase the code to the latest version to fix the fuzz issue. Then, we can work together on two tasks:

  1. Enabling a larger filter for luma—this is the last missing part.
  2. Enabling AVX2—this will further improve performance.

Which one do you prefer?

nuomi2021 avatar Feb 22 '25 14:02 nuomi2021

AVX2 sounds good. We need to modify the C side to expose multiple blocks, right? I'm trying to learn more about video decoding overall, so some more exposure to the C code would be good.

stone-d-chen avatar Feb 22 '25 19:02 stone-d-chen

> AVX2 sounds good,

👍

> we need to modify the C side to expose multiple blocks, right?

Yes, we need to set up parameters for a single line within a CTU. SSE can process 16 bytes at a time, AVX2 can handle 32 bytes, and AVX-512 can manage 64 bytes per operation.
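As a rough illustration of what those register widths mean in pixels, the sketch below assumes 8-bit content uses one byte per pixel and higher bit depths use two bytes per pixel. The helper name `pixels_per_vector` and the enum constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Register widths in bytes, as listed in the comment above. */
enum { SSE_BYTES = 16, AVX2_BYTES = 32, AVX512_BYTES = 64 };

/*
 * Hypothetical helper: how many pixels fit in one vector register.
 * Assumption: pixels above 8 bits are stored in 16-bit words.
 */
static size_t pixels_per_vector(size_t vector_bytes, int bit_depth)
{
    assert(bit_depth >= 8 && bit_depth <= 16);
    const size_t bytes_per_pixel = bit_depth > 8 ? 2 : 1;
    return vector_bytes / bytes_per_pixel;
}
```

Under these assumptions an AVX2 register holds 32 pixels of 8-bit video or 16 pixels of 10-bit video, which is why the C side has to hand the assembly a whole line of parameters rather than one block at a time.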

> I'm trying to learn more about video decoding overall, so some more exposure to the C code would be good.

You can start from https://www.amazon.com/Coding-Video-Practical-Guide-Beyond/dp/1118711785 :)

nuomi2021 avatar Feb 23 '25 01:02 nuomi2021