vvc_deblock.asm: chroma vertical implementation

Open stone-d-chen opened this issue 1 year ago • 2 comments

Parts of the horizontal asm, as I wrote it, aren't compatible with vertical. The strong-filter path currently stores certain results in the middle of the computation to free up registers for later use. That is a problem for vertical, which needs to transpose the entire set of registers before storing.

Begin moving results into m0, ..., m7 earlier; e.g. movu m3, m12 frees m12 for reuse. This removes the need to clobber m0.
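To make the constraint concrete, here is a minimal scalar C sketch, not the actual asm. It assumes a hypothetical 8x8 block and a placeholder kernel (`filter_taps` is not the VVC filter), and the names `filter_no_transpose` and `filter_with_transpose` are made up for illustration. Each `t[tap]` row stands in for one SIMD register. In the first function every row maps directly to a memory row. In the second, which mirrors the case the issue calls vertical, all eight rows must be ready before the transpose back, so none can be written out early.

```c
#include <stddef.h>
#include <stdint.h>

#define N 8 /* hypothetical: 8 taps (p3..p0, q0..q3) x 8 lanes */

/* Placeholder kernel (NOT the VVC filter): smooths p0/q0 only.
 * Layout is tap-major, t[tap][lane], like one register per tap. */
static void filter_taps(uint16_t t[N][N])
{
    for (int lane = 0; lane < N; lane++) {
        uint16_t p0 = t[3][lane], q0 = t[4][lane];
        t[3][lane] = (uint16_t)((3 * p0 + q0 + 2) >> 2);
        t[4][lane] = (uint16_t)((p0 + 3 * q0 + 2) >> 2);
    }
}

static void transpose(uint16_t dst[N][N], uint16_t src[N][N])
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            dst[c][r] = src[r][c];
}

/* Memory is already tap-major: tap k lives in row (k - 4).
 * Each t[tap] maps to exactly one memory row, so a finished
 * row could be written back on its own. */
static void filter_no_transpose(uint16_t *pix, ptrdiff_t stride)
{
    uint16_t t[N][N];
    for (int tap = 0; tap < N; tap++)
        for (int lane = 0; lane < N; lane++)
            t[tap][lane] = pix[(tap - 4) * stride + lane];
    filter_taps(t);
    for (int tap = 0; tap < N; tap++)
        for (int lane = 0; lane < N; lane++)
            pix[(tap - 4) * stride + lane] = t[tap][lane];
}

/* Memory is lane-major: tap k lives in column (k - 4).
 * Every output memory row needs one element from every t[tap],
 * so all eight results must be live for the transpose back. */
static void filter_with_transpose(uint16_t *pix, ptrdiff_t stride)
{
    uint16_t rows[N][N], t[N][N];
    for (int lane = 0; lane < N; lane++)
        for (int tap = 0; tap < N; tap++)
            rows[lane][tap] = pix[lane * stride + (tap - 4)];
    transpose(t, rows);
    filter_taps(t);
    transpose(rows, t);
    for (int lane = 0; lane < N; lane++)
        for (int tap = 0; tap < N; tap++)
            pix[lane * stride + (tap - 4)] = rows[lane][tap];
}
```

Both functions take a pointer to the q0 position of the first lane. Running the first on a block and the second on its transpose should give results that are transposes of each other.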

stone-d-chen avatar Jul 27 '24 14:07 stone-d-chen

Hi @nuomi2021, vertical is almost done; I just need to fix some issues with 8-bit.

vvc_v_loop_filter_chroma_10_mix_no-shift_c: 90.2
vvc_v_loop_filter_chroma_10_mix_no-shift_avx: 80.2
vvc_v_loop_filter_chroma_10_mix_shift_c: 140.0
vvc_v_loop_filter_chroma_10_mix_shift_avx: 80.0
vvc_v_loop_filter_chroma_10_one-side_no-shift_c: 150.2
vvc_v_loop_filter_chroma_10_one-side_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_one-side_shift_c: 150.2
vvc_v_loop_filter_chroma_10_one-side_shift_avx: 60.0
vvc_v_loop_filter_chroma_10_strong_no-shift_c: 120.2
vvc_v_loop_filter_chroma_10_strong_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_strong_shift_c: 150.0
vvc_v_loop_filter_chroma_10_strong_shift_avx: 80.0
vvc_v_loop_filter_chroma_10_weak_no-shift_c: 90.0
vvc_v_loop_filter_chroma_10_weak_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_10_weak_shift_c: 100.0
vvc_v_loop_filter_chroma_10_weak_shift_avx: 60.2
vvc_v_loop_filter_chroma_12_mix_no-shift_c: 90.2
vvc_v_loop_filter_chroma_12_mix_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_mix_shift_c: 130.2
vvc_v_loop_filter_chroma_12_mix_shift_avx: 60.0
vvc_v_loop_filter_chroma_12_one-side_no-shift_c: 130.2
vvc_v_loop_filter_chroma_12_one-side_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_one-side_shift_c: 150.2
vvc_v_loop_filter_chroma_12_one-side_shift_avx: 50.2
vvc_v_loop_filter_chroma_12_strong_no-shift_c: 120.2
vvc_v_loop_filter_chroma_12_strong_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_strong_shift_c: 150.2
vvc_v_loop_filter_chroma_12_strong_shift_avx: 60.2
vvc_v_loop_filter_chroma_12_weak_no-shift_c: 90.2
vvc_v_loop_filter_chroma_12_weak_no-shift_avx: 60.2
vvc_v_loop_filter_chroma_12_weak_shift_c: 100.2
vvc_v_loop_filter_chroma_12_weak_shift_avx: 60.2

stone-d-chen avatar Aug 10 '24 14:08 stone-d-chen

Hi @nuomi2021, this should be done now!

vvc_v_loop_filter_chroma_8_mix_no-shift_c: 93.8
vvc_v_loop_filter_chroma_8_mix_no-shift_avx: 73.6
vvc_v_loop_filter_chroma_8_mix_shift_c: 143.8
vvc_v_loop_filter_chroma_8_mix_shift_avx: 53.8
vvc_v_loop_filter_chroma_8_one-side_no-shift_c: 223.8
vvc_v_loop_filter_chroma_8_one-side_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_one-side_shift_c: 373.8
vvc_v_loop_filter_chroma_8_one-side_shift_avx: 53.8
vvc_v_loop_filter_chroma_8_strong_no-shift_c: 223.8
vvc_v_loop_filter_chroma_8_strong_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_strong_shift_c: 333.8
vvc_v_loop_filter_chroma_8_strong_shift_avx: 63.6
vvc_v_loop_filter_chroma_8_weak_no-shift_c: 93.8
vvc_v_loop_filter_chroma_8_weak_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_8_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_8_weak_shift_avx: 63.6
vvc_v_loop_filter_chroma_10_mix_no-shift_c: 143.8
vvc_v_loop_filter_chroma_10_mix_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_10_mix_shift_c: 203.8
vvc_v_loop_filter_chroma_10_mix_shift_avx: 73.6
vvc_v_loop_filter_chroma_10_one-side_no-shift_c: 133.8
vvc_v_loop_filter_chroma_10_one-side_no-shift_avx: 73.6
vvc_v_loop_filter_chroma_10_one-side_shift_c: 163.8
vvc_v_loop_filter_chroma_10_one-side_shift_avx: 63.8
vvc_v_loop_filter_chroma_10_strong_no-shift_c: 133.8
vvc_v_loop_filter_chroma_10_strong_no-shift_avx: 93.8
vvc_v_loop_filter_chroma_10_strong_shift_c: 163.8
vvc_v_loop_filter_chroma_10_strong_shift_avx: 83.8
vvc_v_loop_filter_chroma_10_weak_no-shift_c: 103.8
vvc_v_loop_filter_chroma_10_weak_no-shift_avx: 73.8
vvc_v_loop_filter_chroma_10_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_10_weak_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_mix_no-shift_c: 103.8
vvc_v_loop_filter_chroma_12_mix_no-shift_avx: 83.8
vvc_v_loop_filter_chroma_12_mix_shift_c: 143.8
vvc_v_loop_filter_chroma_12_mix_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_one-side_no-shift_c: 143.6
vvc_v_loop_filter_chroma_12_one-side_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_12_one-side_shift_c: 173.8
vvc_v_loop_filter_chroma_12_one-side_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_strong_no-shift_c: 133.8
vvc_v_loop_filter_chroma_12_strong_no-shift_avx: 73.8
vvc_v_loop_filter_chroma_12_strong_shift_c: 173.6
vvc_v_loop_filter_chroma_12_strong_shift_avx: 63.8
vvc_v_loop_filter_chroma_12_weak_no-shift_c: 93.8
vvc_v_loop_filter_chroma_12_weak_no-shift_avx: 63.8
vvc_v_loop_filter_chroma_12_weak_shift_c: 113.8
vvc_v_loop_filter_chroma_12_weak_shift_avx: 63.8

stone-d-chen avatar Aug 10 '24 17:08 stone-d-chen

Hi @nuomi2021, should I switch to luma now, or submit chroma to the mailing list first?

stone-d-chen avatar Aug 20 '24 15:08 stone-d-chen

Hi @stone-d-chen, we need to find a way to share code with HEVC for chroma. It's better to send the patch together with luma.

I will focus fully on this and collaborate with you over the following weeks.

nuomi2021 avatar Aug 21 '24 12:08 nuomi2021