optimize put_uni_pixels_N_128x128 AVX2/SSE4 code

Open nuomi2021 opened this issue 2 years ago • 6 comments

See https://github.com/ffvvc/FFmpeg/pull/146#issuecomment-1749907342. We have a similar issue for put_pixels too; see https://github.com/ffvvc/FFmpeg/pull/145#issuecomment-1749894316

nuomi2021 avatar Oct 06 '23 02:10 nuomi2021

How to reproduce it: make checkasm -j && ./tests/checkasm/checkasm --test=vvc_mc --bench

nuomi2021 avatar Oct 06 '23 02:10 nuomi2021

Hi, I have been investigating the performance issue. It looks like the memcpy in the C code moves 128 bytes per iteration, while the SSE4 code moves only 16 bytes per iteration. Could this be the reason for the slowness?

This is the code I saw while debugging.

memcpy code

Screenshot 2024-02-04 at 4 55 48 PM

ff_vvc_put_uni_pixels16_8_sse4

Screenshot 2024-02-04 at 4 56 00 PM
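To make the comparison concrete, here is a minimal sketch in plain C of the two copy strategies described above. It is not the actual FFmpeg code: the function names `put_pixels_memcpy` and `put_pixels_chunk16` are hypothetical, 8-bit pixels are assumed (so width in pixels equals width in bytes), and the 16-byte `memcpy` only stands in for one XMM load/store pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the C path: one memcpy per row.
 * The C library is free to pick the widest moves it has for `width` bytes. */
static void put_pixels_memcpy(uint8_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int width, int height)
{
    for (int y = 0; y < height; y++) {
        memcpy(dst, src, (size_t)width);
        dst += dst_stride;
        src += src_stride;
    }
}

/* Hypothetical model of the SSE4 path: each row is copied in fixed
 * 16-byte steps, so a 128-byte row takes 8 inner iterations. */
static void put_pixels_chunk16(uint8_t *dst, ptrdiff_t dst_stride,
                               const uint8_t *src, ptrdiff_t src_stride,
                               int width, int height)
{
    assert(width % 16 == 0); /* this sketch handles whole 16-byte chunks only */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x += 16)
            memcpy(dst + x, src + x, 16); /* models one 16-byte load + store */
        dst += dst_stride;
        src += src_stride;
    }
}
```

Both functions produce the same output; the only difference is how many bytes each loop iteration handles. An optimizing compiler may rewrite either loop, so this illustrates the structure being discussed and is not a benchmark.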

rohanjulka19 avatar Feb 04 '24 17:02 rohanjulka19

@rohanjulka19, sorry for missing your post. Yes, this may be the reason. Could you help by sending 3 patches to the mailing list for this? One for HEVC, one for VVC, and then another patch to remove the SSE 128 code.

Also, some 64xX sizes have similar issues. Could you help check those too? Thank you.

put_luma_uni_pixels_8_64x4_c: 10.1
put_luma_uni_pixels_8_64x4_sse4: 24.6
put_luma_uni_pixels_8_64x4_avx2: 15.1

nuomi2021 avatar Jul 20 '24 02:07 nuomi2021

Comments and the commit log are important too. A patch is easier to merge when it is clear to reviewers.

nuomi2021 avatar Jul 20 '24 02:07 nuomi2021