
sad_32x32 and 64x64 AVX2 have poor cache locality

Open shssoichiro opened this issue 1 year ago • 3 comments

This applies at least to the HBD (high bit depth) ASM; I have not tested LBD. Benchmarking shows a large number of cache read misses in these functions. Noting this as a possible area for performance improvement.

shssoichiro, Jul 25 '23 17:07

Could you please add how you determined that, so that anyone willing can repeat the exercise? :)

lu-zero, Jul 25 '23 18:07

Yes, this was measured using valgrind, specifically in this case:

valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes --simulate-cache=yes target/release/rav1e -s 2 --no-scene-detection -i 0 -I 0 ~/xiph-media-files/objective-1-fast-10bit/speed_bag_640x360_60f.y4m -o /dev/null --limit 20

valgrind measures cache misses as one of its metrics, and the results can be viewed in kcachegrind. (The downside is that valgrind is quite a bit slower than perf.)

shssoichiro, Jul 25 '23 20:07

This might not really be the SAD itself, but rather the nature of, e.g., motion compensation. Is this specific to AVX2?

tdaede, Jul 25 '23 20:07
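
To illustrate the access-pattern point above, here is a rough Rust sketch. It is not rav1e's code and not a measurement. It contains a scalar reference SAD over 16-bit samples and a small model that counts how many cache lines a block read touches. The function names, the 64-byte line size, and the 640-sample stride (taken from the 640x360 clip name, ignoring any frame padding) are all assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Scalar reference SAD for a `w` x `h` block of 16-bit (HBD) samples.
/// Strides are in samples, not bytes. Hypothetical helper, not rav1e's API.
fn sad_block(
    src: &[u16], src_stride: usize,
    refb: &[u16], ref_stride: usize,
    w: usize, h: usize,
) -> u32 {
    let mut sum = 0u32;
    for y in 0..h {
        // Each row starts one full stride after the previous one.
        let s = &src[y * src_stride..y * src_stride + w];
        let r = &refb[y * ref_stride..y * ref_stride + w];
        for (&a, &b) in s.iter().zip(r) {
            sum += (a as i32 - b as i32).unsigned_abs();
        }
    }
    sum
}

/// Counts distinct cache lines touched when reading a `w` x `h` block of
/// 16-bit samples starting `offset` samples into a buffer with the given
/// stride. Assumes 64-byte lines and a line-aligned buffer base.
fn cache_lines_touched(offset: usize, stride: usize, w: usize, h: usize) -> usize {
    const LINE_BYTES: usize = 64; // assumed cache line size
    const SAMPLE_BYTES: usize = 2; // u16 samples
    let mut lines = BTreeSet::new();
    for y in 0..h {
        let start = (offset + y * stride) * SAMPLE_BYTES;
        let end = start + w * SAMPLE_BYTES - 1; // last byte of the row
        for line in (start / LINE_BYTES)..=(end / LINE_BYTES) {
            lines.insert(line);
        }
    }
    lines.len()
}
```

Under these assumptions, a 32x32 block at a line-aligned position touches 32 lines, one per row, each 1280 bytes apart. The same block at an arbitrary motion-vector offset straddles two lines per row and touches 64. That would fit the suggestion that the misses come from where the reference block sits in the frame rather than from the SAD kernel, but it is only a model and would need confirming against the callgrind output.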