SVT-VP9
SVT-VP9 copied to clipboard
Platform detection incorrect for AVX2 support
The current code does not detect platform correctly for AVX2 support and hence uses slower functions on non-Intel platforms.This fix causes AVX2 related functions to be invoked correctly and giving the required 2.5X+ speed-up on AMD and possibly other platforms which support AVX2.
Hi @techvintage Thanks for this PR. Instead of hard code, could you make the change in can_use_intel_avx512()? You can change the name of this function too.
Hi @tianjunwork ,this is not hardcoded but changing asm_type to highest level (=7) to begin with.If AVX2 functionality is not supported,it starts downgrading and uses corresponding slower functions instead.This is opposite approach to setting asm_type=0 (lowest value) and increasing it incrementally and fixes the bug.
Hi @techvintage, I don't get it, if the logic after uint32_t asm_type = 7; works as expected on your platform, asm_type will be reassigned to a new value. How does your patch work?
So can_use_intel_avx512() and can_use_Intel_4thGen_features function do not work correctly on non-Intel Platforms and proceeds with asm_type=7 and picks up the faster functions on AMD platforms correctly.We saw a 2.5X+ gain with this on Matisse with this. Also, I see a FFmpeg patch failing on travis CI,My travis CI is synced to github but looks like cloning ffmpeg git repo timed out.Do we developers have a way of triggering the build again?
I don't have a non-intel platform so don't know how these two functions behave. You mean can_use_intel_avx512() doesn't work but still return 1? Otherwise it runs to "asm_type = 1;". I re-run the travis task. It is passed now.
Yes,that's correct.So if we start with asm_type=7(highest),If can_use_intel_avx512() returns true for Intel platforms,it will be still at 7 else will start lowering asm_type based on support available.
If can_use_intel_avx512() still returns 1 for non-intel platform, why do you need to set asm_type=7 at the beginning? Intel platform works just fine. Your goal is to handle non-intel platform, right? If so, instead of letting non-working can_use_intel_avx512() and can_use_intel_core4th_gen_features() mess up the logic. Can we make these two function more generic?