mixxx
Build option to show vectorization info
With `-DINFO_VECTORIZE=ON`, all target compilers list the vectorized loops in the terminal output. Unfortunately, enabling this rebuilds all files!
I added this to compare the vectorization capabilities of the compilers using GitHub CI.
In this PR the option is disabled; an enabled run can be found here: https://github.com/daschuer/mixxx/actions/runs/2702452852
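As a rough illustration, the reports cover simple counted loops like the hypothetical kernels below. The names `applyGainSketch` and `addWithGainSketch` and the plain `float` sample type are assumptions, not the actual code in sample.cpp. The exact flags behind `INFO_VECTORIZE` are not shown in this thread; the usual report switches are `-fopt-info-vec-optimized` / `-fopt-info-vec-missed` for GCC, `-Rpass=loop-vectorize` / `-Rpass-missed=loop-vectorize` for Clang, and `/Qvec-report:2` for MSVC.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel in the style of the loops compared for sample.cpp.
// A counted loop with independent iterations is the easiest case for an
// auto-vectorizer.
void applyGainSketch(float* buffer, float gain, std::size_t numSamples) {
    assert(buffer != nullptr || numSamples == 0);
    for (std::size_t i = 0; i < numSamples; ++i) {
        buffer[i] *= gain;
    }
}

// Hypothetical two-buffer kernel: dest and src may overlap, so the compiler
// has to emit a runtime overlap check (or give up) before vectorizing.
void addWithGainSketch(float* dest, const float* src, float gain,
                       std::size_t numSamples) {
    for (std::size_t i = 0; i < numSamples; ++i) {
        dest[i] += src[i] * gain;
    }
}
```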
This is a comparison of the loops in sample.cpp. The result of MSVC is disappointing: 13/39 loops vectorized.
Line in sample.cpp | GCC 9.4 | MSVC | Clang | GCC 12.1 |
---|---|---|---|---|
124 | optimized | optimized | optimized | optimized |
145 | optimized | missed | optimized | optimized |
153 | optimized | optimized | optimized | missed |
169 | optimized | missed | optimized | optimized |
189 | optimized | missed | missed | optimized |
195 | optimized | missed | missed | optimized |
205 | optimized | missed | missed | optimized |
211 | optimized | missed | missed | optimized |
222 | optimized | optimized | optimized | optimized |
236 | optimized | missed | optimized | optimized |
254 | optimized | missed | optimized | optimized |
261 | optimized | optimized | optimized | optimized |
236 | optimized | optimized | optimized | optimized |
281 | optimized | optimized | optimized | optimized |
304 | optimized | optimized | optimized | optimized |
323 | optimized | optimized | optimized | optimized |
352 | optimized | missed | optimized | optimized |
359 | optimized | optimized | optimized | optimized |
378 | optimized | optimized | optimized | optimized |
391 | optimized | missed | optimized | optimized |
407 | optimized | missed | optimized | optimized |
433 | optimized | missed | optimized | optimized |
444 | optimized | missed | optimized | optimized |
456 | optimized | missed | optimized | optimized |
477 | optimized | missed | missed | optimized |
471 | optimized | missed | missed | optimized |
492 | optimized | missed | missed | optimized |
498 | optimized | missed | missed | optimized |
512 | optimized | missed | optimized | optimized |
522 | optimized | missed | optimized | optimized |
533 | optimized | missed | missed | optimized |
545 | optimized | missed | optimized | optimized |
557 | optimized | missed | optimized | optimized |
571 | optimized | missed | missed | optimized |
585 | optimized | missed | missed | optimized |
594 | missed | missed | missed | optimized |
608 | missed | missed | missed | optimized |
> The result of MSVC is disappointing: 13/39 loops vectorized.
Yet another reason to use xsimd instead of relying on auto-vectorization.
> Yet another reason to use xsimd instead of relying on auto-vectorization.
Yes, I agree.
Great. The problem is that this requires large-scale changes to our audio processing. Essentially, all xsimd-related code needs to be templated on the vector instruction set (unless we want to dispatch dynamically everywhere). Also, binary size and build time will increase because we are generating the same code for many architectures. The advantage would be that portable builds could include AVX2, SSE4, etc. code paths, so the official portable binaries could run almost as fast as native builds.
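A minimal sketch of what "templated on the vector instruction set" with a single dispatch point could look like. The tag types `Sse2Tag` / `Avx2Tag`, the `applyGain` kernel and `applyGainDispatch` are hypothetical; in xsimd, architecture types such as `xsimd::sse2` and `xsimd::avx2` play the tag role. Here the tags only carry a lane count, and the inner lane loop stands in for one SIMD operation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical architecture tags carrying only the number of float lanes
// per register (128-bit SSE2 = 4, 256-bit AVX2 = 8).
struct Sse2Tag { static constexpr std::size_t kFloatLanes = 4; };
struct Avx2Tag { static constexpr std::size_t kFloatLanes = 8; };

// Written once, instantiated per architecture. Every instantiation is a
// separate copy in the binary, hence the extra size and build time.
template <class Arch>
void applyGain(float* buffer, float gain, std::size_t numSamples) {
    assert(buffer != nullptr || numSamples == 0);
    constexpr std::size_t lanes = Arch::kFloatLanes;
    std::size_t i = 0;
    // Main body: one "register width" of samples per step.
    for (; i + lanes <= numSamples; i += lanes) {
        for (std::size_t lane = 0; lane < lanes; ++lane) {
            buffer[i + lane] *= gain;
        }
    }
    // Scalar tail for the remaining samples.
    for (; i < numSamples; ++i) {
        buffer[i] *= gain;
    }
}

// Dynamic dispatch happens once at the API boundary instead of in every
// loop. How hasAvx2 is detected is left to the caller.
void applyGainDispatch(bool hasAvx2, float* buffer, float gain,
                       std::size_t numSamples) {
    if (hasAvx2) {
        applyGain<Avx2Tag>(buffer, gain, numSamples);
    } else {
        applyGain<Sse2Tag>(buffer, gain, numSamples);
    }
}
```

In a real multi-architecture build, the AVX2 instantiation would also have to be compiled with AVX2 enabled (for example in a separate translation unit built with `-mavx2` or `/arch:AVX2`); otherwise both instantiations compile to the same baseline code.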
> This is a comparison of the loops in sample.cpp. The result of MSVC is disappointing: 13/39 loops vectorized.
Did you invoke all these compilers with the same processor instruction set extensions as the target?
This is the output of our default build settings on GitHub, which are also used for our release builds. Maybe our MSVC build flags are bad?
Do you get better results in your local build?
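One way to check which instruction set a given CI or local build actually targets is to inspect the compiler's predefined macros. A minimal sketch; `compiledSimdTarget` and `logSimdTarget` are hypothetical helpers. GCC and Clang define `__SSE2__`, `__AVX__`, `__AVX2__` and `__AVX512F__` according to their `-m` flags; MSVC defines the AVX macros for the matching `/arch` option, implies SSE2 on x64 (`_M_X64`), and reports it via `_M_IX86_FP` on 32-bit x86.

```cpp
#include <cassert>
#include <cstdio>

// Widest x86 SIMD extension this translation unit was compiled for,
// derived purely from compiler-predefined macros (no runtime check).
const char* compiledSimdTarget() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "scalar or non-x86";
#endif
}

// Printing this once (e.g. in a CI job) shows whether the intended
// target flags actually reached the compiler.
void logSimdTarget() {
    std::printf("SIMD target: %s\n", compiledSimdTarget());
}
```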
Could you rerun your benchmark with `/arch:AVX512` for MSVC (the default is to use only the SSE2 instruction set):
https://docs.microsoft.com/en-us/cpp/parallel/auto-parallelization-and-auto-vectorization?view=msvc-170&viewFallbackFrom=vs-2019#auto-vectorizer
https://docs.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
> Could you rerun your benchmark with `/arch:AVX512` for MSVC (the default is to use only the SSE2 instruction set)
To provide portable builds, we can only target SSE2. Our GCC builds also target only SSE2, and GCC still does a good job at auto-vectorization. So specifying a different target instruction set does not make sense, especially not AVX-512, which is only present on a few modern high-end (usually desktop) CPUs.
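Shipping wider code paths inside a portable SSE2 build, as suggested earlier in the thread, would require a runtime CPU check rather than a compile-time target. A minimal sketch, assuming GCC or Clang on x86 (the `__builtin_cpu_init` / `__builtin_cpu_supports` builtins); the `SimdPath` enum and `selectSimdPath` are hypothetical names, and MSVC would need the `__cpuid` intrinsic instead.

```cpp
#include <cassert>

// Hypothetical set of code paths a portable build could ship. The build
// itself targets the SSE2 baseline; wider paths are only chosen after the
// running CPU reports support for them.
enum class SimdPath { Baseline, Avx2, Avx512 };

// Runtime feature check with the GCC/Clang x86 builtins. Other compilers
// and non-x86 targets fall back to the baseline path in this sketch.
SimdPath selectSimdPath() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        return SimdPath::Avx512;
    }
    if (__builtin_cpu_supports("avx2")) {
        return SimdPath::Avx2;
    }
#endif
    return SimdPath::Baseline;
}
```

Because AVX-512 is rare on end-user machines, most users would still land on the baseline or AVX2 path, which is why the SSE2 baseline has to stay fast on its own.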
I have tested the `/arch:AVX512` flag here: https://github.com/daschuer/mixxx/runs/7462994413?check_suite_focus=true. There is no improvement.
Thanks for testing! It seems that MSVC's auto-vectorizer really is that poor: http://0x80.pl/notesen/2021-02-17-autovectorization-msvc.html
@ferranpujolcamins Is this ready for merge?