
Build option to show vectorization info

Open daschuer opened this issue 2 years ago • 10 comments

With -DINFO_VECTORIZE=ON, all target compilers will list the vectorized loops in the terminal output. Unfortunately, enabling this rebuilds all files!

I have added this to compare the compilers' vectorization capabilities using the GitHub CI.

In this PR it is disabled; a run with it enabled can be found here: https://github.com/daschuer/mixxx/actions/runs/2702452852
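For readers who have not used vectorization reports before, here is a minimal sketch of the kind of loop such a report covers. The function `applyGain` is hypothetical and not taken from sample.cpp. The flags in the comment are the compilers' own report options; how INFO_VECTORIZE maps onto them in this PR is an assumption and may differ.

```cpp
// Hypothetical example, not copied from Mixxx.
// Compile only (no linking) with a vectorization report enabled, for example:
//   GCC:   g++ -O3 -c -fopt-info-vec-optimized -fopt-info-vec-missed example.cpp
//   Clang: clang++ -O3 -c -Rpass=loop-vectorize -Rpass-missed=loop-vectorize example.cpp
//   MSVC:  cl /O2 /c /Qvec-report:2 example.cpp
#include <cstddef>

// Multiplies every sample in src by a constant gain and stores it in dest.
// A simple loop like this, with a known trip count and no data-dependent
// branches, is a typical auto-vectorization candidate. Whether a given
// compiler actually vectorizes it is exactly what the report tells you.
void applyGain(float* dest, const float* src, float gain, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dest[i] = src[i] * gain;
    }
}
```

Each compiler prints its report in its own format, so comparing results across compilers means matching the reported source lines by hand, as in the table in the next comment.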

daschuer, Jul 20 '22 14:07

This is a comparison of the loops in sample.cpp. The MSVC result is disappointing: 13/39 loops vectorized

| Line | GCC 9.4 | MSVC | Clang | GCC 12.1 |
|------|---------|------|-------|----------|
| 124 | optimized | optimized | optimized | optimized |
| 145 | optimized | missed | optimized | optimized |
| 153 | optimized | optimized | optimized | missed |
| 169 | optimized | missed | optimized | optimized |
| 189 | optimized | missed | missed | optimized |
| 195 | optimized | missed | missed | optimized |
| 205 | optimized | missed | missed | optimized |
| 211 | optimized | missed | missed | optimized |
| 222 | optimized | optimized | optimized | optimized |
| 236 | optimized | missed | optimized | optimized |
| 254 | optimized | missed | optimized | optimized |
| 261 | optimized | optimized | optimized | optimized |
| 236 | optimized | optimized | optimized | optimized |
| 281 | optimized | optimized | optimized | optimized |
| 304 | optimized | optimized | optimized | optimized |
| 323 | optimized | optimized | optimized | optimized |
| 352 | optimized | missed | optimized | optimized |
| 359 | optimized | optimized | optimized | optimized |
| 378 | optimized | optimized | optimized | optimized |
| 391 | optimized | missed | optimized | optimized |
| 407 | optimized | missed | optimized | optimized |
| 433 | optimized | missed | optimized | optimized |
| 444 | optimized | missed | optimized | optimized |
| 456 | optimized | missed | optimized | optimized |
| 477 | optimized | missed | missed | optimized |
| 471 | optimized | missed | missed | optimized |
| 492 | optimized | missed | missed | optimized |
| 498 | optimized | missed | missed | optimized |
| 512 | optimized | missed | optimized | optimized |
| 522 | optimized | missed | optimized | optimized |
| 533 | optimized | missed | missed | optimized |
| 545 | optimized | missed | optimized | optimized |
| 557 | optimized | missed | optimized | optimized |
| 571 | optimized | missed | missed | optimized |
| 585 | optimized | missed | missed | optimized |
| 594 | missed | missed | missed | optimized |
| 608 | missed | missed | missed | optimized |

daschuer, Jul 21 '22 06:07

> The MSVC result is disappointing: 13/39 loops vectorized

Yet another reason to use xsimd in favor of auto-vectorization

Swiftb0y, Jul 21 '22 14:07

> Yet another reason to use xsimd in favor of auto-vectorization

Yes, I agree.

daschuer, Jul 21 '22 14:07

Great. The problem is that this requires large-scale changes to our audio processing. Essentially all xsimd-related code needs to be templated on the vector instruction set (unless we want to dispatch dynamically everywhere). Binary size and build time will also increase, because we would generate the same code for many architectures. The advantage would be that portable builds could include AVX2, SSE4, etc. code, so the official portable binaries could run almost as fast as native builds.
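To make the trade-off concrete, here is a rough sketch of what "templated on the vector instruction set" plus dynamic dispatch could look like. This is not xsimd's API: the tag types `Sse2Tag` and `Avx2Tag`, the functions, and the `hasAvx2` flag are all hypothetical, and the inner loop is plain scalar code standing in for real SIMD operations.

```cpp
#include <cstddef>

// Hypothetical tags standing in for instruction sets. xsimd has its own
// architecture types, which are deliberately not reproduced here.
struct Sse2Tag { static constexpr std::size_t kLanes = 4; };  // 128-bit / float
struct Avx2Tag { static constexpr std::size_t kLanes = 8; };  // 256-bit / float

// One kernel template. Every instantiation is a separate copy of the code in
// the binary, which is where the extra binary size and build time come from.
template <typename Arch>
void applyGainKernel(float* dest, const float* src, float gain, std::size_t n) {
    constexpr std::size_t lanes = Arch::kLanes;
    std::size_t i = 0;
    // Full blocks of `lanes` samples; a real kernel would use SIMD loads,
    // multiplies and stores here instead of this scalar inner loop.
    for (; i + lanes <= n; i += lanes) {
        for (std::size_t lane = 0; lane < lanes; ++lane) {
            dest[i + lane] = src[i + lane] * gain;
        }
    }
    // Scalar tail for the samples that do not fill a whole block.
    for (; i < n; ++i) {
        dest[i] = src[i] * gain;
    }
}

// Dynamic dispatch: pick the instantiation at runtime. `hasAvx2` is a
// placeholder for a real CPU feature check.
void applyGainDispatch(float* dest, const float* src, float gain,
                       std::size_t n, bool hasAvx2) {
    if (hasAvx2) {
        applyGainKernel<Avx2Tag>(dest, src, gain, n);
    } else {
        applyGainKernel<Sse2Tag>(dest, src, gain, n);
    }
}
```

Dispatching once per buffer, as in this sketch, keeps the runtime check out of the inner loop, but every function between the dispatch point and the kernel still has to be either templated or duplicated.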

Swiftb0y, Jul 21 '22 14:07

> This is a comparison of the loops in sample.cpp. The MSVC result is disappointing: 13/39 loops vectorized

Did you invoke all these compilers with the same target instruction set extensions?

JoergAtGithub, Jul 21 '22 16:07

This is the output from our default build settings on GitHub. The same settings are used for our release builds as well. Maybe our MSVC build flags are bad?

Do you get better results in your local build?

daschuer, Jul 21 '22 17:07

Could you rerun your benchmark with /arch:AVX512 for MSVC (the default is to use only the SSE2 instruction set):

https://docs.microsoft.com/en-us/cpp/parallel/auto-parallelization-and-auto-vectorization?view=msvc-170&viewFallbackFrom=vs-2019#auto-vectorizer

https://docs.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170

JoergAtGithub, Jul 21 '22 17:07

> Could you rerun your benchmark with /arch:AVX512 for MSVC (the default is to use only the SSE2 instruction set)

To provide portable builds, we can only distribute with SSE2. GCC also builds with only SSE2 and still does a good job at auto-vectorization, so specifying a different target instruction set does not make sense. Especially not AVX512, which is present on only a few modern high-end (usually desktop) CPUs.

Swiftb0y, Jul 21 '22 17:07

I have tested the /arch:AVX512 flag here: https://github.com/daschuer/mixxx/runs/7462994413?check_suite_focus=true No improvement.

daschuer, Jul 22 '22 10:07

Thanks for testing! It seems that MSVC's auto-vectorizer really is that poor: http://0x80.pl/notesen/2021-02-17-autovectorization-msvc.html

JoergAtGithub, Jul 22 '22 16:07

@ferranpujolcamins Is this ready for merge?

daschuer, Aug 25 '22 06:08