pnumpy

need to work on compiler flags for vectorized code vs

Open tdimitri opened this issue 4 years ago • 2 comments

We need to experiment more with pragmas and compiler targets. We want the loader code to be compiled normally, with no extra instruction-set flags. We want the AVX2 (256-bit instructions) code to be compiled with -mavx2 (or the equivalent, e.g. /arch:AVX2 on MSVC). We want the 512-bit code to be compiled with its own, separate compiler flags.
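One way to check the "loader compiled normally" requirement is to have the loader translation unit report whether it was built with wide SIMD enabled. The sketch below is only an illustration: `LOADER_BUILT_WITH_WIDE_SIMD` and `loader_is_baseline` are hypothetical names, not part of pnumpy. It relies on the `__AVX2__` and `__AVX512F__` macros that GCC, Clang and MSVC predefine when those instruction sets are enabled for the file.

```c
/* Hypothetical guard for the loader source file (names are illustrative only).
 * __AVX2__ / __AVX512F__ are predefined by the compiler when the file is
 * built with those instruction sets enabled (e.g. -mavx2 or /arch:AVX2). */
#if defined(__AVX2__) || defined(__AVX512F__)
#define LOADER_BUILT_WITH_WIDE_SIMD 1
#else
#define LOADER_BUILT_WITH_WIDE_SIMD 0
#endif

/* Returns 1 if this file was compiled without AVX2/AVX-512 enabled,
 * i.e. the compiler was not allowed to emit those instructions here. */
static int loader_is_baseline(void)
{
    return LOADER_BUILT_WITH_WIDE_SIMD ? 0 : 1;
}
```

A stricter variant could turn the check into an `#error`, so that a build that accidentally passes a global AVX2 flag to the loader fails at compile time rather than on an old machine.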

Ideally we need an old Mac, from about 8 years ago, without 256-bit instructions to make sure it loads correctly.

tdimitri, Sep 29 '20 13:09

Is there a concrete action that we should take here?

mattip, Oct 26 '20 14:10

I do not have an old Mac to test on (one old enough that it does not have AVX2). My belief is this: when the compiler sees that -mavx2 is allowed, it might change its memcpy or memset (or similar low-level functions) to be faster using 256-bit instructions. An old computer might then crash before the module finishes loading, because something like C++ class creation hit an AVX2 memset. Therefore, we only want to enable AVX2 or AVX512 for specific functions, as opposed to using a global compile flag.
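A minimal sketch of the per-function approach, assuming GCC or Clang on x86: the `target("avx2")` function attribute allows AVX2 code generation for one function only, and `__builtin_cpu_supports("avx2")` selects it at run time. The names `add_i32_baseline`, `add_i32_avx2` and `select_add_i32` are hypothetical and not taken from pnumpy. MSVC has no equivalent attribute, so there the usual alternative is a separate source file built with /arch:AVX2.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-function targets are only used on GCC/Clang for x86 in this sketch. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_TARGET_ATTR 1
#else
#define HAVE_X86_TARGET_ATTR 0
#endif

typedef void (*add_i32_fn)(const int32_t *, const int32_t *, int32_t *, size_t);

/* Baseline kernel: compiled with the file's default flags, so it is safe
 * to run on any CPU the rest of the file supports. */
static void add_i32_baseline(const int32_t *a, const int32_t *b,
                             int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#if HAVE_X86_TARGET_ATTR
/* Only this function may be compiled with AVX2 instructions; the rest of
 * the file (including startup and loader code) keeps the default target. */
__attribute__((target("avx2")))
static void add_i32_avx2(const int32_t *a, const int32_t *b,
                         int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
#endif

/* Pick a kernel at run time, so the AVX2 version is never called on a CPU
 * that lacks AVX2. */
static add_i32_fn select_add_i32(void)
{
#if HAVE_X86_TARGET_ATTR
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_i32_avx2;
#endif
    return add_i32_baseline;
}
```

A caller would run `select_add_i32()` once after loading and keep the returned pointer. Because no global -mavx2 flag is involved, the compiler has no license to use 256-bit instructions in code that runs before the CPU check.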

tdimitri, Oct 26 '20 15:10