Anaconda-Windows-AMD
Does it work only with the AMD Ryzen Threadripper 1950X?
Hello, your repository has a description of the AMD Ryzen Threadripper 1950X processor in "1950x_cpuinfo.txt". Do the patched libraries you suggested work only with this CPU? Will they work with other AMD processors (for example, the Ryzen 7 2700X)? Will I need to configure the "cpuinfo.txt" file for my current CPU, and if so, how? And how should I configure these parameters?
SET KMP_AFFINITY=granularity=core,compact,1,0
SET KMP_CPUINFO_FILE=cpuinfo.txt
SET MKL_NUM_THREADS=16
SET OMP_NUM_THREADS=16
@medphisiker
You can edit `1950x_cpuinfo.txt` to fit the 2700X's topology:
- delete processors 16-31, keep 0-15
- remove all `node_0 id : 0` lines
- replace `node_1` with `node_0`
The result should represent the 2700X's topology.
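
The three edits above can also be scripted. The following is only a sketch under assumptions that are not confirmed in this thread: that the file consists of blank-line-separated blocks, each starting with a `processor : N` line, and that the node lines are written exactly as `node_0 id : 0` and `node_1 id : ...`. The function name `adapt_cpuinfo` is hypothetical.

```python
import re

def adapt_cpuinfo(text, keep_max=15):
    """Apply the three manual edits to the contents of a cpuinfo file.

    Assumes blank-line-separated blocks that each contain a
    'processor : N' line (an assumption about the file layout).
    """
    kept = []
    for block in re.split(r"\n\s*\n", text.strip()):
        m = re.search(r"^processor\s*:\s*(\d+)", block, flags=re.MULTILINE)
        # Step 1: drop processors above keep_max (blocks without a
        # processor line are dropped as well).
        if m is None or int(m.group(1)) > keep_max:
            continue
        lines = []
        for line in block.splitlines():
            # Step 2: remove every original 'node_0 id : 0' line.
            if re.match(r"^\s*node_0\s+id\s*:\s*0\s*$", line):
                continue
            # Step 3: rename node_1 to node_0.
            lines.append(line.replace("node_1", "node_0"))
        kept.append("\n".join(lines))
    return "\n\n".join(kept) + "\n"
```

To use it, read `1950x_cpuinfo.txt`, pass its contents to the function, and write the result to the file named in `KMP_CPUINFO_FILE`. It is worth diffing the output against the original by hand before relying on it.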
You can check the setting by adding `verbose` to `KMP_AFFINITY` and using Coreinfo to check whether the reported topology matches.
Just type the `SET` commands at the command prompt to change the environment variables for the current session only.
If you want to change them permanently, you can edit the environment variables in the Control Panel.
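
If Python is started from a script rather than from a prepared command prompt, the same variables can in principle be set from Python itself, provided this happens before numpy is imported. The sketch below uses the values from the question; `mkl_env` is a hypothetical helper name, and whether the OpenMP runtime picks up values set this way is an assumption that has not been tested here, so setting them at the command prompt remains the safer route. The `verbose` modifier is prepended to `KMP_AFFINITY` so the runtime reports the topology it detected.

```python
import os

def mkl_env(cpuinfo_path="cpuinfo.txt", threads=16, verbose=True):
    """Return the environment variables discussed in this thread as a dict."""
    affinity = "granularity=core,compact,1,0"
    if verbose:
        # 'verbose' makes the OpenMP runtime print its affinity map.
        affinity = "verbose," + affinity
    return {
        "KMP_AFFINITY": affinity,
        "KMP_CPUINFO_FILE": cpuinfo_path,
        "MKL_NUM_THREADS": str(threads),
        "OMP_NUM_THREADS": str(threads),
    }

# Apply the settings; this must run before the first "import numpy".
os.environ.update(mkl_env())
```

For a 2700X the thread count of 16 matches its 8 cores with SMT; adjust `threads` for other CPUs.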
Thank you for the quick reply. The repository description also says that this is a "patched Intel MKL+compiler" build, so it is not numpy based on the OpenBLAS library? And what do you think is the main reason for the poor performance of the stock MKL library on AMD processors? Is it that Intel CPUs have a different topology from AMD processors, so that by default MKL does not use all of the AMD CPU's cores and threads?
OpenBLAS on Windows has poor performance when built with the MSVC compiler, and it is very tricky to build with mingw-w64.
OpenBLAS also has performance issues on the AMD Zen architecture; it is still not optimized for Zen: https://github.com/xianyi/OpenBLAS/issues/1461
Anaconda does not have a `nomkl` package that can be installed on Windows, so you cannot easily switch from MKL to OpenBLAS on Windows.
Intel's MKL checks whether the CPUID vendor string is `GenuineIntel`. If it detects a non-Intel CPU, MKL chooses the "maximum compatibility" code path (i.e. SSE2, the slowest). This is Intel's "cripple AMD" function.
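
To see which vendor string a machine reports, the Windows `PROCESSOR_IDENTIFIER` environment variable can be inspected. This is only a sketch: it assumes the usual Windows format in which the vendor follows the last comma (for example `AMD64 Family 23 Model 8 Stepping 2, AuthenticAMD`), the function names are hypothetical, and it shows only the vendor string itself, not what MKL does internally.

```python
import os
import platform

def cpu_vendor(identifier):
    """Extract the vendor from a Windows-style processor identifier.

    Assumes the vendor is the text after the last comma.
    """
    return identifier.rsplit(",", 1)[-1].strip()

def is_genuine_intel(identifier):
    """True if the identifier carries the GenuineIntel vendor string."""
    return cpu_vendor(identifier) == "GenuineIntel"

if __name__ == "__main__":
    # PROCESSOR_IDENTIFIER is set by Windows; fall back to platform.processor().
    ident = os.environ.get("PROCESSOR_IDENTIFIER") or platform.processor()
    print(ident, "->", "Intel" if is_genuine_intel(ident) else "non-Intel")
```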
Anaconda's numpy uses Intel TBB instead of OpenMP. Intel TBB uses Intel's proprietary method to detect the CPU and NUMA topology; in this situation, Zen's SMT threads are recognized as real cores, which hurts ALU performance.
Thank you for the comprehensive answer. It was interesting to learn. Perhaps other people who visit your repository will find the information on configuring your libraries for their AMD processor useful. Right now the process is described as "use conda uninstall scikit-learn scipy numexpr numpy numpy-base --force -y to uninstall 'cripple AMD' version and pip install patched package", and it seems as if that is all that needs to be done. Thank you again for your work, your libraries, and your detailed answers.
Hi fo40225,
Thank you so much for your great job!
If I build NumPy with MKL using the Intel compiler with mpopt = 'openmp' and run it with KMP_CPUINFO_FILE specified, should I get similar performance?
@xincui-math
Building numpy with icl and OpenMP and providing `KMP_CPUINFO_FILE` can fix the problem of CPU topology detection.
I didn't test whether linking `dispatchpatch64.obj` from Agner's asmlib.zip improves the speed or not.
If you want to build numpy, you can use this config: https://github.com/fo40225/Anaconda-Windows-AMD/blob/master/site.cfg