Does it work only with the AMD Ryzen Threadripper 1950X?

Open medphisiker opened this issue 6 years ago • 6 comments

Hello, your repository describes the AMD Ryzen Threadripper 1950X processor in "1950x_cpuinfo.txt". Do the patched libraries you provide work only with this CPU? Will they work with other AMD processors (for example, the Ryzen 7 2700X)? Will I need to adapt the "cpuinfo.txt" file to my CPU, and if so, how? And how should I configure these parameters?

  SET KMP_AFFINITY=granularity=core,compact,1,0
  SET KMP_CPUINFO_FILE=cpuinfo.txt
  SET MKL_NUM_THREADS=16
  SET OMP_NUM_THREADS=16

medphisiker avatar Jan 25 '19 03:01 medphisiker

@medphisiker

You can edit 1950x_cpuinfo.txt to fit the 2700X's topology:

  1. delete processors 16-31, keep 0-15
  2. remove all "node_0 id : 0" lines
  3. rename node_1 to node_0

The result should represent the 2700X's topology.
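The three steps above could be scripted roughly as follows. This is a minimal sketch, not something from the repository: the helper name `adapt_cpuinfo` is hypothetical, and it assumes the file consists of blank-line-separated records of "key : value" lines with the field names "processor", "node_0 id" and "node_1 id" mentioned in this thread. Check the output by hand before using it.

```python
import re


def adapt_cpuinfo(text, keep_below=16):
    """Apply the three manual edits to cpuinfo-style text (layout is an assumption)."""
    kept_records = []
    # Assumption: one record per logical processor, separated by blank lines.
    for record in re.split(r"\n\s*\n", text.strip()):
        lines = []
        keep = True
        for line in record.splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                lines.append(line)  # not a "key : value" line, keep unchanged
                continue
            key, value = key.strip(), value.strip()
            if key == "processor" and int(value) >= keep_below:
                keep = False  # step 1: drop processors 16-31
                break
            if key == "node_0 id" and value == "0":
                continue  # step 2: remove "node_0 id : 0" lines
            if key == "node_1 id":
                key = "node_0 id"  # step 3: rename node_1 to node_0
            lines.append(f"{key} : {value}")
        if keep and lines:
            kept_records.append("\n".join(lines))
    return "\n\n".join(kept_records) + "\n"
```

A possible use would be to read 1950x_cpuinfo.txt, pass its contents through this function, and write the result to the file that KMP_CPUINFO_FILE points to.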

You can check the settings by adding verbose to KMP_AFFINITY and using Coreinfo to see whether the reported topology matches.

Just type the SET commands at a command prompt to change the environment variables for the current session.

If you want to change them permanently, you can edit the environment variables in Control Panel.
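As an alternative to typing SET commands, the same variables can be set from Python before numpy is imported. This is only a sketch: the helper name `mkl_env` is hypothetical, the values are the ones quoted in the question, and it assumes the OpenMP runtime reads these variables when it is first loaded.

```python
def mkl_env(threads, cpuinfo_file="cpuinfo.txt", verbose=False):
    """Build the environment variables discussed in this thread as a dict."""
    affinity = "granularity=core,compact,1,0"
    if verbose:
        # "verbose" is a KMP_AFFINITY modifier that prints the detected topology.
        affinity = "verbose," + affinity
    return {
        "KMP_AFFINITY": affinity,
        "KMP_CPUINFO_FILE": cpuinfo_file,
        "MKL_NUM_THREADS": str(threads),
        "OMP_NUM_THREADS": str(threads),
    }
```

To apply it, call `os.environ.update(mkl_env(16, verbose=True))` at the top of the script and import numpy only afterwards, then compare the verbose output against what Coreinfo reports.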

fo40225 avatar Jan 25 '19 14:01 fo40225

Thank you for the quick reply. The repository description also says that this is a "patched Intel MKL+compiler" build, so it is not numpy built on the OpenBLAS library? And what do you think is the main reason for the poor performance of the stock MKL library on AMD processors? Is it that Intel CPUs have a different topology from AMD processors, so that by default MKL does not use all of the AMD CPU's cores and threads?

medphisiker avatar Jan 30 '19 13:01 medphisiker

OpenBLAS on Windows performs poorly when built with the MSVC compiler, and it is very tricky to build with mingw-w64.

OpenBLAS also has a performance issue on the AMD Zen architecture; it is still not optimized for Zen: https://github.com/xianyi/OpenBLAS/issues/1461

Anaconda does not have a nomkl package that can be installed on Windows, so you cannot easily switch from MKL to OpenBLAS on Windows.

Intel's MKL checks whether the CPUID vendor string is GenuineIntel. If it detects a non-Intel CPU, MKL chooses the most broadly compatible code path (i.e. SSE2, the slowest).

Intel's "cripple AMD" function

Anaconda's numpy uses Intel TBB instead of OpenMP. Intel TBB uses Intel's proprietary method to detect the CPU or NUMA topology; in this situation Zen's SMT threads are recognized as real cores, which hurts ALU performance.
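The vendor check described above can be pictured with a toy model. This is purely illustrative and is not MKL's actual dispatch code: the function name, the list of code paths, and their ordering are all assumptions made for the example.

```python
# Toy model only: NOT MKL's real dispatch logic.
# Illustrative code paths, ordered fastest first (assumed for this example).
CODE_PATHS = ["AVX2", "AVX", "SSE4_2", "SSE2"]


def select_code_path(vendor, supported):
    """Pick a code path the way the reply describes the vendor check."""
    if vendor != "GenuineIntel":
        # Non-Intel CPUs get the baseline path, whatever they support.
        return "SSE2"
    for path in CODE_PATHS:
        if path in supported:
            return path
    return "SSE2"
```

In this model an "AuthenticAMD" CPU that supports AVX2 still ends up on the SSE2 path, which is the behavior the patched libraries are meant to work around.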

fo40225 avatar Jan 30 '19 16:01 fo40225

Thank you for the comprehensive answer; it was interesting to learn. Other people who visit your repository may find this information on configuring the libraries for their AMD processor useful. At the moment the process is described only as: use "conda uninstall scikit-learn scipy numexpr numpy numpy-base --force -y" to uninstall the "cripple AMD" version, then pip install the patched packages. That makes it seem as if nothing else needs to be done. Thank you again for your work, your libraries, and your thorough answers.

medphisiker avatar Feb 02 '19 12:02 medphisiker

Hi fo40225,

Thank you so much for your great job!

If I build NumPy with MKL using the Intel compiler with mpopt = 'openmp', and run it with KMP_CPUINFO_FILE specified, should I get similar performance?

xincui-math avatar Jun 14 '19 02:06 xincui-math

@xincui-math

Building numpy with icl and OpenMP and providing KMP_CPUINFO_FILE can fix the CPU topology detection problem.

I didn't test whether linking dispatchpatch64.obj from Agner's asmlib.zip improves the speed.

If you want to build numpy yourself, you can use this config: https://github.com/fo40225/Anaconda-Windows-AMD/blob/master/site.cfg

fo40225 avatar Jun 14 '19 03:06 fo40225