
BLAS name collisions

Open joschu opened this issue 8 years ago • 5 comments

CGT downloads and installs OpenBLAS. But then when numpy gets imported, the functions from the BLAS that numpy is linked against (e.g. cblas_dgemm) override some of the functions from OpenBLAS. I noticed this when I found that setting VECLIB_MAXIMUM_THREADS changes the behavior of CGT's matrix multiplication. This doesn't seem to cause any serious bugs, but it partly defeats the purpose of using OpenBLAS, which is to obtain consistent behavior with regard to multithreading and so forth.

joschu avatar Aug 28 '15 03:08 joschu

(Question from an outsider:) Can't you just link against whatever BLAS library is installed on the system (which will most often be the one numpy also uses), and recommend that users install OpenBLAS?

f0k avatar Aug 28 '15 10:08 f0k

That's what's typically done, e.g. by NumPy and Theano. And maybe CGT should do that, at least for now. A few reasons for the choice to download OpenBLAS:

  • different BLAS implementations have different multi-threading settings. If CGT is using multi-threading, we want to ensure that BLAS is using a single thread. This is a significant effect -- I've noticed substantial speedups by setting VECLIB_MAXIMUM_THREADS=1 on Mac when CGT is using parallelism. It's easier for the user if we just use a bundled BLAS and let CGT handle the threading configuration (see the sketch after this list).
  • most people on Linux are using a BLAS that's slower than OpenBLAS
  • Properly configuring a library to find and use the right BLAS is surprisingly hard. NumPy still doesn't do a great job at it--it's rather painful to get NumPy to use your chosen BLAS and lots of people are using the fallback built-in one.
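For concreteness, here is a minimal sketch of the kind of thread pinning CGT could do (or users could do themselves). The environment variables below are the ones OpenBLAS, Accelerate/vecLib, and MKL read when they initialize, so they must be set before numpy (or CGT) is imported; this is an illustration, not CGT's actual startup code:

```python
import os

# Pin the BLAS thread pools to a single thread *before* numpy (or CGT) is
# imported; these environment variables are read at load time by OpenBLAS,
# Accelerate/vecLib, and MKL respectively.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("VECLIB_MAXIMUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")

import numpy as np  # only imported after the thread settings are in place
```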

joschu avatar Aug 28 '15 21:08 joschu

Julia adds a suffix to the BLAS function names (they do it so that 32-bit and 64-bit BLAS routines don't get confused); a similar patch could be used to fix the problem for CGT, too:

https://github.com/JuliaLang/julia/commit/066825ebb3d450ccd1315122d1fd0e473f91798e

pcmoritz avatar Aug 30 '15 08:08 pcmoritz

Cool, nice find! It definitely might make sense to customize the makefile, and also to make sure we're not building functions we don't need (most of level 3).

joschu avatar Aug 30 '15 16:08 joschu

A few reasons for the choice to download OpenBLAS:

Agreed, all of these are valid reasons to make OpenBLAS the default for novice users. If it's not too difficult, advanced users could still be given a way to use an alternative BLAS library, taking care of disabling multi-threading themselves.

Properly configuring a library to find and use the right BLAS is surprisingly hard. NumPy still doesn't do a great job at it--it's rather painful to get NumPy to use your chosen BLAS and lots of people are using the fallback built-in one.

Regarding the last point, at least on Ubuntu (and probably Debian) it's quite simple. When you're using packages from the Ubuntu repository, you can even use update-alternatives to switch between different installed BLAS libraries to be used by numpy and others, without recompiling anything. But sure, as a library developer it's hard to automatically find the correct BLAS library across multiple platforms.
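As a quick sanity check (Linux-only, and just a sketch), you can look at which BLAS shared object actually ended up in the process after importing numpy, which is handy when switching alternatives:

```python
import numpy as np  # importing numpy pulls in whatever BLAS it is linked against

# /proc/self/maps lists every shared object mapped into this process (Linux
# only); filtering for "blas" shows which implementation was actually loaded.
with open("/proc/self/maps") as maps:
    blas_libs = {line.split()[-1] for line in maps if "blas" in line.lower()}

for path in sorted(blas_libs):
    print(path)
```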

It definitely might make sense to customize the makefile

What if I want or need to customize the OpenBLAS build myself, e.g., because the architecture auto-detection doesn't work for my CPU? Even if you decide to only support OpenBLAS, you may want to allow users to point CGT at an OpenBLAS build somewhere on their system. (This means that if you add a function-name suffix, it should be configurable as well.)
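For illustration only, a minimal ctypes sketch of what a user-supplied BLAS with a configurable symbol suffix could look like; CGT_BLAS_LIB and CGT_BLAS_SUFFIX are made-up names here, not existing CGT options:

```python
import ctypes
import os

# Hypothetical knobs: CGT_BLAS_LIB / CGT_BLAS_SUFFIX do not exist in CGT;
# they only illustrate letting the user point at their own OpenBLAS build.
blas_path = os.environ.get("CGT_BLAS_LIB", "libopenblas.so")
suffix = os.environ.get("CGT_BLAS_SUFFIX", "")

# RTLD_LOCAL keeps the library's symbols out of the process-global namespace,
# so they cannot collide with whatever BLAS numpy loads later.
blas = ctypes.CDLL(blas_path, mode=ctypes.RTLD_LOCAL)

# Resolve the (possibly suffixed) routine explicitly instead of letting the
# dynamic linker pick one of several colliding cblas_dgemm definitions.
dgemm = getattr(blas, "cblas_dgemm" + suffix)
```

Resolving routines through an explicit handle like this sidesteps the global-namespace collision that the suffix patch is also trying to avoid.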

(Note that I'm just depicting what I'd like as a user here, without understanding the implications on the development side.)

f0k avatar Sep 01 '15 10:09 f0k