OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
The [recently added](https://github.com/xianyi/OpenBLAS/pull/1890) version number availability for `openblas_get_config()` is [already proving useful](https://github.com/numpy/numpy/pull/12523) in our project to verify binaries built via linking to openblas (i.e., make sure an old system-level openblas...
After toying around with `Makefile.power` and `Makefile.system` for a while, I've successfully built OpenBLAS 0.3.10 on POWER9 at Summit (ORNL) with GCC 6.4.0 (the default GCC version, at the time...
Hi, following issue 2693 [https://github.com/xianyi/OpenBLAS/pull/2693] by @EGuesnet, once fixed by our patch, we found another issue when tests are run during the build phase, still with Power8/32bit, thus using...
My app performs many small dgemms, each invoked by a separate thread (via a task pool). As recommended, I compiled OpenBLAS 3.10 with USE_THREAD=0 and USE_LOCKING=1. This is on Cavium...
Dear developers, thank you for your great work on OpenBLAS. Using it on ARM 32-bit platforms and Ubuntu 14.04, we found some erroneous results when used with Torch7: The Lua...
Hi Xianyi, We tried to run a matrix multiplication with cblas_sgemm or cblas_dgemm on Android. We tried with A = [1 3 4 6], B = [3 5 9 1],...
https://github.com/xianyi/OpenBLAS/wiki/Faq/4bded95e8dc8aadc70ce65267d1093ca7bdefc4c#multi-threaded says: > If your application is already multi-threaded, it will conflict with OpenBLAS multi-threading. Thus, you must set OpenBLAS to use single thread as following ... That is good...
Hi, fellows! I benchmarked sgemm performance on [email protected] x 24 cores (square matrices from 128 to 4096). There seems to be a gap between the best case (28.3 GFLOPS) and...
Hi, I've got three issues when compiling OpenBLAS. 1) When compiling using `TARGET=ARMV8` or more explicitly `TARGET=CORTEXA72`, this will lead to a 64-bit binary compilation. The Raspberry Pi 4 with...
Hi Xianyi, Came across the following article. https://www.codeproject.com/Articles/1169319/Reducing-Packing-Overhead-in-Matrix-Matrix-Multipl This talks about introducing new packed APIs of the following form in MKL. `dest = sgemm_alloc (identifier, m, n, k)` `sgemm_pack (identifier,...
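The version check described in the `openblas_get_config()` item above could look roughly like the following. This is only a sketch: the helper name `parse_openblas_version` and both sample strings are hypothetical, and the exact layout of the real configuration string is assumed here, not taken from the OpenBLAS sources. The only behaviour it relies on from the item itself is that newer builds report a version number while older ones do not.

```python
import re


def parse_openblas_version(config):
    """Return (major, minor, patch) found in a config string, or None.

    Assumption: the version appears as "OpenBLAS X.Y.Z" somewhere in the
    string returned by openblas_get_config(). Builds that predate the
    version-number change are expected to return no match.
    """
    match = re.search(r"OpenBLAS\s+(\d+)\.(\d+)\.(\d+)", config)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


# Hypothetical sample strings, made up for illustration only.
sample_new = "OpenBLAS 0.3.10 NO_AFFINITY HASWELL MAX_THREADS=8"
sample_old = "NO_AFFINITY HASWELL MAX_THREADS=8"

version = parse_openblas_version(sample_new)
if version is None or version < (0, 3, 5):
    print("linked OpenBLAS is too old or reports no version")
else:
    print("linked OpenBLAS version:", ".".join(str(v) for v in version))
```

Comparing integer tuples avoids the string-comparison trap where "0.3.10" sorts before "0.3.5". The minimum version `(0, 3, 5)` is a placeholder; a real project would substitute whatever release it actually requires.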