OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
All I know is that this builds and works fine with clangarm64 on my laptop. Unsure about performance improvement, but certainly no performance regression. I am not an assembly wizard,...
How to optimize zgemm(T,N,m,n,k) when m=n100000 and lda=ldb=ldc=k?
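For context on that call pattern, here is a naive reference sketch of what column-major zgemm computes with transa='T' and transb='N'. It is not OpenBLAS code, and `ref_zgemm_tn` is a hypothetical name. With lda = ldb = k both operands are contiguous along k, so every element of C is a length-k dot product. Note that the standard BLAS argument rules require ldc >= max(1, m), so ldc = k is only valid when k >= m.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical naive reference for zgemm, transa='T', transb='N',
   column-major storage:
     C := alpha * A^T * B + beta * C
   A is k x m (lda >= k), B is k x n (ldb >= k), C is m x n (ldc >= m).
   'T' is a plain transpose; 'C' would also conjugate A. */
void ref_zgemm_tn(int m, int n, int k, double complex alpha,
                  const double complex *a, int lda,
                  const double complex *b, int ldb,
                  double complex beta, double complex *c, int ldc)
{
    for (int j = 0; j < n; j++) {
        const double complex *bj = b + (size_t)j * ldb;     /* column j of B */
        for (int i = 0; i < m; i++) {
            const double complex *ai = a + (size_t)i * lda; /* column i of A */
            double complex acc = 0.0;
            /* With lda = ldb = k both columns are contiguous, so each
               C(i,j) is a plain length-k dot product. */
            for (int l = 0; l < k; l++)
                acc += ai[l] * bj[l];
            double complex *cij = c + i + (size_t)j * ldc;
            /* BLAS convention: when beta == 0, C is not read on input. */
            *cij = (beta == 0.0) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}
```

An optimized kernel would block these dot products for cache and registers; the sketch only pins down the expected result.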
The check for GCC is confused by the GNU-stack in
```
    .file   "FIRModule"
    .text
    .globl  zhoge_
    .p2align    4
    .type   zhoge_,@function
zhoge_:
    xorps   %xmm0, %xmm0
    xorps   %xmm1, %xmm1
    retq
.Lfunc_end0:
    .size...
```
I had a brief bit of confusion because the docs suggested that `OMP_NUM_THREADS` would only affect OpenMP builds, when actually it's used as a fallback in non-OpenMP builds as well....
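The lookup order described above can be sketched as follows. This is a simplified illustration, not OpenBLAS's source: `resolve_num_threads` and `parse_threads` are hypothetical names, and clamping the result to the CPU count is an assumption of the sketch. The precedence itself (OPENBLAS_NUM_THREADS, then GOTO_NUM_THREADS, then OMP_NUM_THREADS, with only OMP_NUM_THREADS honoured in a USE_OPENMP=1 build) follows the OpenBLAS README.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a thread count, returning 0 ("unset") when the
   value is missing, non-numeric, or not positive. */
static int parse_threads(const char *s)
{
    if (s == NULL || *s == '\0')
        return 0;
    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > 1024L * 1024L)
        return 0;
    return (int)v;
}

/* Sketch of the documented precedence:
   OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS.
   An OpenMP build consults only OMP_NUM_THREADS; a non-OpenMP build still
   falls back to it. Defaulting to and clamping at ncpu is an assumption. */
int resolve_num_threads(const char *openblas_env, const char *goto_env,
                        const char *omp_env, int use_openmp, int ncpu)
{
    int n = 0;
    if (!use_openmp) {
        n = parse_threads(openblas_env);
        if (n == 0)
            n = parse_threads(goto_env);
    }
    if (n == 0)
        n = parse_threads(omp_env);   /* fallback, even without OpenMP */
    if (n == 0 || n > ncpu)
        n = ncpu;
    return n;
}

/* Convenience wrapper that reads the real process environment. */
int resolve_num_threads_from_env(int use_openmp, int ncpu)
{
    return resolve_num_threads(getenv("OPENBLAS_NUM_THREADS"),
                               getenv("GOTO_NUM_THREADS"),
                               getenv("OMP_NUM_THREADS"),
                               use_openmp, ncpu);
}
```

Keeping the environment lookup in a thin wrapper lets the precedence logic be tested without modifying the process environment.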
Very large numbers of calls to the symmetric complex eigenproblem via numpy.linalg.eigh() can have dramatic slowdowns due to multi-threading, especially when the number of cores is large and when there...
This PR adds the thread thresholding for Power10 by introducing get_gemv_optimal_nthreads_power10 function.
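The general idea behind thread thresholding can be illustrated with a small sketch. It does not reproduce `get_gemv_optimal_nthreads_power10`: the function `gemv_nthreads_sketch` and the constant `MIN_ELEMS_PER_THREAD` are hypothetical, and a real implementation would tune its thresholds per architecture by benchmarking.

```c
/* Hypothetical tuning constant: minimum number of matrix elements each
   thread should handle before another thread pays off.
   NOT the value used by get_gemv_optimal_nthreads_power10. */
#define MIN_ELEMS_PER_THREAD 4096LL

/* Sketch of size-based thread selection for gemv (y = alpha*A*x + beta*y).
   Work grows with m*n, so small problems run on one thread and the thread
   count grows with the work, capped at max_threads. */
int gemv_nthreads_sketch(long long m, long long n, int max_threads)
{
    if (m <= 0 || n <= 0 || max_threads <= 1)
        return 1;
    long long t = (m * n) / MIN_ELEMS_PER_THREAD;
    if (t < 1)
        t = 1;
    if (t > max_threads)
        t = max_threads;
    return (int)t;
}
```

Avoiding thread start-up and synchronization cost on small calls is the same concern raised in the eigh slowdown report above.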
Environment: operating system: Windows 10; IDE: Visual Studio 2022 Community; OpenCV version: opencv-4.12.0. I used the CMake GUI to build opencv-4.12.0 with the binary OpenBLAS 0.30.0, but CMake does not support this version, so I build...
Building commit d23680b81d5179ce6ae1ca5546303b81646ecac1 with `make -j DYNAMIC_ARCH=1 TARGET=VORTEX` results in test failures on Apple M4:
```
TEST 1135/1522 sgemmt:c_api_rowmajor_upper_M_50_K_50_a_notrans_b_notrans [FAIL]
ERR: test_extensions/test_sgemmt.c:797 expected 0.000e+00, got 2.741e-01 (diff -2.741e-01, tol 1.000e-04)...
```
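To help interpret the failing check, here is a naive reference sketch of what row-major sgemmt with no transposes should produce: only the selected triangle of the M x M result is updated, and the other triangle is left untouched. `ref_sgemmt_rowmajor_nn` is a hypothetical helper, not part of OpenBLAS or its test suite.

```c
#include <stddef.h>

/* Hypothetical naive reference for sgemmt, row-major, op(A)=A, op(B)=B:
     C[i][j] = alpha * sum_l A[i][l] * B[l][j] + beta * C[i][j]
   written only for j >= i (upper != 0) or j <= i (upper == 0).
   A is m x k (lda >= k), B is k x m (ldb >= m), C is m x m (ldc >= m). */
void ref_sgemmt_rowmajor_nn(int upper, int m, int k, float alpha,
                            const float *a, int lda,
                            const float *b, int ldb,
                            float beta, float *c, int ldc)
{
    for (int i = 0; i < m; i++) {
        int j0 = upper ? i : 0;       /* first column of row i to update */
        int j1 = upper ? m : i + 1;   /* one past the last column */
        for (int j = j0; j < j1; j++) {
            float acc = 0.0f;
            for (int l = 0; l < k; l++)
                acc += a[(size_t)i * lda + l] * b[(size_t)l * ldb + j];
            float *cij = &c[(size_t)i * ldc + j];
            /* BLAS convention: when beta == 0, C is not read on input. */
            *cij = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *cij;
        }
    }
}
```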
Forwarded bug report from Julia: https://github.com/JuliaLang/LinearAlgebra.jl/issues/1463 The error we get in SNRM2 is much higher than it should be when computed on Apple ARM64. We tried the same thing on AMD...