OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Results: 242 OpenBLAS issues, sorted by recently updated

I built OpenBLAS on Graviton3E with `make USE_OPENMP=1 NUM_THREADS=256 TARGET=NEOVERSEV1`. MKL was built on an Ice Lake machine. I called the OpenBLAS sgemm as `cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, M, N, K, 1.0,...

For optimization purposes, as is done in gemv, can I use stack allocation for the thread buffer in gemm for smaller matrix sizes? If yes, what should the buffer size be?...

PR #4577 > We have introduced the `adjust_thread_buffers()` function, similar to OpenMP, for initializing global thread buffers instead of the existing local buffers initialized in `blas_thread_server`. In `blas_thread_init`, memory is allocated...

Hello, we are using the NVIDIA Jetson Orin platform. In a multi-process system, each process is assigned to a specific CPU and parallelizes cblas_sgem() sequentially through the remaining cores. When...

Hello, I'm currently working on optimizing the scalability of the OpenBLAS pthread flow. Presently, I've observed that even when a BLAS call requires only 8 threads for execution on a...

OpenBLAS DGEMM achieves high efficiency, for example, over 90% of peak performance with 1 thread on Graviton3E, but the efficiency drops to about 73% when running DGEMM with 64 threads....

`?GEADD` has virtually no utility over `?AXPBY`, which is itself unoptimizable (i.e. the equivalent loops, optimized by a compiler, will perform at least as well in all cases). Both `appleblas_?geadd`...
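To make the "equivalent loops" in the snippet above concrete, here is a minimal sketch of a GEADD-style update written as plain loops. It assumes the usual semantics C := alpha*A + beta*C on column-major matrices; the name `naive_sgeadd` and its signature are hypothetical and are not part of OpenBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-loop equivalent of a single-precision GEADD-style
 * update, assuming the semantics C := alpha*A + beta*C.
 * Matrices are column-major; lda and ldc are the leading dimensions. */
void naive_sgeadd(size_t rows, size_t cols,
                  float alpha, const float *a, size_t lda,
                  float beta, float *c, size_t ldc)
{
    /* A leading dimension smaller than the row count would alias columns. */
    assert(lda >= rows && ldc >= rows);

    for (size_t j = 0; j < cols; ++j) {
        /* Each column is an AXPBY-style update: y := alpha*x + beta*y. */
        for (size_t i = 0; i < rows; ++i) {
            c[i + j * ldc] = alpha * a[i + j * lda] + beta * c[i + j * ldc];
        }
    }
}
```

The inner loop is a unit-stride, element-wise update with no data reuse, which is the kind of loop a compiler can typically vectorize on its own.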

Feature request

Hello, OpenBLAS uses operating-system-related methods (parsing /proc/cpuinfo) and architecture-related methods (x86 cpuid) to obtain the CPU's ISA extension information at runtime and dynamically select the optimized code...
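As a rough illustration of the operating-system-related approach mentioned above, the sketch below checks whether a feature name appears as a whole token in a flags line such as the ones found in /proc/cpuinfo. The helper `has_cpu_flag` is hypothetical and is not OpenBLAS code; it only shows the token-matching idea and does not read the file itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `flag` occurs in `line` as a whole
 * whitespace-delimited token, 0 otherwise. Whole-token matching matters
 * because one feature name can be a prefix of another (avx vs. avx2). */
int has_cpu_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    if (n == 0) {
        return 0;
    }

    const char *p = line;
    while ((p = strstr(p, flag)) != NULL) {
        /* The match must begin at the line start or after whitespace. */
        int starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        /* The match must end at the line end or before whitespace. */
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (starts && ends) {
            return 1;
        }
        p += 1; /* Keep scanning past this partial match. */
    }
    return 0;
}
```

A caller would read lines from /proc/cpuinfo, pick the one that lists CPU features, and pass it to this helper; on x86 the same information can instead be queried with the cpuid instruction.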