Christopher Sidebottom
Maybe I missed it, but how much worse is the performance with the `return 0` permit compared to `0.3.27`? There has to be some overhead for small GEMM, which the...
Thanks, @dnoan. This is very helpful. It's interesting to see the ~5% drop when just enabling small matrix optimisation. When calling dgemm, what are your transpose settings? I'd like to...
> In this case transpose is TN and NN. Thanks! I've put some experimental ASIMD kernels in #4963. Can you give them a try and see whether they perform better?...
I've commented out the SVE kernels and added ASIMD ones for the `ARMV8SVE` target here: https://github.com/OpenMathLib/OpenBLAS/pull/4963/files#diff-fb8d5777e1eda9b02b7e71a3fdcba2f0fe05ac790c0cc9437fda31ba8e71b12e I've also added them explicitly to the `NEOVERSEV2` target here: https://github.com/OpenMathLib/OpenBLAS/pull/4963/files#diff-668e5b5408832c20baf0ef0798427647b961a3cc34e9b38e7cb653b5b209a39e There is potential I got this...
@dnoan, what compiler are you using? If it's an older compiler or mis-detected, it might not enable the SVE platforms.
> Thank you for the detailed info. Could you please help explain why it works without the use of specialized allocators like https://en.cppreference.com/w/c/memory/aligned_alloc ? It's important for us to know...
> The CMake logic looks right. It only compiles SVE code when the compiler supports it and during the runtime it triggers the SVE code only when the hardware supports...
Hi @luyahan, have you measured whether this changes or improves performance? It would be good to see some benchmarks 😸 Just a process point: I assume https://github.com/google/highway/pull/2116 needs to be released before this...
We want to make `OPENBLAS_VERSION` configurable in https://github.com/pytorch/pytorch/pull/150106/files, can we merge that first?
@r-devulap sorry it took a while to get back to this. I did try hoisting the variables out in https://github.com/numpy/numpy/commit/22c19d27b84fdafd6ba377df52f25c4122867c3a but that broke with old compilers and musl for some reason...