AXPY looks bad especially on MacOS (M4)

Open dasergatskov opened this issue 6 months ago • 5 comments

I am benchmarking 2D convolution in Octave, e.g. with a simple benchmark like this:

r = ones (1, 5e4);
tic;  x1 = conv  (r, r);  time_row_conv  = toc

On macOS (M4) the timing with OpenBLAS is 3.66 s, and with Apple vecLib it is 0.1 s. On x86_64 Linux (Ryzen 3950X) it also takes a couple of seconds (pretty much the same as NETLIB). I will try to get some other BLAS on it eventually to compare.

The conv code essentially is:

    const F77_INT len = ma - mb + 1;  // Pre-calculate this value to avoid temporary
    for (F77_INT k = 0; k < na - nb + 1; k++) {
      for (F77_INT j = 0; j < nb; j++) {
        for (F77_INT i = 0; i < mb; i++) {
          double b_val = b[i + j*mb];
          daxpy_(&len, &b_val, &a[mb-i-1 + (k+nb-j-1)*ma], &one, &c[k*len], &one);
        }
      }
    }

and the profiler shows that the run time is dominated by the daxpy calls.
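To make the call pattern easier to reason about, here is a minimal self-contained sketch of the same loop nest in C. It is not the Octave source: `ref_daxpy` is a hypothetical plain-C stand-in for the BLAS `daxpy_` (same Fortran-style pointer arguments, positive strides only), `conv2_valid` is a name made up for this example, and `F77_INT` is assumed to be `int`. The function also counts how many AXPY calls are issued, and it expects `c` to be zero-initialised by the caller.

```c
typedef int F77_INT;  /* assumption: 32-bit BLAS integers */

/* Hypothetical stand-in for BLAS daxpy_: y += alpha * x (positive strides only). */
void ref_daxpy(const F77_INT *n, const double *alpha,
               const double *x, const F77_INT *incx,
               double *y, const F77_INT *incy)
{
  for (F77_INT i = 0; i < *n; i++)
    y[i * *incy] += *alpha * x[i * *incx];
}

/* Same loop nest as in the snippet above: a is ma-by-na, b is mb-by-nb
   (both column-major), c is (ma-mb+1)-by-(na-nb+1) and must start at zero.
   Returns the number of AXPY calls that were issued. */
long conv2_valid(const double *a, F77_INT ma, F77_INT na,
                 const double *b, F77_INT mb, F77_INT nb,
                 double *c)
{
  const F77_INT one = 1;
  const F77_INT len = ma - mb + 1;  /* length of every AXPY update */
  long calls = 0;

  for (F77_INT k = 0; k < na - nb + 1; k++) {
    for (F77_INT j = 0; j < nb; j++) {
      for (F77_INT i = 0; i < mb; i++) {
        double b_val = b[i + j*mb];
        ref_daxpy(&len, &b_val, &a[mb-i-1 + (k+nb-j-1)*ma], &one,
                  &c[k*len], &one);
        calls++;
      }
    }
  }
  return calls;
}
```

For example, with a 3-by-1 `a = {1, 2, 3}` and a 2-by-1 `b = {1, 1}`, the function issues 2 calls of length 2 and leaves `c = {3, 5}`. In general the loop makes `mb * nb * (na - nb + 1)` AXPY calls, each of length `ma - mb + 1`, so each element of `b` triggers one full pass over a column of `c`, and the overall timing depends directly on how fast a single AXPY of that length runs.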

dasergatskov, Apr 16 '25 17:04