
Splitting gotoblas_t into parameters and kernels to re-use kernels between dynamic targets

Open Mousius opened this issue 1 year ago • 2 comments

This would greatly help with the competing demands of re-using our kernels between different cores with different parameters, whilst not overly bloating the dynamic binary.

Looking at gotoblas_t (https://github.com/OpenMathLib/OpenBLAS/blob/develop/common_param.h#L1211), there are two parts, parameters such as:

  int sgemm_p, sgemm_q, sgemm_r;
  int sgemm_unroll_m, sgemm_unroll_n, sgemm_unroll_mn;

And function pointers, such as:

  int    (*sgemm_kernel   )(BLASLONG, BLASLONG, BLASLONG, float, float *, float *, float *, BLASLONG);
  int    (*sgemm_beta     )(BLASLONG, BLASLONG, BLASLONG, float, float *, BLASLONG, float *, BLASLONG, float  *, BLASLONG);

The parameters take up far less space than all of the compiled kernels, so I'm proposing splitting gotoblas_t into openblas_kernels and openblas_params data structures. That would allow our dynamic logic to do something like this:

case NEOVERSEV1:
   openblas_kernels = openblas_kernels_ARMV8SVE;
   openblas_params = openblas_params_NEOVERSEV1;
   break;

This allows sensible per-core defaults (such as the minimum cache size for a particular core, should it not be queryable dynamically) without duplicating the kernels multiple times.
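To make the split concrete, here is a minimal sketch of how the two structures and the selection logic could look. Everything beyond the field names quoted from gotoblas_t is an assumption for illustration: the type names openblas_params_t and openblas_kernels_t, the enum, the stub kernel, the second core, and all numeric values are placeholders, not taken from param.h.

```c
typedef long BLASLONG; /* stand-in for the OpenBLAS typedef */

/* Hypothetical: blocking parameters only, one small instance per core. */
typedef struct {
  int sgemm_p, sgemm_q, sgemm_r;
  int sgemm_unroll_m, sgemm_unroll_n, sgemm_unroll_mn;
} openblas_params_t;

/* Hypothetical: function pointers only, one instance per kernel set. */
typedef struct {
  int (*sgemm_kernel)(BLASLONG, BLASLONG, BLASLONG, float,
                      float *, float *, float *, BLASLONG);
} openblas_kernels_t;

/* Stub standing in for a real compiled SVE kernel. */
static int sgemm_kernel_stub(BLASLONG m, BLASLONG n, BLASLONG k, float alpha,
                             float *a, float *b, float *c, BLASLONG ldc) {
  (void)m; (void)n; (void)k; (void)alpha;
  (void)a; (void)b; (void)c; (void)ldc;
  return 0;
}

/* One kernel table, shared by every core that uses the SVE kernels. */
static const openblas_kernels_t openblas_kernels_ARMV8SVE = { sgemm_kernel_stub };

/* Per-core parameter tables; the numbers are placeholders. */
static const openblas_params_t openblas_params_NEOVERSEV1 = { 128, 256, 4096, 4, 8, 8 };
static const openblas_params_t openblas_params_OTHERCORE  = {  64, 128, 2048, 4, 8, 8 };

enum core_id { CORE_NEOVERSEV1, CORE_OTHERCORE, CORE_UNKNOWN };

static const openblas_kernels_t *openblas_kernels;
static const openblas_params_t  *openblas_params;

/* Pick kernels and parameters independently; returns 0 on success,
   -1 (leaving the current selection untouched) for an unknown core. */
static int select_target(enum core_id core) {
  switch (core) {
  case CORE_NEOVERSEV1:
    openblas_kernels = &openblas_kernels_ARMV8SVE;
    openblas_params  = &openblas_params_NEOVERSEV1;
    return 0;
  case CORE_OTHERCORE:
    openblas_kernels = &openblas_kernels_ARMV8SVE; /* same kernels, no copy */
    openblas_params  = &openblas_params_OTHERCORE;
    return 0;
  default:
    return -1;
  }
}
```

In this sketch, adding another core that shares the SVE kernels costs one more small openblas_params_t instance rather than another full set of compiled kernels.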

We can mark these with DYNAMIC_KERNELS and DYNAMIC_PARAMS in the Makefile. DYNAMIC_LIST would build both for all, and DYNAMIC_ARCH would be our current favourites.

@martin-frbg, what do you think?

Mousius, Jan 19 '24 19:01

At first glance this looks a bit invasive for something that is only used on one architecture (for now at least). I wonder if something similar could be achieved via a small parameter table in dynamic_arm64.c itself?
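One possible reading of that alternative, as a rough sketch only: keep a single gotoblas_t-style table per kernel set and patch its parameter fields from a small per-core table at initialisation time. The struct names, the apply_core_params helper, the core strings, and the values below are all hypothetical and do not reflect what dynamic_arm64.c currently contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-core override entry. */
struct core_params {
  const char *core;
  int sgemm_p, sgemm_q, sgemm_r;
};

/* Small table that would live next to the dispatch code; placeholder values. */
static const struct core_params param_table[] = {
  { "neoversev1", 128, 256, 4096 },
  { "othercore",   64, 128, 2048 },
};

/* Reduced stand-in for the parameter part of gotoblas_t. */
struct gotoblas_like {
  int sgemm_p, sgemm_q, sgemm_r;
};

/* Overwrite the parameters in t for the named core.
   Returns 1 if the core was found, 0 if t was left at its defaults. */
static int apply_core_params(struct gotoblas_like *t, const char *core) {
  size_t n = sizeof(param_table) / sizeof(param_table[0]);
  for (size_t i = 0; i < n; i++) {
    if (strcmp(param_table[i].core, core) == 0) {
      t->sgemm_p = param_table[i].sgemm_p;
      t->sgemm_q = param_table[i].sgemm_q;
      t->sgemm_r = param_table[i].sgemm_r;
      return 1;
    }
  }
  return 0;
}
```

The trade-off in this sketch is that the per-core values sit beside the dispatch code rather than in param.h.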

martin-frbg, Jan 19 '24 21:01

@martin-frbg my concern with that is that param.h would no longer be the source of truth?

Mousius, Jan 22 '24 22:01