Gianluca Frison
Hi, the issue with the undefined reference to `kernel_dpack_buffer_xx` was fixed in the current master branch about one month ago: https://github.com/giaf/blasfeo/commit/a759241e98769bf94db395120c54f061925f8606
Yep, this was exactly the idea of the run_time_checks, but so far it has been implemented in only a few routines.
Hi, where did you see that only the 4x4 microkernel can be used? Actually, in the BLAS_API sgemm algorithm for ARM Cortex A53 and A57, the 8x8 kernel is implemented...
In general, I don't expect code size to affect the performance of small GEMMs much, at least once it is loaded in instruction cache, in case of multiple calls to...
Then for small matrices it may be that the overhead of loading data and code from main memory is the limiting factor. But it is difficult to say a priori,...
@hfp thanks for sharing the link to your issue, interesting reading!
IMO it is, as this is a rather common case in practice. In many cases, the (small) matrices are already in cache as the result of some previous operation, and...
Same applies e.g. to gecp :p
Thanks for the suggestion. In general, I am against adding the `const` qualifier to the interfaces, for these reasons: - in the BLAS API, there are not, since this was...
Yes, you are correct, all 16 versions of `dtrmm` are implemented for the reference back end (so they are there if you choose `LA=REFERENCE`), and they are also all implemented...
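For reference, the count of 16 follows from the four two-valued options of the standard BLAS `dtrmm` interface (side, uplo, transa, diag; for real matrices the conjugate-transpose option behaves like transpose). A minimal sketch that enumerates the combinations; the four-letter labels are only illustrative and are an assumption here, not necessarily the routine names used in the library:

```python
from itertools import product

# Standard BLAS dtrmm options, each with two distinct values for real matrices.
SIDE = "LR"   # triangular matrix multiplies from the Left or the Right
UPLO = "UL"   # Upper or Lower triangular
TRANS = "NT"  # No transpose or Transpose
DIAG = "NU"   # Non-unit or Unit diagonal

# Illustrative four-letter labels (hypothetical, not a library API).
variants = ["".join(combo).lower() for combo in product(SIDE, UPLO, TRANS, DIAG)]

print(len(variants))
print(variants)
```

Each label reads as side, uplo, trans, diag, so `llnn` stands for left, lower, not transposed, non-unit diagonal.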