coder(anonymous) comments

Results: 18 comments by coder(anonymous)


blas_ API: for sgemm of armv8a, only 4x4 microkernel can be used?

Does code size affect the performance of small GEMMs?

blas_ API: for sgemm of armv8a, only 4x4 microkernel can be used?

What if it runs only once?

blas_ API: for sgemm of armv8a, only 4x4 microkernel can be used?

Thank you for your previous reply. I would like to ask: is it reasonable to run multiple times and average the performance of small-scale GEMM?
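A minimal sketch of the averaging approach asked about above, assuming a plain reference SGEMM stands in for the real kernel. The names naive_sgemm and average_gemm_seconds are hypothetical and do not come from any library discussed in these threads. Untimed warm-up runs precede the timed loop, so the average reflects warm-cache, steady-state behaviour. A kernel that runs only once would see cold caches and would likely be slower than this average suggests.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Reference row-major SGEMM: C = A * B, with A (m x k), B (k x n), C (m x n).
// Hypothetical stand-in for the kernel under test.
void naive_sgemm(std::size_t m, std::size_t n, std::size_t k,
                 const float* a, const float* b, float* c) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

// Average wall-clock seconds per call over `repeats` timed runs,
// after `warmups` untimed runs that warm the caches.
double average_gemm_seconds(std::size_t m, std::size_t n, std::size_t k,
                            int warmups, int repeats) {
    assert(repeats > 0);
    std::vector<float> a(m * k, 1.0f), b(k * n, 2.0f), c(m * n, 0.0f);

    for (int i = 0; i < warmups; ++i) {
        naive_sgemm(m, n, k, a.data(), b.data(), c.data());
    }

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        naive_sgemm(m, n, k, a.data(), b.data(), c.data());
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;

    // Read one result through a volatile so the compiler keeps the work.
    volatile float sink = c[0];
    (void)sink;

    return elapsed.count() / repeats;
}
```

For example, average_gemm_seconds(8, 8, 8, 5, 1000) times an 8x8x8 problem. Reporting the minimum or median alongside the mean is a common way to reduce the influence of outliers on very short runs.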

BLASFEO: Basic Linear Algebra Subroutines for Embedded Optimization

The program crashes with a segmentation fault when it calls the blasfeo_pack_smat API. What is the cause of this problem? M is not equal to N.

For multithreaded GEMM in OpenBLAS, do I need to call API in the program to ensure thread affinity?

Does openblas_set_num_threads() ensure thread affinity?

For multithreaded GEMM in OpenBLAS, do I need to call API in the program to ensure thread affinity?

No, thanks for your reply.

How to use mnemonics to express "prfm pldl1keep, [x1, #256]".

Is there no overhead in using JIT to generate code? Why can it improve the performance of deep learning training?

How to use mnemonics to express "prfm pldl1keep, [x1, #256]".

Thank you for your answer. I would like to ask if there is any way to insert a hexadecimal code into the generated code? This code has been written by...

How to use mnemonics to express "prfm pldl1keep, [x1, #256]".

Thank you for your answer. Does this mean that I can express it as: top_[size_++]=0xb000020;
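As a cross-check on the raw word, the instruction can be encoded by hand from the A64 layout of PRFM (immediate, unsigned offset). The sketch below is illustrative only: the helper name encode_prfm_pldl1keep is hypothetical and is not part of any code generator mentioned in this thread. For "prfm pldl1keep, [x1, #256]" it yields 0xF9808020, which differs from the constant in the comment above, so that constant is worth re-checking against the Arm reference manual before it is emitted.

```cpp
#include <cstdint>

// Encode "prfm pldl1keep, [xN, #byte_offset]" (PRFM immediate, unsigned offset).
// Layout: 0xF9800000 | imm12 << 10 | Rn << 5 | Rt
//   imm12 = byte_offset / 8 (offset must be a multiple of 8, 0..32760)
//   Rn    = base register number
//   Rt    = prefetch operation; 0b00000 selects PLDL1KEEP
// Hypothetical helper; it does not validate the offset.
constexpr std::uint32_t encode_prfm_pldl1keep(unsigned base_reg,
                                              unsigned byte_offset) {
    return 0xF9800000u
         | (((byte_offset / 8u) & 0xFFFu) << 10)
         | ((base_reg & 0x1Fu) << 5);
}

// Checked at compile time: prfm pldl1keep, [x1, #256]
static_assert(encode_prfm_pldl1keep(1, 256) == 0xF9808020u,
              "unexpected PRFM encoding");
```

A64 instructions are 32-bit little-endian words, so the value would be stored as a single 32-bit word in the code buffer.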

How to use mnemonics to express "prfm pldl1keep, [x1, #256]".

How to pass runtime parameters to getCode?