Vahid Tavanashad
Timing for calling `gemm` in different scenarios
* dpnp on Intel Xeon is slower than NumPy, but this is independent of this PR and should be addressed separately. |...
Timing for calling `gemm_batch` in different scenarios
* When one input is C-contiguous and the other is F-contiguous, dpnp timing is not better than the old version on...
Timing for calling `gemm_batch` in different scenarios.
* Size on Iris Xe and Intel Core: (2000, 2000, 4, 4)
* Size on PVC and Intel Xeon: (4095, 4095, 4, 4)...
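
For readers who want to reproduce this kind of comparison, the following is a minimal sketch of how the C-contiguous / F-contiguous scenarios could be timed for a batched matrix product. It is not the benchmark used in the PR: it uses NumPy only, the helper names `make_operand` and `time_batched_matmul` are hypothetical, and the `float32` dtype and best-of-N timing are assumptions. A dpnp measurement would presumably swap in the corresponding dpnp arrays and call.

```python
import timeit

import numpy as np


def make_operand(shape, order, rng):
    """Return a random float32 array of `shape` in C or F memory order."""
    a = rng.random(shape, dtype=np.float32)
    return np.asfortranarray(a) if order == "F" else np.ascontiguousarray(a)


def time_batched_matmul(shape, orders=("C", "F"), repeat=3, seed=0):
    """Time np.matmul for every combination of input memory orders.

    `shape` is (..., n, n): matmul treats the leading axes as the batch
    and multiplies the trailing n x n matrices.
    Returns {(order_a, order_b): best wall-clock time in seconds}.
    """
    rng = np.random.default_rng(seed)
    results = {}
    for order_a in orders:
        for order_b in orders:
            a = make_operand(shape, order_a, rng)
            b = make_operand(shape, order_b, rng)
            # Take the best of `repeat` single runs to reduce noise.
            timings = timeit.repeat(lambda: np.matmul(a, b), number=1, repeat=repeat)
            results[(order_a, order_b)] = min(timings)
    return results


if __name__ == "__main__":
    # Small illustrative shape; the sizes quoted above are much larger.
    for (oa, ob), seconds in time_batched_matmul((64, 64, 4, 4)).items():
        print(f"a={oa}-contiguous, b={ob}-contiguous: {seconds:.6f} s")
```

Passing the shapes listed above, for example `(2000, 2000, 4, 4)`, runs the same scenarios at the reported size, at the cost of more memory and time.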