Vinh Dang
@bandokihiro I would like to understand your use case more: - "view of the N left square matrices" means view with LayoutLeft? ```Kokkos::View A ( "A", m, m, N );```...
@bandokihiro Oh, I misunderstood the words "left" and "right". Thanks for clarifying. In our evaluations, we found that ```TeamGemm``` with the ```Unblocked``` algorithm gives the best performance on LayoutRight views with...
Yes, use a view C of the same shape as B. It costs an extra deep_copy from C to B, but you can use our optimized host-level interface, which might give...
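To make the suggestion concrete, here is a minimal sketch in plain C++ of writing the batched product into a separate buffer and then copying it back. It assumes the operation is B(b) = A(b) * B(b) for each batch entry, which the thread does not state explicitly. The `Batch` struct and both function names are hypothetical stand-ins, not Kokkos or Kokkos Kernels API; the final assignment only plays the role of the deep_copy from C to B.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a batch of n row-major m x m matrices.
// This is NOT a Kokkos::View; it only mimics (batch, row, col) indexing.
struct Batch {
  std::size_t n, m;
  std::vector<double> data;
  Batch(std::size_t n_, std::size_t m_) : n(n_), m(m_), data(n_ * m_ * m_, 0.0) {}
  double& operator()(std::size_t b, std::size_t i, std::size_t j) {
    return data[(b * m + i) * m + j];
  }
  double operator()(std::size_t b, std::size_t i, std::size_t j) const {
    return data[(b * m + i) * m + j];
  }
};

// C(b) = A(b) * B(b) for every batch entry. C must not alias A or B,
// because each output entry reads a full row of A and a full column of B.
void batched_gemm(const Batch& A, const Batch& B, Batch& C) {
  assert(A.n == B.n && B.n == C.n && A.m == B.m && B.m == C.m);
  for (std::size_t b = 0; b < A.n; ++b)
    for (std::size_t i = 0; i < A.m; ++i)
      for (std::size_t j = 0; j < A.m; ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k < A.m; ++k) sum += A(b, i, k) * B(b, k, j);
        C(b, i, j) = sum;
      }
}

// Assumed use case: the caller wants the result to end up in B.
void batched_gemm_into_b(const Batch& A, Batch& B) {
  Batch C(B.n, B.m);      // separate result buffer with the same shape as B
  batched_gemm(A, B, C);  // safe: the output does not overwrite the inputs
  B.data = C.data;        // extra copy, analogous to the deep_copy from C to B
}
```

The extra copy is the price of keeping inputs and outputs separate; if the calling code can consume C directly, the copy can be dropped.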
Just to be sure: do you have to store the results in ```B```, or is it okay in your code to use another view ```C``` for the results?
@bandokihiro Can you provide example code for your problem? I will look at it and see if I can improve anything.
@bandokihiro Thanks for the code. Can you also tell me the values of ne, nb, and ns, and the flops you achieve?
Thanks, @bandokihiro . I will look at this.
@bandokihiro I did some experiments on a V100 GPU; my results are in the attached Excel file. [batchedgemm_test.xlsx](https://github.com/kokkos/kokkos-kernels/files/7540677/batchedgemm_test.xlsx) There are two sheets in the file (all A, B and...
@bandokihiro > "A separate view should be used for storing results" are you referring to the different-layout-case when you claim this? I meant both the different-layout case and the same-layout case. >...
@e10harvey Thanks for adding the CMake for ARMPL. I would like to add two comments: 1. For ARMPL's BLAS, could you please also enable ```KOKKOSKERNELS_ENABLE_TPL_BLAS``` in the ```KokkosKernels_config.h``` when ```KOKKOSKERNELS_ENABLE_TPL_ARMPL```...