kokkos-kernels
ArmPL TPL support for KokkosBlas and KokkosBatched
- [x] Check for ArmPL version 21 TPL installations
- [x] Add CMake TPL support. (@e10harvey) - #880
- [x] Add KokkosBlas TPL support: gemm, iamax, scal, copy (Kokkos copy). (@vqd8a) - #880
- [ ] Collect performance in Adelus
  - [x] gemm
  - [ ] iamax
  - [ ] scal
  - [ ] copy
- [ ] Collect performance in Adelus
- [ ] Add KokkosBatched TPL support to cover routines used by https://github.com/kokkos/kokkos-kernels/blob/master/perf_test/batched/KokkosBatched_Test_BlockTridiagDirect.cpp. (@e10harvey, @vqd8a)
  - [ ] LU (@vqd8a)
  - [ ] TRSM (@vqd8a)
  - [x] GEMM (@e10harvey) - #1256
  - [ ] TRSV (@e10harvey)
  - [ ] GEMV (@e10harvey)
- [ ] Collect BlockTridiagDirect performance in KokkosKernels
@e10harvey Thanks for adding the CMake for ARMPL. I would like to add two comments:
- For ARMPL's BLAS, could you please also enable `KOKKOSKERNELS_ENABLE_TPL_BLAS` in the `KokkosKernels_config.h` when `KOKKOSKERNELS_ENABLE_TPL_ARMPL` is defined, so that we can use the current BLAS TPL support in Kokkos Kernels?
- It looks to me that single-threaded ARMPL (`libarmpl.so`) can be found:
=======================
KokkosKernels ETI Types
Devices: <OpenMP,HostSpace>
Scalars: double
Ordinals: int
Offsets: int;size_t
Layouts: LayoutLeft
KokkosKernels TPLs
ARMPL: /opt/arm/armpl-20.3.0_A64FX_RHEL-8_gcc_aarch64-linux/lib/libamath.so;/opt/arm/armpl-20.3.0_A64FX_RHEL-8_gcc_aarch64-linux/lib/libarmpl.so
=======================
Can we also find multi-threaded ARMPL (`libarmpl_mp.so`) when OpenMP is enabled?
1. For ARMPL's BLAS, could you please also enable `KOKKOSKERNELS_ENABLE_TPL_BLAS` in the `KokkosKernels_config.h` when `KOKKOSKERNELS_ENABLE_TPL_ARMPL` is defined, so that we can use the current BLAS TPL support in Kokkos Kernels?
Yes.
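For illustration only, the request above could look roughly like the following in a configuration header. This is a minimal sketch, not the actual contents of `KokkosKernels_config.h`: the first `#define` stands in for whatever the CMake configure step would generate, and `blas_tpl_enabled()` is a hypothetical helper added only to make the effect checkable.

```cpp
// Sketch of the requested behavior; NOT the real KokkosKernels_config.h.
// Assumption: the build system defines KOKKOSKERNELS_ENABLE_TPL_ARMPL
// when ARMPL is found. It is defined by hand here for demonstration.
#define KOKKOSKERNELS_ENABLE_TPL_ARMPL

// ARMPL ships a BLAS interface, so enabling ARMPL also enables the
// generic BLAS TPL code path unless it is already enabled.
#if defined(KOKKOSKERNELS_ENABLE_TPL_ARMPL) && !defined(KOKKOSKERNELS_ENABLE_TPL_BLAS)
#define KOKKOSKERNELS_ENABLE_TPL_BLAS
#endif

// Hypothetical helper (not part of Kokkos Kernels): reports at compile
// time whether the BLAS TPL macro ended up defined.
constexpr bool blas_tpl_enabled() {
#if defined(KOKKOSKERNELS_ENABLE_TPL_BLAS)
  return true;
#else
  return false;
#endif
}
```

With this guard, code that is already wrapped in `#if defined(KOKKOSKERNELS_ENABLE_TPL_BLAS)` would be compiled in whenever ARMPL is enabled, without a separate BLAS option.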
Can we also find multi-threaded ARMPL (`libarmpl_mp.so`) when OpenMP is enabled?
Yes.
I will flag you in the PR for this.
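The library selection discussed above can be sketched as a small decision function. This is only an illustration of the intended rule (multi-threaded library when OpenMP is enabled, single-threaded otherwise); `armpl_library_name` is a hypothetical name, and the real selection would live in the CMake TPL logic rather than in C++.

```cpp
#include <string_view>

// Hypothetical helper, not part of Kokkos Kernels or its CMake files.
// Picks the ARMPL shared library named in this thread based on whether
// the OpenMP backend is enabled.
constexpr std::string_view armpl_library_name(bool openmp_enabled) {
  return openmp_enabled ? std::string_view("libarmpl_mp.so")  // multi-threaded
                        : std::string_view("libarmpl.so");    // single-threaded
}
```

Requires C++17 for `std::string_view`.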
@e10harvey Thanks, Evan.