
NGA FY22: extend std::blas interface to KokkosKernels

Open fnrizzi opened this issue 4 years ago • 1 comments

  • [x] Use the tpl-customization point interface to implement possible mappings to KokkosKernels
  • [ ] Add missing kernels to KokkosKernels for BLAS 1/2/3
  • [ ] (If time permits) Add tpl-customization point implementation for standard BLAS

fnrizzi, Sep 29 '21 07:09

The list below tracks progress:

Rules for making PRs

  • we first make PRs for the algorithm implementations ONLY
  • after all blas1 algorithms are merged, we post a single PR to update the linalg_kokkoskernels include header and the CMakeLists.txt under tests/kokkos-based
  • we do the same for blas2 and blas3

This allows all the implementations to be merged independently, without conflicts popping up all the time.

BLAS 1

  • dot
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [x] example code link
    • [x] kokkos-test
  • dotc
    • [x] has three overloads
    • [x] customization point
    • [x] impl forwards to dot after conjugating
    • [x] example code link
    • [x] kokkos-test
  • add
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [x] example code link
    • [x] kokkos-test
  • scale (started by CTrott)
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [x] example code link
    • [x] kokkos-test
  • idx_abs_max (keep an eye on #114)
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [x] example code link
    • [x] kokkos-test
  • vector_abs_sum
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • vector_sum_of_squares
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • vector_norm2: init is inside the sqrt as per the spec; this differs from what the default impl does (see this issue).
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • givens_rotation_apply (need to sync with Luc about this, since he already started something similar for KK)
    • [x] has three overloads
    • [x] customization point
    • [ ] KK impl
    • [ ] example code
    • [ ] kokkos-test
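The dotc entry above notes that its implementation forwards to dot after conjugating. A minimal sketch of that idea, using `std::vector` instead of the library's real view types: the namespace `sketch`, the helper `conjugated_copy`, and the choice to conjugate the first operand are all assumptions made for illustration, not the library's code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

namespace sketch {

using cplx = std::complex<double>;

// Plain (non-conjugated) dot product: sum of x[i] * y[i].
inline cplx dot(const std::vector<cplx>& x, const std::vector<cplx>& y) {
  assert(x.size() == y.size());
  cplx sum(0.0, 0.0);
  for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
  return sum;
}

// Hypothetical helper: element-wise complex conjugate, materialized as a copy.
// A view-based design would avoid the copy; this is for illustration only.
inline std::vector<cplx> conjugated_copy(const std::vector<cplx>& x) {
  std::vector<cplx> out;
  out.reserve(x.size());
  for (const cplx& v : x) out.push_back(std::conj(v));
  return out;
}

// dotc has no loop of its own: it conjugates one operand and forwards to dot.
// Which operand is conjugated (the first, here) is an assumption of this sketch.
inline cplx dotc(const std::vector<cplx>& x, const std::vector<cplx>& y) {
  return dot(conjugated_copy(x), y);
}

}  // namespace sketch
```

With this shape, any backend that accelerates dot also serves dotc, which may be why the checklist lists "impl forwards to dot" in place of a separate KK impl.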

The following operate on matrices but are in BLAS 1; see related #107.

  • matrix_frob_norm
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • matrix_one_norm
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • matrix_inf_norm
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test

Defined in BLAS 1, but also accepting rank-2

These functions are here because while they are defined in BLAS-1, they accept both rank-1 and rank-2 operands. See related #106

  • copy
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • swap_elements
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test

BLAS 2

product

  • matrix_vector_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • symmetric_matrix_vector_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • hermitian_matrix_vector_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • triangular_matrix_vector_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test

solve and update

  • triangular_matrix_vector_solve (in review: #194)
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • matrix_rank_1_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • matrix_rank_1_update_c
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • symmetric_matrix_rank_1_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • hermitian_matrix_rank_1_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • symmetric_matrix_rank_2_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • hermitian_matrix_rank_2_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test

BLAS 3

  • [ ] matrix_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
    • [ ] complete the updating overloads following the overwriting ones
  • triangular_matrix_left_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • triangular_matrix_right_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • symmetric_matrix_left_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • symmetric_matrix_right_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • hermitian_matrix_left_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • hermitian_matrix_right_product
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • triangular_matrix_matrix_solve (only dispatches between left and right; has no implementation of its own)
    • [x] has three overloads
    • [x] customization point
    • ~KK impl~
    • [ ] example code
    • ~kokkos-test~
  • triangular_matrix_matrix_left_solve
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • triangular_matrix_matrix_right_solve
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • symmetric_matrix_rank_2k_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • hermitian_matrix_rank_2k_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • symmetric_matrix_rank_k_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
  • hermitian_matrix_rank_k_update
    • [x] has three overloads
    • [x] customization point
    • [x] KK impl
    • [ ] example code
    • [x] kokkos-test
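The triangular_matrix_matrix_solve entry above is described as only dispatching between the left and right variants. A minimal sketch of such a dispatch layer follows. The side tag types `left_side_t` and `right_side_t` are hypothetical names, and the two solve functions are stubs that only report which one ran, so nothing here reflects the actual signatures.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical side tags; the names are assumptions, not taken from the issue.
struct left_side_t {};
struct right_side_t {};

// Stubs standing in for the two functions that carry the real implementations.
// They only report which one was called.
inline std::string triangular_matrix_matrix_left_solve() { return "left"; }
inline std::string triangular_matrix_matrix_right_solve() { return "right"; }

// The side-generic entry point does no numerical work of its own; overload
// resolution on the tag type selects the left or right variant.
inline std::string triangular_matrix_matrix_solve(left_side_t) {
  return triangular_matrix_matrix_left_solve();
}
inline std::string triangular_matrix_matrix_solve(right_side_t) {
  return triangular_matrix_matrix_right_solve();
}

}  // namespace sketch
```

Under this arrangement, customizing the left and right variants is enough for a backend to cover the generic entry point, which would be consistent with the KK impl and kokkos-test items being struck out for it.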

Notes

By "three overloads", we mean:

void fnc(/* no exec policy, other args */)
void fnc(/* generic exec policy, other args */)
void fnc(/* inline_exec_t, other args */)
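To make the three signatures above concrete, here is a toy version for a dot product on `std::vector<double>`. Only the name `inline_exec_t` comes from the note; the namespace `sketch`, the placeholder policy `default_exec_t`, the `trace()` log, and the forwarding order (no-policy to generic to inline) are assumptions of this sketch rather than the library's actual design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-ins: only the name inline_exec_t appears in the note above.
struct inline_exec_t {};
struct default_exec_t {};  // assumed placeholder for "some generic policy"

// Log of which overloads ran, for illustration only.
inline std::vector<std::string>& trace() {
  static std::vector<std::string> t;
  return t;
}

// Overload taking inline_exec_t: the plain sequential fallback.
inline double dot(inline_exec_t, const std::vector<double>& x,
                  const std::vector<double>& y) {
  assert(x.size() == y.size());
  trace().push_back("inline");
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
  return sum;
}

// Overload taking a generic execution policy. A real implementation could
// look for a customization for this policy here; this sketch always falls back.
template <class ExecPolicy>
double dot(ExecPolicy&&, const std::vector<double>& x,
           const std::vector<double>& y) {
  trace().push_back("generic");
  return dot(inline_exec_t{}, x, y);
}

// Overload without an execution policy: forwards with an assumed default.
inline double dot(const std::vector<double>& x, const std::vector<double>& y) {
  trace().push_back("no-policy");
  return dot(default_exec_t{}, x, y);
}

}  // namespace sketch
```

Calling `sketch::dot(x, y)` walks through all three overloads in turn, while passing `sketch::inline_exec_t{}` explicitly goes straight to the sequential loop because the non-template overload is preferred over the policy template.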

fnrizzi, Oct 11 '21 10:10