kokkos-kernels icon indicating copy to clipboard operation
kokkos-kernels copied to clipboard

Optimize TPL interface to rocsparse spmv by caching variables for repeated use

Open jczhang07 opened this issue 2 years ago • 18 comments

Caching the temp variables is key to performance of repeated SpMV. I tested on one Crusher node with petsc microbenchmark (src/mat/tests/ex1k).

Average SpMV time (us) with matrix HV15R (of size 2M x 2M).

  1 rank 8 ranks
TPL=OFF 3743 636
TPL=ON 14948 3713
TPL=ON + this PR 2686 571

@lucbv I hope to make A.roc_spmv_helper a private member, but I don't know how to access it in spmv_rocsparse() then.

jczhang07 avatar Jul 15 '22 17:07 jczhang07

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

kokkos-devops-admin avatar Jul 15 '22 17:07 kokkos-devops-admin

@jczhang07 Is this for several calls to SpMV? Is the caching basically avoiding some expensive allocations / initializations?

srajama1 avatar Jul 16 '22 09:07 srajama1

Yes, for repeated calls to SpMV as commonly seen in Krylov methods.

jczhang07 avatar Jul 16 '22 14:07 jczhang07

This makes sense although I would like to think about all the TPLs together and try to come up with an implementation that takes care of all of this for cusparse, rocsparse, onemkl, etc...

lucbv avatar Jul 16 '22 14:07 lucbv

That sounds reasonable. But each TPL has its own APIs. In the end, we have to have separate implementations.

I suggest we could add standalone TPLs as we go, then we might come up with better ideas to consolidate them.

jczhang07 avatar Jul 16 '22 15:07 jczhang07

Correct, I think handling the TPLs can be done in two parts

  1. augment the CrsMatrix to hold additional data generated by the TPL
  2. keep the individual kernel implementations for each TPL

lucbv avatar Jul 16 '22 21:07 lucbv

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

kokkos-devops-admin avatar Sep 07 '22 03:09 kokkos-devops-admin

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

kokkos-devops-admin avatar Sep 13 '22 19:09 kokkos-devops-admin

@lucbv Testing pipeline is clean. Any comment on merging it?

jczhang07 avatar Sep 13 '22 21:09 jczhang07

@lucbv @jczhang07 Hi guys, is there an update on this? I have some overlapping changes that I would like to put in to help support rocsparse in Tpetra, but it would be nice to use this as a starting point (my existing work ignores cusparse).

seanofthemillers avatar Oct 12 '22 17:10 seanofthemillers

@lucbv @jczhang07 Hi guys, is there an update on this? I have some overlapping changes that I would like to put in to help support rocsparse in Tpetra, but it would be nice to use this as a starting point (my existing work ignores cusparse).

@seanofthemillers On my end, I just need reviewers and approvals to get it merged :)

jczhang07 avatar Oct 12 '22 17:10 jczhang07

@jczhang07 @seanofthemillers We are okay with this PR in general but we would like to save the temporary data in the KokkosKernels handle or the KokkosKernels controls instead of having these cached into a new object. We may want to add rocsparse matrix descriptor directly to the CrsMatrix in a similar fashion to the current cusparse matrix descriptor.

I can start working on these changes or if one of you want to start that's also fine with me : )

lucbv avatar Oct 12 '22 17:10 lucbv

@lucbv I had problems in my implementation with adding the raw descriptor to the CrsMatrix. In an application like Tpetra, these KokkosSparse::CrsMatrix objects are generated on the fly (i.e. not stored and re-used), so we would have an issue where the KokkosSparse::CrsMatrix would destroy the descriptor on destruction. The Tpetra design implies that the Tpetra::CrsMatrix should own the descriptor, but it would be nice to automate the generation of the descriptors via kokkos-kernels to keep raw rocsparse/cusparse code out of Tpetra. The shared_ptr method used in this PR gives us a nice workaround.

seanofthemillers avatar Oct 12 '22 18:10 seanofthemillers

@seanofthemillers the shared pointer can be stored in the handle or controls of Kokkos Kernels then, although I have to say having Tpetra destroy and create the matrix on the fly is a but annoying even if I understand the idea. We are just not keen on creating a new data structure when we already have two for similar purpose in the library.

I think Tpetra might be destroying our controls and handles after performing the local kernel but that's really a problem for them to figure out.

lucbv avatar Oct 12 '22 18:10 lucbv

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

kokkos-devops-admin avatar Oct 28 '23 19:10 kokkos-devops-admin

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

kokkos-devops-admin avatar Feb 19 '24 21:02 kokkos-devops-admin

@lucbv I forgot this MR hadn't been merged yet until I tested a SpMV benchmark on Frontier recently and saw the terrible performance again (https://github.com/kokkos/kokkos-kernels/pull/1469#issue-1306297831).

I rebased the MR off KK/develop. Could you have a look and get it merged?

jczhang07 avatar Feb 20 '24 17:02 jczhang07

@brian-kelley Could you test this PR in CI?

jczhang07 avatar Feb 21 '24 20:02 jczhang07