kokkos-kernels
SPMV Tuning as per latest Kokkos Kernels develop
Provided mainly so that Luc can test this on a HIP platform.
Status Flag 'Pre-Test Inspection' - This Pull Request Requires Inspection. The code must be inspected by a member of the Team before Testing/Merging. WARNING: NO REVIEWERS HAVE BEEN REQUESTED FOR THIS PULL REQUEST!
Some V100 results:
[dzpolia@kokkos-dev-2 cuda-build]$ KOKKOS_PROFILE_LIBRARY=$HOME/src/apollo/build/src/libapollo.so ./perf_test/sparse/sparse_spmv --test kk -l 15000 -s 1000000
Initializing Apollo Tuning adapter
== APOLLO: Looked for APOLLO_SINGLE_MODEL with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_TRACE_MEASURES with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_NUM_POLICIES with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_FLUSH_PERIOD with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_TRACE_POLICY with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_TRACE_RETRAIN with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_TRACE_ALLGATHER with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_RETRAIN_TIME_THRESHOLD with getenv(), found nothing, using '2.0' (default) instead.
== APOLLO: Looked for APOLLO_RETRAIN_REGION_THRESHOLD with getenv(), found nothing, using '0.5' (default) instead.
== APOLLO: Looked for APOLLO_TRACE_CSV with getenv(), found nothing, using '0' (default) instead.
== APOLLO: Looked for APOLLO_TRACE_CSV_FOLDER_SUFFIX with getenv(), found nothing, using '' (default) instead.
Building a tuner
== APOLLO: Loading the requested DecisionTree:
== APOLLO: dtree-step-0-rank-0-kokkos.kernels.spmv.nnz_kokkos.kernels.spmv.rows_kokkos.kernels.yaml
NNZ NumRows NumCols ProblemSize(MB) AveBandwidth(GB/s) MinBandwidth(GB/s) MaxBandwidth(GB/s) AveGFlop MinGFlop MaxGFlop aveTime(ms) maxTime(ms) minTime(ms) numErrors
10000000 1000000 1000000 133.51 ( 378.75 155.36 380.78 ) ( 38.542 15.810 38.749 ) ( 0.519 1.265 0.516 ) 0 RESULT
Kokkos::MultiVector Test: Passed
Finalizing Apollo Tuning adapter
Apollo: total region executions: 15001
[dzpolia@kokkos-dev-2 cuda-build]$ ./perf_test/sparse/sparse_spmv --test kk -l 15000 -s 1000000
Building a tuner
NNZ NumRows NumCols ProblemSize(MB) AveBandwidth(GB/s) MinBandwidth(GB/s) MaxBandwidth(GB/s) AveGFlop MinGFlop MaxGFlop aveTime(ms) maxTime(ms) minTime(ms) numErrors
10000000 1000000 1000000 133.51 ( 124.77 110.07 124.97 ) ( 12.697 11.201 12.717 ) ( 1.575 1.786 1.573 ) 0 RESULT
Kokkos::MultiVector Test: Passed
That's about a 3x speedup (average time 0.519 ms vs. 1.575 ms; average bandwidth 378.75 GB/s vs. 124.77 GB/s). We used to get 2x, but the tuning options have been expanded, so the ideal configuration must now be something else.
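The "about 3x" figure can be checked directly against the two RESULT lines. The sketch below assumes, inferred from the logged numbers rather than from the benchmark source, that the GFlop columns count 2 × NNZ flops per product (one multiply and one add per nonzero). Under that assumption the recomputed rates match the logged AveGFlop values (38.542 and 12.697) to within the rounding of the printed times.

```python
# Values copied from the two RESULT lines above
# (V100, 1,000,000 rows, 10,000,000 nonzeros, 15000 iterations).
NNZ = 10_000_000
TUNED_AVE_MS = 0.519      # run with KOKKOS_PROFILE_LIBRARY=.../libapollo.so
UNTUNED_AVE_MS = 1.575    # run without a tool loaded
TUNED_AVE_GBS = 378.75
UNTUNED_AVE_GBS = 124.77


def spmv_gflops(nnz: int, time_ms: float) -> float:
    """GFlop/s for one SpMV, assuming 2 flops (multiply + add) per nonzero."""
    return 2.0 * nnz / (time_ms * 1e-3) / 1e9


time_speedup = UNTUNED_AVE_MS / TUNED_AVE_MS
bandwidth_speedup = TUNED_AVE_GBS / UNTUNED_AVE_GBS

print(f"tuned:   {spmv_gflops(NNZ, TUNED_AVE_MS):.2f} GFlop/s")
print(f"untuned: {spmv_gflops(NNZ, UNTUNED_AVE_MS):.2f} GFlop/s")
print(f"speedup: {time_speedup:.2f}x (time), {bandwidth_speedup:.2f}x (bandwidth)")
```

The same formula applied to maxTime (1.265 ms) gives the logged MinGFlop of 15.810, which supports the 2 × NNZ assumption. Both the time ratio and the bandwidth ratio come out just above 3.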
This is profoundly, incredibly, extremely broken. The old version was right; I'll hack on this more to figure out why. Note that it isn't broken when no tool is loaded, but... yeah. Needs work.
Hello,
I am trying to reproduce the results shown here using CCS after merging with master. Rebasing the code is easy enough (see here: https://github.com/ytopt-team/kokkos-kernels/tree/feature/tuning_v4), but it seems that the test ./perf_test/sparse/sparse_spmv --test kk -l 15000 -s 1000000 does not call the functions in src/sparse/impl/KokkosSparse_spmv_impl.hpp; rather, everything happens through the SPMV_functor in perf_test/sparse/spmv/Kokkos_SPMV.hpp. My understanding may be wrong, as I am not used to navigating Kokkos code, especially where operator overloading is involved. Is there a code path that invokes this function:
https://github.com/ytopt-team/kokkos-kernels/blob/d152b85cdd876eeca7616b75f430ccff6c6fd1d2/src/sparse/impl/KokkosSparse_spmv_impl.hpp#L502-L567
Thanks,
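Whichever path the benchmark takes (the perf-test SPMV_functor or the kernels in KokkosSparse_spmv_impl.hpp), the operation being tuned is at its core a CRS sparse matrix-vector product (y = A*x in the simplest alpha = 1, beta = 0 case). When comparing the two paths, it can help to check both against a plain serial reference. The sketch below is one such reference, written in Python for brevity. It is not the Kokkos Kernels implementation, and the argument names row_map, entries and values are illustrative CRS array names chosen here.

```python
from typing import List, Sequence


def crs_spmv(row_map: Sequence[int], entries: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Serial reference for y = A*x with A stored in CRS form.

    row_map[i]..row_map[i+1] delimits row i's slice of entries (column
    indices) and values (nonzeros); len(row_map) == num_rows + 1.
    """
    num_rows = len(row_map) - 1
    y = [0.0] * num_rows
    for i in range(num_rows):
        acc = 0.0
        # Accumulate the dot product of row i with x over its stored nonzeros.
        for k in range(row_map[i], row_map[i + 1]):
            acc += values[k] * x[entries[k]]
        y[i] = acc
    return y


# 3x3 example: A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]], x = [1, 1, 1]
y = crs_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2], [2.0, 1.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0])
print(y)
```

Each tuned configuration only changes how rows are distributed across threads, so any of them should agree with this reference up to floating-point summation order.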
Status Flag 'Pre-Test Inspection' - This Pull Request Requires Inspection. The code must be inspected by a member of the Team before Testing/Merging. NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.