Feature/kokkos

Open ajpowelsnl opened this issue 4 years ago • 9 comments

Summary

@davidbeckingsale and @rhornung67 have been our main points of contact and collaboration in this work.

The proposed code implements RAJAPerf Suite (RPS)-based performance testing for Kokkos.

Main new code and changes include:

  • Bringing Kokkos in as an RPS (tpl) submodule
  • Kokkos implementations (with Kokkos_Lambda as a kernel variant) of the basic, stream, lcals, apps and algorithm kernel groups (polybench will be completed in FY22)
  • Changes to the build system / cmake
  • Significant changes to the infrastructure for registering new kernel groups and new kernels:

common/Executor.hpp: kernelID registerKernel(std::string, KernelBase*);
common/Executor.cpp: exec->registerKernel(groupName, kernel);
RAJAPerfSuiteDriver.cpp: //executor.registerKernel
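
To make the registration idea concrete, here is a minimal, self-contained sketch of what a group-aware `registerKernel` could look like. Only the signature `kernelID registerKernel(std::string, KernelBase*)` comes from the description above; the `KernelBase` stand-in, the ownership model, and the `kernelsInGroup` / `kernel` helpers are assumptions made for illustration and are not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the suite's kernel base class (assumption).
struct KernelBase {
  explicit KernelBase(std::string n) : name(std::move(n)) {}
  virtual ~KernelBase() = default;
  std::string name;
};

// Assumed here to be a simple index into the executor's kernel list.
using kernelID = std::size_t;

class Executor {
public:
  // Takes ownership of `kernel`, files it under `groupName`,
  // and returns the ID it was assigned.
  kernelID registerKernel(std::string groupName, KernelBase* kernel) {
    assert(kernel != nullptr);
    kernels_.emplace_back(kernel);
    const kernelID id = kernels_.size() - 1;
    groups_[std::move(groupName)].push_back(id);
    return id;
  }

  // Hypothetical helper: IDs registered under a group (empty if unknown).
  const std::vector<kernelID>& kernelsInGroup(const std::string& group) const {
    static const std::vector<kernelID> empty;
    const auto it = groups_.find(group);
    return it == groups_.end() ? empty : it->second;
  }

  // Hypothetical helper: look a kernel up by ID (throws if out of range).
  const KernelBase& kernel(kernelID id) const { return *kernels_.at(id); }

private:
  std::vector<std::unique_ptr<KernelBase>> kernels_;
  std::map<std::string, std::vector<kernelID>> groups_;
};
```

With something like this, an external project could call `exec->registerKernel(groupName, kernel)` for its own group without the driver needing a hard-coded list of groups.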

Configure Kokkos build and runtime in RPS:

cmake \
-DENABLE_KOKKOS=ON \
-DENABLE_OPENMP=ON \
-DCMAKE_BUILD_TYPE=Debug ..

Please do not hesitate to contact @ajpowelsnl or @DavidPoliakoff for any questions and / or further discussion.

ajpowelsnl avatar Sep 16 '21 18:09 ajpowelsnl

@ajpowelsnl did you branch from an older version of the perf suite? I was interested in looking at the Kokkos version of the MAT_MAT_SHARED kernel, but I don't see it.

artv3 avatar Sep 17 '21 16:09 artv3

Hi @artv3 -- Thanks for the question. I started working on the project late last year (first commit was Nov. 16, 2020), so it probably is an "older" version of the Perf Suite. I did not work on anything called "MAT_MAT_SHARED." Instead, I provided Kokkos translations of RAJAPerf Suite kernels (that existed around that date). Does that make sense?

ajpowelsnl avatar Oct 26 '21 19:10 ajpowelsnl

@davidbeckingsale , @DavidPoliakoff -- I'm also removing or suppressing this debugging output in this commit:

Running specified kernels and variants...
Value is 3.14159
Value is 3.14159
Value is 3.14159
Value is 3.14159
Value is 3.14159
Value is 3.14159
Value is 3.14159
FIX ME STREAM DOT -- GET DATA FROM VIEWS

ajpowelsnl avatar Oct 26 '21 20:10 ajpowelsnl

Yes. Do you plan to add Kokkos versions of the newer kernels in RAJAPerf?

artv3 avatar Oct 27 '21 17:10 artv3

Morning @artv3 -- Thanks again for your question and your interest. @DavidPoliakoff, @davidbeckingsale and I have briefly discussed providing Kokkos translations for the new kernels, but that area of work has not been discussed yet with our project leads / folks who control the purse strings. Do you have a particular interest in Kokkos versions of the new kernels? If so, it might help me strengthen the argument for doing the Kokkos translations of the new kernels. Does it benefit both groups to understand performance behavior over time?

ajpowelsnl avatar Oct 27 '21 17:10 ajpowelsnl

Yes, absolutely! Those kernels express hierarchical parallelism, and for our applications we have found that taking advantage of the memory hierarchy on the GPU (using shared memory [CUDA/HIP] or device local memory [SYCL]) can really improve kernel performance. Recognizing that, I always wonder how to expose these device features in a portable and friendly way across programming models while minimizing abstraction-layer overhead. As new programming models come online, how do we maintain these features?

artv3 avatar Oct 27 '21 18:10 artv3

Hey guys, I didn't yet read all the comments but I think we should try and split the infrastructure changes from the Kokkos kernel variant changes.

So essentially something like this:

  • PR1: Infrastructure changes (how to add kernel groups etc. + let the perf suite serve as a perf testing infrastructure in other projects [i.e. Kokkos Core/Kokkos Kernels])
  • PR2: Add optional Kokkos dependency with all the kernel variants.

@davidbeckingsale what do you think about this?

Note that PR2 is not really dependent on PR1, so we could also do PR2 first and then tackle the infrastructure change. We could also split PR2 up by kernel group, I guess?

crtrott avatar Jan 11 '22 23:01 crtrott

I agree that splitting the changes up into two distinct pieces would be best, and I suggested that approach before. I think the two main sticking points on this PR right now are 1) ensuring that the infrastructure changes maintain all the current capability, and 2) deciding how to handle cases where a variant is "missing".
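
One possible way to frame sticking point 2, sketched under assumptions: each kernel advertises the variants it implements, and the runner skips (and reports) kernels that lack the requested variant instead of failing. Only `Kokkos_Lambda` is named in this thread; the other enumerators, the `Kernel` and `RunPlan` types, and `planRun` are hypothetical names used for illustration, not the suite's actual design.

```cpp
#include <set>
#include <string>
#include <vector>

// Illustrative variant IDs; only Kokkos_Lambda is taken from this thread.
enum class VariantID { Base_Seq, RAJA_Seq, Kokkos_Lambda };

// Hypothetical kernel description: a name plus the variants it implements.
struct Kernel {
  std::string name;
  std::set<VariantID> variants;
  bool hasVariant(VariantID v) const { return variants.count(v) != 0; }
};

// Names of kernels that will run, and names that are skipped because
// the requested variant is missing.
struct RunPlan {
  std::vector<std::string> run;
  std::vector<std::string> skipped;
};

// Partition kernels by whether they implement variant `v`.
RunPlan planRun(const std::vector<Kernel>& kernels, VariantID v) {
  RunPlan plan;
  for (const auto& k : kernels) {
    (k.hasVariant(v) ? plan.run : plan.skipped).push_back(k.name);
  }
  return plan;
}
```

The `skipped` list gives the driver something explicit to print in its report, which is one alternative to stubbing in empty variants.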

davidbeckingsale avatar Jan 12 '22 18:01 davidbeckingsale

Hi David & Christian -- Many thanks for responding. As for sticking point 2, in the most recent push, I did stub in Kokkos variants for the polybench kernels, somewhat alleviating that issue. But we are still left with sticking point 1. I will work with Christian on this, and see if we can come up with a satisfactory solution that preserves current infrastructure capability.

ajpowelsnl avatar Jan 12 '22 18:01 ajpowelsnl