Generalize GPU indexing to add global indexing

Open MrBurmark opened this issue 3 years ago • 5 comments

Generalize GPU indexing to add GPU global indexing

Add indexing classes that amount to an indexing layer. Use those classes to de-duplicate the implementations of For and Tile statements in kernel and launch.

This could cut down the number of implementations to just one for the direct policies and one for the loop policies. However, this will cause some slight changes in behavior, because the existing thread and block policy implementations differed slightly.

  • This PR is a refactoring and a feature
  • It does the following:
    • Refactors the For and Tile implementations for kernel and launch
    • Adds GPU global indexing at my request
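
For readers new to the idea, here is a minimal host-side sketch of what an indexing layer of this kind might look like. It is not the code in this PR and does not use RAJA's API: `Dim1`, `ThreadIndexer`, `BlockIndexer`, `GlobalIndexer`, `for_direct`, `for_loop`, and `emulate_launch` are all hypothetical names, and `Dim1` merely stands in for the CUDA/HIP built-ins `threadIdx.x`, `blockIdx.x`, `blockDim.x`, and `gridDim.x` so the example runs without a GPU. It assumes a 1D launch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for threadIdx.x, blockIdx.x, blockDim.x, gridDim.x,
// so the sketch can run on the host.
struct Dim1 {
  int thread_idx;
  int block_idx;
  int block_dim;
  int grid_dim;
};

// Each indexer (hypothetical names) answers two questions:
// "which index am I?" and "how many indices of this kind exist?"
struct ThreadIndexer {
  static int index(const Dim1& d) { return d.thread_idx; }
  static int size(const Dim1& d) { return d.block_dim; }
};

struct BlockIndexer {
  static int index(const Dim1& d) { return d.block_idx; }
  static int size(const Dim1& d) { return d.grid_dim; }
};

// Global indexing: a unique index for each thread across the whole grid.
struct GlobalIndexer {
  static int index(const Dim1& d) {
    return d.block_idx * d.block_dim + d.thread_idx;
  }
  static int size(const Dim1& d) { return d.grid_dim * d.block_dim; }
};

// Direct mapping: each index handles at most one iteration.
template <typename Indexer, typename Body>
void for_direct(const Dim1& d, int len, Body&& body) {
  const int i = Indexer::index(d);
  if (i < len) body(i);
}

// Loop mapping: each index strides over the range, so the range may be
// longer than the number of available indices.
template <typename Indexer, typename Body>
void for_loop(const Dim1& d, int len, Body&& body) {
  for (int i = Indexer::index(d); i < len; i += Indexer::size(d)) body(i);
}

// Host-side driver that visits every (block, thread) pair of a 1D launch.
template <typename Fn>
void emulate_launch(int grid_dim, int block_dim, Fn&& fn) {
  for (int b = 0; b < grid_dim; ++b)
    for (int t = 0; t < block_dim; ++t)
      fn(Dim1{t, b, block_dim, grid_dim});
}
```

The point of the sketch is that `for_direct` and `for_loop` are each written once and the thread, block, or global flavor is selected by the `Indexer` template argument. For example, calling `for_loop<GlobalIndexer>(d, 10, body)` from every emulated thread of a 2-block, 4-thread launch visits each of the 10 iterations exactly once.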

MrBurmark avatar Sep 22 '22 23:09 MrBurmark

I wanted to put this out here for feedback before I went too far. I'm curious what people think of the design and if anyone (@ajkunen) thought the slight differences between things like thread and block implementations were significant. I plan to do a before and after with the perf suite at some point to ensure performance but it passes the tests.

MrBurmark avatar Sep 22 '22 23:09 MrBurmark

I wanted to put this out here for feedback before I went too far. I'm curious what people think of the design and if anyone (@ajkunen) thought the slight differences between things like thread and block implementations were significant. I plan to do a before and after with the perf suite at some point to ensure performance but it passes the tests.

@MrBurmark do you mean that the perf suite test passes? There are compilation issues related to global index types in CUDA builds here.

rhornung67 avatar Sep 28 '22 15:09 rhornung67

I haven't tried to change CUDA yet in this branch, so that's why it's failing. I have not yet run this in the PerfSuite to look at performance; I wanted to incorporate any ideas people had about the design before I worried about that too much.

MrBurmark avatar Sep 28 '22 16:09 MrBurmark

@MrBurmark gotcha. I will take a closer look today and provide feedback. Is there anything in particular that you think needs deeper scrutiny?

rhornung67 avatar Sep 28 '22 16:09 rhornung67

I'll add comments on some of the things that I think are worth noting/thinking about.

MrBurmark avatar Sep 28 '22 20:09 MrBurmark

Should we get this in for the patch release?

artv3 avatar Nov 17 '22 22:11 artv3

Should we get this in for the patch release?

No. This is bigger than a bugfix.

rhornung67 avatar Nov 17 '22 22:11 rhornung67

This should now work for HIP; hopefully the tests all pass.

MrBurmark avatar May 19 '23 20:05 MrBurmark

This should now work for CUDA and HIP. I'm going to try this out with RAJAPerf.

MrBurmark avatar May 26 '23 17:05 MrBurmark

Will there be a companion PR for examples and docs?

Yes, coming soon to a PR near you. See #1499

MrBurmark avatar Jun 21 '23 15:06 MrBurmark