Generalize GPU indexing to add global indexing
Generalize GPU indexing to add GPU global indexing
Add indexing classes that together form an indexing layer, and use those classes to de-duplicate the implementations of the For and Tile statements in kernel and launch.
This could cut the number of implementations down to just one for the direct policies and one for the loop policies. However, it will cause some slight behavior changes, because the existing thread and block policy implementations differed slightly.
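To make the idea concrete, here is a rough host-side sketch of what such an indexing layer could look like. It is not the code in this PR: `LaunchCtx`, the three indexer structs, `for_direct`, `for_loop`, and `visited_by_loop` are all hypothetical names, and `LaunchCtx` stands in for the CUDA/HIP built-ins so the example compiles and runs without a GPU. The point is only that once "which index am I, and how many of us are there" lives in a small class, one direct implementation and one loop implementation can serve thread, block, and global indexing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for threadIdx.x, blockIdx.x, blockDim.x and
// gridDim.x, so that this sketch runs on the host.
struct LaunchCtx {
  int thread_idx;
  int block_idx;
  int block_dim;
  int grid_dim;
};

// Each indexer answers two questions: "what is my index?" and
// "how many indices exist at this level?". All names are illustrative.
struct ThreadIndexer {
  static int index(const LaunchCtx& c) { return c.thread_idx; }
  static int size(const LaunchCtx& c) { return c.block_dim; }
};

struct BlockIndexer {
  static int index(const LaunchCtx& c) { return c.block_idx; }
  static int size(const LaunchCtx& c) { return c.grid_dim; }
};

struct GlobalIndexer {
  static int index(const LaunchCtx& c) {
    return c.block_idx * c.block_dim + c.thread_idx;
  }
  static int size(const LaunchCtx& c) { return c.grid_dim * c.block_dim; }
};

// The single "direct" implementation: each index handles at most one
// iteration, and indices past the end of the range do nothing.
template <typename Indexer, typename Body>
void for_direct(const LaunchCtx& c, int len, Body&& body) {
  const int i = Indexer::index(c);
  if (i < len) body(i);
}

// The single "loop" implementation: each index strides through the range
// by the number of indices at its level.
template <typename Indexer, typename Body>
void for_loop(const LaunchCtx& c, int len, Body&& body) {
  const int stride = Indexer::size(c);
  assert(stride > 0);  // a zero stride would never terminate
  for (int i = Indexer::index(c); i < len; i += stride) body(i);
}

// Test helper: records which iterations one index visits under the loop form.
template <typename Indexer>
std::vector<int> visited_by_loop(const LaunchCtx& c, int len) {
  std::vector<int> out;
  for_loop<Indexer>(c, len, [&](int i) { out.push_back(i); });
  return out;
}
```

In this sketch, swapping `ThreadIndexer` for `GlobalIndexer` is the only change needed to go from per-block thread indexing to global indexing, which is the kind of de-duplication described above.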
- This PR is a refactoring, feature
- It does the following:
  - Refactors the For and Tile implementations for kernel and launch
  - Adds GPU global indexing at the request of me
I wanted to put this out here for feedback before I went too far. I'm curious what people think of the design, and whether anyone (@ajkunen) thinks the slight differences between things like the thread and block implementations were significant. I plan to do a before-and-after comparison with the perf suite at some point to check performance; for now, it passes the tests.
@MrBurmark do you mean that the perf suite test passes? There are compilation issues related to global index types in CUDA builds here.
I haven't tried to change CUDA yet in this branch, so that's why it's failing. I have not yet run this in the PerfSuite to look at performance. I wanted to incorporate any ideas people had about the design before I worried about that too much.
@MrBurmark gotcha. I will take a closer look today and provide feedback. Is there anything in particular that you think needs closer scrutiny?
I'll add comments on some of the things that I think are worth noting/thinking about.
Should we get this in for the patch release?
No. This is bigger than a bugfix.
This should now work for HIP; hopefully the tests all pass.
This should now work for CUDA and HIP. I'm going to try this out with RAJAPerf.
Will there be a companion PR for examples and docs?
Yes, coming soon to a PR near you. See #1499