Generalize GPU indexing to add GPU global indexing

Add indexing classes that amount to an indexing layer. Use those classes to de-duplicate the implementations of For and Tile statements in kernel and launch.

This could cut down the number of implementations to just one for the direct policies and one for the loop policies. However, this will cause some slight changes in behavior, as there were slight differences between the implementations of the thread and block policies.

This PR is a refactoring and a feature.
It does the following:
- refactors the for and tile implementations for kernel and launch
- adds GPU global indexing (at my own request)
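
To make the design easier to discuss, here is a rough host-side sketch of the indexing-layer idea. It is not the code in this PR: `Ctx`, `ThreadIndexer`, `BlockIndexer`, `GlobalIndexer`, `for_direct`, `for_loop`, and the two `count_*` drivers are hypothetical names made up for illustration, and `Ctx` only stands in for the CUDA/HIP built-in index variables so the example runs on a CPU. The point is that once "which index am I" and "how many of us are there" live in an indexer type, a single direct implementation and a single loop implementation can serve thread, block, and global indexing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for threadIdx.x, blockDim.x, blockIdx.x, gridDim.x.
struct Ctx {
  int threadIdx, blockDim, blockIdx, gridDim;
};

// Hypothetical indexers: each reports this worker's index and the worker count.
struct ThreadIndexer {
  static int index(const Ctx& c) { return c.threadIdx; }
  static int size(const Ctx& c) { return c.blockDim; }
};
struct BlockIndexer {
  static int index(const Ctx& c) { return c.blockIdx; }
  static int size(const Ctx& c) { return c.gridDim; }
};
struct GlobalIndexer {
  static int index(const Ctx& c) { return c.threadIdx + c.blockDim * c.blockIdx; }
  static int size(const Ctx& c) { return c.blockDim * c.gridDim; }
};

// The one "direct" implementation: each worker handles at most one iterate.
template <typename Indexer, typename Body>
void for_direct(const Ctx& c, int len, Body&& body) {
  const int i = Indexer::index(c);
  if (i < len) body(i);
}

// The one "loop" implementation: each worker strides over the range.
template <typename Indexer, typename Body>
void for_loop(const Ctx& c, int len, Body&& body) {
  for (int i = Indexer::index(c); i < len; i += Indexer::size(c)) body(i);
}

// Host drivers: emulate every (block, thread) pair and count visits per iterate.
template <typename Indexer>
std::vector<int> count_loop_visits(int gridDim, int blockDim, int len) {
  std::vector<int> hits(len, 0);
  for (int b = 0; b < gridDim; ++b)
    for (int t = 0; t < blockDim; ++t) {
      const Ctx c{t, blockDim, b, gridDim};
      for_loop<Indexer>(c, len, [&](int i) { ++hits[i]; });
    }
  return hits;
}

template <typename Indexer>
std::vector<int> count_direct_visits(int gridDim, int blockDim, int len) {
  std::vector<int> hits(len, 0);
  for (int b = 0; b < gridDim; ++b)
    for (int t = 0; t < blockDim; ++t) {
      const Ctx c{t, blockDim, b, gridDim};
      for_direct<Indexer>(c, len, [&](int i) { ++hits[i]; });
    }
  return hits;
}
```

In this sketch, `count_loop_visits<GlobalIndexer>(2, 4, 10)` touches every iterate exactly once, while `count_direct_visits<GlobalIndexer>(2, 4, 10)` leaves iterates 8 and 9 untouched because only 8 workers exist. That gap is the usual trade-off between direct and loop policies.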

Sep 22 '22 23:09 MrBurmark

I wanted to put this out here for feedback before I went too far. I'm curious what people think of the design, and whether anyone (@ajkunen) thought the slight differences between things like the thread and block implementations were significant. I plan to do a before and after with the perf suite at some point to check performance, but it passes the tests.

Sep 22 '22 23:09 MrBurmark

I wanted to put this out here for feedback before I went too far. I'm curious what people think of the design, and whether anyone (@ajkunen) thought the slight differences between things like the thread and block implementations were significant. I plan to do a before and after with the perf suite at some point to check performance, but it passes the tests.

@MrBurmark do you mean that the perf suite test passes? There are compilation issues related to global index types in CUDA builds here.

Sep 28 '22 15:09 rhornung67

I haven't tried to change CUDA yet in this branch, so that's why it's failing. I have not yet run this in the PerfSuite to look at performance; I wanted to incorporate any ideas people had about the design before worrying about that too much.

Sep 28 '22 16:09 MrBurmark

@MrBurmark gotcha. I will take a closer look today and provide feedback. Is there anything in particular that you think needs deeper scrutiny?

Sep 28 '22 16:09 rhornung67

I'll add comments on some of the things that I think are worth noting/thinking about.

Sep 28 '22 20:09 MrBurmark

Should we get this in for the patch release?

Nov 17 '22 22:11 artv3

Should we get this in for the patch release?

No. This is bigger than a bugfix.

Nov 17 '22 22:11 rhornung67

This should now work for HIP. Hopefully the tests all pass.

May 19 '23 20:05 MrBurmark

This should now work for CUDA and HIP. I'm going to try this out with RAJAPerf.

May 26 '23 17:05 MrBurmark

Will there be a companion PR for examples and docs?

Yes, coming soon to a PR near you. See #1499.

Jun 21 '23 15:06 MrBurmark
