Concepts for high-dimensional loops for RAJA
Concept 1:
Consider supporting high-dimensional thread teams in RAJA by having the RAJA launch context store a ThreadBlockLayout.
A ThreadBlockLayout would store the number of threads along each dimension of a high-dimensional block.
We could also consider a high-dimensional compute grid, perhaps called GridLayout.
Provide specializations for 1, 2, and 3 indices.
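A minimal host-side sketch of what such a layout might look like, assuming it is templated on the number of indices. The `Dim` template parameter, the member names `nx`/`ny`/`nz`, the `size()` and `linear()` helpers, and `LaunchContextSketch` are all hypothetical illustrations, not part of the RAJA API. A GridLayout could follow the same pattern.

```cpp
#include <cassert>

// Hypothetical sketch: none of these names are existing RAJA types.
// Primary template is left undefined so only 1, 2, and 3 indices compile.
template <int Dim>
struct ThreadBlockLayout;

template <>
struct ThreadBlockLayout<1> {
  int nx;  // threads along x
  int size() const { return nx; }
  int linear(int i) const {
    assert(i >= 0 && i < nx);
    return i;
  }
};

template <>
struct ThreadBlockLayout<2> {
  int nx, ny;  // threads along x and y
  int size() const { return nx * ny; }
  // x varies fastest (an assumed convention for this sketch).
  int linear(int i, int j) const {
    assert(i >= 0 && i < nx && j >= 0 && j < ny);
    return i + nx * j;
  }
};

template <>
struct ThreadBlockLayout<3> {
  int nx, ny, nz;  // threads along x, y, and z
  int size() const { return nx * ny * nz; }
  // x varies fastest, then y, then z (an assumed convention).
  int linear(int i, int j, int k) const {
    assert(i >= 0 && i < nx && j >= 0 && j < ny && k >= 0 && k < nz);
    return i + nx * (j + ny * k);
  }
};

// Hypothetical stand-in for a launch context that carries the layout.
template <int Dim>
struct LaunchContextSketch {
  ThreadBlockLayout<Dim> threads;
};
```

For example, `ThreadBlockLayout<3>{4, 2, 2}` describes a block of 16 threads, and `linear(1, 1, 1)` maps to flat index 13 under the assumed ordering.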
Concept 2: Consider a generalization of a memory arena for GPU shared memory; it could be initialized with either static or dynamic shared memory.
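One possible shape for such an arena is a bump allocator over a caller-provided buffer, so that the arena itself does not care whether the buffer is statically or dynamically sized. The sketch below is plain host C++ for illustration; on a GPU the buffer would presumably be a `__shared__` array or an `extern __shared__` array sized at launch. The class name `SharedArenaSketch` and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a bump-allocating arena over an existing buffer.
// It never owns memory; the caller supplies static or dynamic storage.
class SharedArenaSketch {
public:
  SharedArenaSketch(void* base, std::size_t bytes)
      : base_(static_cast<unsigned char*>(base)), capacity_(bytes), offset_(0) {
    assert(base != nullptr);
  }

  // Returns aligned storage for `count` objects of type T, or nullptr if
  // the arena is exhausted. Objects are not constructed, so this is meant
  // for trivially constructible types.
  template <typename T>
  T* allocate(std::size_t count) {
    const std::size_t align = alignof(T);
    const std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(base_) + offset_;
    const std::size_t pad = (align - addr % align) % align;
    const std::size_t needed = pad + count * sizeof(T);
    if (needed > capacity_ - offset_) return nullptr;
    T* p = reinterpret_cast<T*>(base_ + offset_ + pad);
    offset_ += needed;
    return p;
  }

  // Releases everything at once, e.g. between phases of a kernel.
  void reset() { offset_ = 0; }

  std::size_t used() const { return offset_; }

private:
  unsigned char* base_;
  std::size_t capacity_;
  std::size_t offset_;
};
```

Because the constructor only takes a pointer and a size, the same arena type works over a fixed-size array or over a buffer whose size is chosen at run time.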
Concept 3: Switch the loop ordering between outer-first and inner-first. The main application is perfectly nested loops, including perfectly nested CUDA/HIP thread loops. The idea is to use a policy to select which loops are iterated in order and which are done per thread.
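A sequential sketch of the policy idea, using tag dispatch on a two-level perfectly nested loop. Here `OuterFirst` is taken to mean that the outer index is the outermost loop and `InnerFirst` that the loops are interchanged; that reading, and the names `nested_for` and `visit_order`, are assumptions for illustration rather than RAJA API. Mapping one of the loops to threads is not shown.

```cpp
#include <utility>
#include <vector>

// Hypothetical policy tags selecting the traversal order.
struct OuterFirst {};
struct InnerFirst {};

// Outer index i is the outermost loop; inner index j varies fastest.
template <typename Body>
void nested_for(OuterFirst, int n_outer, int n_inner, Body&& body) {
  for (int i = 0; i < n_outer; ++i)
    for (int j = 0; j < n_inner; ++j)
      body(i, j);
}

// Loops interchanged: j is the outermost loop; i varies fastest.
template <typename Body>
void nested_for(InnerFirst, int n_outer, int n_inner, Body&& body) {
  for (int j = 0; j < n_inner; ++j)
    for (int i = 0; i < n_outer; ++i)
      body(i, j);
}

// Helper that records the (i, j) pairs in the order they are visited.
template <typename Policy>
std::vector<std::pair<int, int>> visit_order(Policy policy, int n_outer,
                                             int n_inner) {
  std::vector<std::pair<int, int>> visited;
  nested_for(policy, n_outer, n_inner,
             [&](int i, int j) { visited.emplace_back(i, j); });
  return visited;
}
```

The loop body is written once against `(i, j)`; only the policy argument changes between the two traversals, which is only safe when the iterations are independent of each other.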
Concept 4: Orientations for Views, so that negative indices count from the end; for example, an index of -1 would return the last entry.
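A one-dimensional sketch of the negative-index behavior, assuming that an index `i < 0` is interpreted as `i + n` for an extent of `n`. The class `WrappedView1D` is a hypothetical illustration and does not reflect how RAJA Views are implemented.

```cpp
#include <cassert>

// Hypothetical 1D view in which negative indices count from the end.
template <typename T>
class WrappedView1D {
public:
  WrappedView1D(T* data, long extent) : data_(data), extent_(extent) {
    assert(data != nullptr && extent > 0);
  }

  // Valid indices are -extent .. extent-1; -1 refers to the last entry.
  T& operator()(long i) const {
    assert(i >= -extent_ && i < extent_);
    return data_[i < 0 ? i + extent_ : i];
  }

  long extent() const { return extent_; }

private:
  T* data_;
  long extent_;
};
```

With a four-entry array, `view(-1)` and `view(3)` refer to the same entry, as do `view(-4)` and `view(0)`.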