Thread loop optimizations for RAJA launch
This PR is a collaboration space for exploring optimizations within RAJA launch and the loop abstraction.
Hi @LLNL/raja-core, in collaboration with AMD staff we found a key optimization for nested loops within RAJA::launch. It would be nice to have for the upcoming release. The downside is that it introduces another set of policies similar to the existing RAJA::hip_thread_loop_{x,y,z} policies. One thought is to retire the old policies in favor of the new ones, but are the old ones worth keeping for performance tracking?
I wonder if we can't use the same policies but with an extra template argument to choose between the global and context versions of the variables.
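A rough host-only sketch of what that could look like, just to make the idea concrete. Every name here (`thread_loop_x_sketch`, `IndexSource`, `LaunchContextSketch`, `loop_sketch`, and the `fake_*` stand-ins for the device built-ins) is hypothetical and not part of RAJA. The point is only that one policy template with a defaulted extra argument can select where the thread index is read from.

```cpp
// Host-only stand-ins so the sketch compiles without a GPU toolchain.
struct Dim3Sketch { int x, y, z; };
static Dim3Sketch fake_threadIdx{3, 0, 0};  // plays the role of the built-in threadIdx
static Dim3Sketch fake_blockDim{4, 1, 1};   // plays the role of the built-in blockDim

// Hypothetical launch context holding cached copies of the built-ins.
struct LaunchContextSketch {
  Dim3Sketch thread_id;
  Dim3Sketch block_dim;
};

// Where a policy reads its indices from.
enum class IndexSource { global, context };

// One policy; the defaulted argument keeps the existing (global) behavior.
template <IndexSource source = IndexSource::global>
struct thread_loop_x_sketch {};

// Global version: read the built-in variables directly.
inline int start_x(thread_loop_x_sketch<IndexSource::global>,
                   const LaunchContextSketch&) { return fake_threadIdx.x; }
inline int stride_x(thread_loop_x_sketch<IndexSource::global>,
                    const LaunchContextSketch&) { return fake_blockDim.x; }

// Context version: read the copies stored in the launch context.
inline int start_x(thread_loop_x_sketch<IndexSource::context>,
                   const LaunchContextSketch& ctx) { return ctx.thread_id.x; }
inline int stride_x(thread_loop_x_sketch<IndexSource::context>,
                    const LaunchContextSketch& ctx) { return ctx.block_dim.x; }

// Block-stride loop over [0, len) as seen by a single thread.
template <typename Policy, typename Body>
void loop_sketch(Policy pol, const LaunchContextSketch& ctx, int len, Body&& body)
{
  const int start  = start_x(pol, ctx);
  const int stride = stride_x(pol, ctx);
  for (int i = start; i < len; i += stride) {
    body(i);
  }
}
```

Existing call sites would keep using `thread_loop_x_sketch<>`, and a kernel that wants the context version would opt in with `thread_loop_x_sketch<IndexSource::context>`.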
I don't think it's a problem to keep the old policies and have a lot of alternatives for folks to try. We do need to work on documenting policies better and have a comprehensive cookbook of examples that clearly shows the differences between policy choices, including usage, performance, and how to choose.
Do we want to make populating the context variables optional in case we find any overhead there?
Any thoughts on blockIdx and gridDim?
> Do we want to make populating the context variables optional in case we find any overhead there?
Oh, for register-heavy kernels? That makes sense.
> Any thoughts on blockIdx and gridDim?
I see pros and cons. Pro: it could be handy for completeness. Con: it takes up more registers. Maybe we can do partial specializations of the launch context or something like that. This may be a less common use case though.
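As a rough illustration of the specialization idea, here is a host-only sketch in which the context only stores the indices it is asked to store. All names (`ContextSketch`, `ThreadIdsSketch`, `BlockIdsSketch`, `Dim3Sketch`) are hypothetical, and it uses specialized base classes rather than partial specializations of the context itself, which is one of several ways to get the same effect. Whether this actually saves registers on device would need to be measured.

```cpp
struct Dim3Sketch { int x, y, z; };

// Optional storage for thread-level indices; the primary template is empty.
template <bool enabled>
struct ThreadIdsSketch {};

template <>
struct ThreadIdsSketch<true> {
  Dim3Sketch thread_id{0, 0, 0};  // would be filled from threadIdx at launch
  Dim3Sketch block_dim{1, 1, 1};  // would be filled from blockDim at launch
};

// Optional storage for block-level indices; the primary template is empty.
template <bool enabled>
struct BlockIdsSketch {};

template <>
struct BlockIdsSketch<true> {
  Dim3Sketch block_id{0, 0, 0};   // would be filled from blockIdx at launch
  Dim3Sketch grid_dim{1, 1, 1};   // would be filled from gridDim at launch
};

// Hypothetical launch context: each switch adds its members only when it is on.
template <bool cache_thread_ids, bool cache_block_ids>
struct ContextSketch : ThreadIdsSketch<cache_thread_ids>,
                       BlockIdsSketch<cache_block_ids> {
  void* shared_mem = nullptr;     // placeholder for whatever the context already holds
};

// Each enabled switch makes the context strictly larger.
static_assert(sizeof(ContextSketch<false, false>) < sizeof(ContextSketch<true, false>),
              "thread indices add storage only when requested");
static_assert(sizeof(ContextSketch<true, false>) < sizeof(ContextSketch<true, true>),
              "block indices add storage only when requested");
```

With this layout, `ContextSketch<true, false>` has `thread_id` and `block_dim` members but no `block_id`, so a policy that tries to read block indices from it fails to compile instead of silently reading stale data.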
Ya, I'm imagining the context and some policies both having a switch. The loop implementation then checks that if the policy uses the switch, the context has the same switch.
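// Policy that may or may not rely on the context switch
template < bool has_switch >
struct Policy {};
// Context that may or may not provide the switch
template < bool has_switch >
struct Context {};
template < bool policy_switch, bool context_switch >
void loop(Policy<policy_switch>, Context<context_switch>)
{
  // A policy that uses the switch requires a context that provides it
  static_assert(!policy_switch || context_switch,
                "If policy has switch then context must have switch");
}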
@MrBurmark, do you have time to take a look? I think I pushed up the ideas we had yesterday.