Thread loop optimizations RAJA launch

Open artv3 opened this issue 1 month ago • 8 comments

This PR is a collaboration space for exploring optimizations within RAJA launch and the loop abstraction.

artv3, Nov 25 '25 14:11

Hi @LLNL/raja-core, in collaboration with AMD staff we found a key optimization for nested loops within RAJA::launch. It would be nice to have for the upcoming release; the downside is that it introduces another set of policies similar to the existing RAJA::hip_thread_loop_{x,y,z} policies. One thought is to retire the old policies in favor of the new ones, but are the old ones worth keeping for performance tracking?

artv3, Dec 02 '25 17:12
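For context, the thread-loop policies are, as we understand them, a block-stride mapping: each thread starts at its own index and advances by the block size. Below is a minimal host-side C++ sketch of that pattern for two nested loops. The names `ThreadCoord`, `thread_loop`, and `run_block` are hypothetical and not RAJA API, and `ThreadCoord` stands in for what `threadIdx`/`blockDim` would supply on the device. It only illustrates that every nesting level derives its start and stride from per-thread values, which is presumably where the choice between global and context variables discussed in this thread comes in.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one dimension of threadIdx/blockDim.
struct ThreadCoord {
  int idx;  // this thread's index within the block
  int dim;  // number of threads in the block along this dimension
};

// Block-stride loop: the thread visits idx, idx + dim, idx + 2*dim, ... below len.
template <typename Body>
void thread_loop(ThreadCoord t, int len, Body&& body)
{
  assert(t.dim > 0 && t.idx >= 0 && t.idx < t.dim);
  for (int i = t.idx; i < len; i += t.dim) {
    body(i);
  }
}

// Emulates one 2D block on the host by running every (tx, ty) thread in turn.
// Returns how many times each cell of a rows x cols iteration space was visited.
std::vector<int> run_block(int block_x, int block_y, int rows, int cols)
{
  std::vector<int> visits(rows * cols, 0);
  for (int ty = 0; ty < block_y; ++ty) {
    for (int tx = 0; tx < block_x; ++tx) {
      // Outer loop: rows are strided across the y threads.
      thread_loop({ty, block_y}, rows, [&](int i) {
        // Inner loop: columns are strided across the x threads.
        thread_loop({tx, block_x}, cols, [&](int j) { ++visits[i * cols + j]; });
      });
    }
  }
  return visits;
}
```

For example, `run_block(4, 2, 5, 7)` visits each of the 35 cells exactly once across the emulated threads.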

I wonder if we could use the same policies, but with an extra template argument that chooses between the global and context versions of the variables.

MrBurmark, Dec 02 '25 17:12
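One way to read this suggestion: keep a single policy type and add a `bool` template argument that selects, at compile time, whether the loop reads its start index and stride from the global built-ins or from copies cached in the launch context. A minimal C++17 sketch, assuming hypothetical names throughout (`thread_loop_x`, `LaunchContext`, `loop`) and host globals standing in for `threadIdx.x`/`blockDim.x`:

```cpp
#include <cassert>

// Hypothetical host stand-ins for the device built-ins threadIdx.x and blockDim.x.
int g_thread_x = 0;
int g_block_dim_x = 1;

// Hypothetical launch context holding copies of the same values.
struct LaunchContext {
  int thread_x;
  int block_dim_x;
};

// A single policy type; the extra argument picks where the loop reads its indices.
template <bool UseContext>
struct thread_loop_x {};

template <bool UseContext, typename Body>
void loop(const LaunchContext& ctx, thread_loop_x<UseContext>, int len, Body&& body)
{
  int start = 0;
  int stride = 1;
  if constexpr (UseContext) {
    start = ctx.thread_x;      // cached copy from the context
    stride = ctx.block_dim_x;
  } else {
    start = g_thread_x;        // "global" built-in
    stride = g_block_dim_x;
  }
  assert(stride > 0);
  // Block-stride iteration, identical for both sources.
  for (int i = start; i < len; i += stride) {
    body(i);
  }
}
```

Because the branch is resolved with `if constexpr`, each instantiation contains only one of the two reads, so the extra template argument by itself adds no runtime branching; whether the context copy is actually faster on a given GPU is a performance question this sketch does not answer.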

I don't think it's a problem to keep the old policies and have a lot of alternatives for folks to try. We do need to work on documenting policies better and on building a comprehensive cookbook of examples that clearly shows the differences between policy choices, including usage, performance, and how to choose.

rhornung67, Dec 02 '25 17:12

Do we want to make populating the context variables optional in case we find any overhead there?

MrBurmark, Dec 02 '25 17:12

Any thoughts on blockIdx and gridDim?

MrBurmark, Dec 02 '25 17:12

> Do we want to make populating the context variables optional in case we find any overhead there?

Oh, for register-heavy kernels? That makes sense.

artv3, Dec 02 '25 19:12

> Any thoughts on blockIdx and gridDim?

I see pros and cons. Pro: it could be handy for completeness. Con: it takes up more registers. Maybe we can do partial specializations of the launch context or something like that. These may be less common use cases, though.

artv3, Dec 02 '25 19:12
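A minimal sketch of the specialization idea, assuming hypothetical type and field names (`ThreadInfo`, `BlockInfo`, `LaunchCtx`, `global_thread_x`) rather than RAJA's actual context layout. The block fields exist only when requested, so contexts that don't need `blockIdx`/`gridDim` don't carry them. With a single flag this is a full specialization; a context with more template parameters would make it a partial one.

```cpp
#include <cassert>

// Hypothetical cached index fields (not RAJA's actual context layout).
struct ThreadInfo {
  int thread_x = 0;     // stands in for threadIdx.x
  int block_dim_x = 1;  // stands in for blockDim.x
};

struct BlockInfo {
  int block_x = 0;      // stands in for blockIdx.x
  int grid_dim_x = 1;   // stands in for gridDim.x
};

// Primary template: thread fields only.
template <bool WithBlockInfo>
struct LaunchCtx : ThreadInfo {};

// Specialization: adds the block fields only when requested.
template <>
struct LaunchCtx<true> : ThreadInfo, BlockInfo {};

// Only contexts that carry block info can compute a grid-wide index.
inline int global_thread_x(const LaunchCtx<true>& ctx)
{
  assert(ctx.thread_x >= 0 && ctx.thread_x < ctx.block_dim_x);
  return ctx.block_x * ctx.block_dim_x + ctx.thread_x;
}
```

Code that needs the block fields, like `global_thread_x`, only accepts `LaunchCtx<true>`, so a mismatch is a compile error rather than a read of unpopulated values. Whether the smaller struct actually saves registers depends on what the compiler keeps live, so this is a layout argument, not a measured result.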

Yeah, I'm imagining the context and some policies both having a switch. Then the loop implementation checks that if the policy uses the switch, the context must have the same switch.

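template < bool UseCtx >  // UseCtx: whether context variables are used/populated
struct Policy { };

template < bool UseCtx >
struct Context { };

template < bool policy_switch, bool context_switch >
void loop(Policy<policy_switch>, Context<context_switch>)
{
  // Reject a policy that needs the context variables paired with a context
  // that doesn't provide them; every other combination is allowed.
  static_assert(!policy_switch || context_switch,
                "If policy has switch then context must have switch");
}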

MrBurmark, Dec 02 '25 20:12
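A self-contained sketch of the switch check with a placeholder loop body. The template parameter is named `UseCtx` because `switch` is a reserved keyword in C++, and the condition is factored into a hypothetical `compatible` function so it can be checked directly. Note that `!policy_switch || (policy_switch && context_switch)` reduces to `!policy_switch || context_switch`.

```cpp
#include <cassert>

// The static_assert condition as a constexpr function: a policy that reads
// context variables needs a context that populates them; a policy that does
// not can run with any context.
constexpr bool compatible(bool policy_uses_ctx, bool context_has_ctx)
{
  return !policy_uses_ctx || context_has_ctx;
}

template <bool UseCtx> struct Policy {};
template <bool UseCtx> struct Context {};

template <bool PolicySwitch, bool ContextSwitch, typename Body>
void loop(Policy<PolicySwitch>, Context<ContextSwitch>, int len, Body&& body)
{
  static_assert(compatible(PolicySwitch, ContextSwitch),
                "If policy has switch then context must have switch");
  assert(len >= 0);
  // Placeholder sequential iteration; a real policy would map i onto threads.
  for (int i = 0; i < len; ++i) {
    body(i);
  }
}
```

Of the four combinations, only `Policy<true>` with `Context<false>` trips the `static_assert`; the other three compile.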

@MrBurmark, do you have time to take a look? I think I pushed up the ideas we had yesterday.

artv3, Dec 18 '25 16:12