Add some sort of mechanism to have a non 1:1 mapping between loop iterations and GPU threads

Open stonea opened this issue 1 year ago • 2 comments

So far, given a GPUizable loop like:

foreach i in 0..<N do ...

when this is run there will be a one-to-one mapping between loop iterations and GPU threads.

There may be various reasons for users to not want a one-to-one mapping; for example, the iteration space may exceed the maximum allowed number of threads. Also see: https://github.com/chapel-lang/chapel/issues/22152#issuecomment-1525828257

To change the mapping users could rewrite their loops to use an inner for, say something like:

foreach i_prime in 0..<N by 2 do
  for i in i_prime..<min(i_prime+2, N) do ...
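
For reference, here is a slightly fuller sketch of that manual chunking pattern. It is only an illustration: `N`, `chunk`, and the array `A` are made-up names, and it assumes a Chapel build with GPU support enabled (`CHPL_LOCALE_MODEL=gpu`) so that `here.gpus[0]` exists. The outer `foreach` is what gets turned into a kernel, so each GPU thread handles up to `chunk` consecutive iterations in the serial inner `for`.

```chapel
config const N = 10;      // size of the iteration space (illustrative)
config const chunk = 2;   // iterations handled per GPU thread (hypothetical name)

on here.gpus[0] {
  var A: [0..<N] int;

  // One GPU thread per chunk: the outer foreach is the GPUized loop.
  foreach iPrime in 0..<N by chunk {
    // Serial inner loop inside each thread; min() clamps the last,
    // possibly partial, chunk to the end of the iteration space.
    for i in iPrime..<min(iPrime + chunk, N) do
      A[i] = i * i;
  }

  writeln(A);
}
```

With this shape the kernel is launched over roughly `N / chunk` threads (rounded up) instead of `N`, which is the knob a language-level feature would presumably expose.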

But maybe it would be nicer if we had some language feature to do that. For example:

foreach i in 0..<N with (config cfg = new LoopContext(threadSize=2)) do ...

stonea, Apr 27 '23 16:04