Add some sort of mechanism to have a non 1:1 mapping between loop iterations and GPU threads
So far, given a GPUizable loop like:
foreach i in 0..<N do ...
when this is run there will be a one-to-one mapping between loop iterations and GPU threads.
There may be various reasons for users not to want a one-to-one mapping; for example, the iteration space may exceed the maximum allowed number of threads. Also see: https://github.com/chapel-lang/chapel/issues/22152#issuecomment-1525828257
To change the mapping, users could rewrite their loops to use an inner for, say something like:
foreach i_prime in 0..<N by 2 do ...
  for i in i_prime..<min(i_prime+2, N) do ...
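Spelled out as a complete program, the manual rewrite might look like the sketch below. The names `itersPerThread`, `target`, and `result` are made up for illustration, and the fallback to `here` is only there so the sketch also runs on a build without GPU support. Each iteration of the outer `foreach` (one GPU thread, when the loop is GPUized) handles up to `itersPerThread` consecutive indices, with `min` clamping the last chunk.

```chapel
config const N = 10;          // problem size; illustrative default
param itersPerThread = 2;     // hypothetical name: iterations handled per thread

// Run on the first GPU sublocale if one exists, otherwise on the CPU locale.
const target = if here.gpus.size > 0 then here.gpus[0] else here;

var result: [0..<N] int;      // host-side copy of the output

on target {
  var A: [0..<N] int;         // allocated on the target (GPU memory if on a GPU)

  // Outer loop: one iteration per chunk of `itersPerThread` indices.
  foreach iPrime in 0..<N by itersPerThread {
    // Inner loop: the indices of this chunk, clamped so the last chunk
    // does not run past N when N is not a multiple of itersPerThread.
    for i in iPrime..<min(iPrime + itersPerThread, N) do
      A[i] = i * i;
  }

  result = A;                 // copy back to the host array
}

writeln(result);
```

With this shape the outer loop has roughly N / itersPerThread iterations, so the number of GPU threads drops by the same factor, at the cost of the user having to write the chunking by hand.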
But maybe it would be nicer if we had some language feature to do that. For example:
foreach i in 0..<N with (config cfg = new LoopContext(threadSize=2)) do ...