Tullio.jl
Use LoopVectorization.jl's threads, sometimes?
LoopVectorization has changed two things since its interaction with Tullio was thought out:
- a name change, @avx -> @turbo, and
- a multi-threading macro, @avxt or @tturbo == @turbo thread=true.
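For reference, here is a minimal sketch of the two spellings on a plain loop, assuming LoopVectorization.jl is installed. The function names sum_exp_turbo and sum_exp_tturbo are made up for illustration and have nothing to do with Tullio's internals.

```julia
using LoopVectorization

# Single-threaded SIMD loop; @turbo is the newer name for @avx.
function sum_exp_turbo(x)
    s = zero(eltype(x))
    @turbo for i in eachindex(x)
        s += exp(x[i])
    end
    return s
end

# Same loop with LoopVectorization's own threading;
# @tturbo is shorthand for @turbo thread=true.
function sum_exp_tturbo(x)
    s = zero(eltype(x))
    @tturbo for i in eachindex(x)
        s += exp(x[i])
    end
    return s
end

x = rand(10_000)
sum_exp_turbo(x) ≈ sum_exp_tturbo(x)  # both should agree with sum(exp, x) up to rounding
```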
The easy change would be to make the keyword here turbo=true etc.
I believe the threading uses https://github.com/JuliaSIMD/Polyester.jl, and has lower overhead to launch threads than Threads.@spawn. But if I understand right, using both together can cause problems, e.g. https://github.com/JuliaSIMD/LoopVectorization.jl/issues/221 or https://github.com/JuliaSIMD/ThreadingUtilities.jl/issues/25. To allow but not require use of this, the questions are:
- Should this just mean calling @tturbo on the whole iteration space (as is done for KernelAbstractions now), or should it also/only be possible to use these threads within Tullio's recursive threads-then-blocks algorithm?
- Is there a non-confusing interface for this? Since @tullio aims to be concise, it's nice not to need 5 keyword options every time.
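To make the conciseness concern concrete, here is a rough sketch of how the existing keyword options already stack up on one call, assuming Tullio.jl is installed. Only options that exist today (threads, avx, grad) are used; the array names are arbitrary.

```julia
using Tullio

A = rand(100, 100)
B = rand(100, 100)

# Default behaviour: Tullio decides about threading on its own.
@tullio C[i, k] := A[i, j] * B[j, k]

# One option: turn Tullio's own multi-threading off.
@tullio D[i, k] := A[i, j] * B[j, k] threads=false

# Several options on one line; each new backend choice would add to this.
@tullio E[i, k] := A[i, j] * B[j, k] threads=false avx=false grad=false
```

All three calls compute the same matrix product; only the execution strategy differs.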
> Tullio's recursive threads-then-blocks algorithm?
An additional consideration is that I haven't implemented anything like this in LoopVectorization yet, so Tullio's current implementation will get better performance beyond a certain size:
(Also, I made LV ramp thread use up more slowly since creating this plot, so I should probably rerun this benchmark to see how it looks now.)
I'll implement this eventually, but it'll be a while.
That's a nice graph. You can see that Tullio turns on threading too early (around 64 IIRC) on your machine -- the overhead of @spawn isn't paying for itself.
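As a rough illustration of why the threshold matters, here is a sketch of a recursive split that only calls Threads.@spawn while the range is larger than a cutoff. This is not Tullio's actual implementation; threaded_map! and its threshold keyword are hypothetical names, and the default of 64 is just a placeholder value.

```julia
# Hypothetical sketch of divide-and-conquer threading, using only Base.
function threaded_map!(f, out, x, lo=firstindex(x), hi=lastindex(x); threshold=64)
    if hi - lo + 1 <= threshold
        # Small enough: run serially, paying no task-launch overhead.
        for i in lo:hi
            out[i] = f(x[i])
        end
    else
        # Too large: split in half, run one half on a new task.
        mid = (lo + hi) >>> 1
        task = Threads.@spawn threaded_map!(f, out, x, lo, mid; threshold=threshold)
        threaded_map!(f, out, x, mid + 1, hi; threshold=threshold)
        wait(task)
    end
    return out
end

x = rand(1000)
out = similar(x)
threaded_map!(exp, out, x; threshold=64)
```

If the threshold is set too low, each spawned task does too little work to repay the cost of launching it, which is the effect described above.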
OK, so it sounds like the goal is to figure out how to use ThreadingUtilities or Polyester in place of @spawn.
One possible interface is like @tullio A[i] := exp(B[i]) threads=Polyester. There's already grad=Base / Dual / false. And if it's an orthogonal choice to whether to use @turbo then perhaps it shouldn't share a keyword.
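For comparison, this is roughly what the hand-written equivalent of that example looks like with Polyester's @batch today, assuming Polyester.jl is installed. The threads=Polyester keyword above is only a proposal and does not exist in Tullio; exp_batch! is a made-up name for this sketch.

```julia
using Polyester

# Hand-written stand-in for `@tullio A[i] := exp(B[i])`,
# using Polyester's @batch instead of Threads.@spawn.
function exp_batch!(A, B)
    @batch for i in eachindex(A, B)
        A[i] = exp(B[i])
    end
    return A
end

B = rand(10_000)
A = exp_batch!(similar(B), B)
```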