Tullio.jl Use LoopVectorization.jl's threads, sometimes?


LoopVectorization has changed two things since its interaction with Tullio was thought out:

a name change @avx -> @turbo, and
a multi-threading macro @avxt or @tturbo == @turbo thread=true.
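For reference, a minimal sketch of the three spellings mentioned above, applied to the same loop. The function names are made up for illustration, and the example assumes a LoopVectorization version recent enough to export @turbo and @tturbo.

```julia
using LoopVectorization

# Single-threaded SIMD loop (the macro formerly called @avx).
function exp_turbo!(y, x)
    @turbo for i in eachindex(y, x)
        y[i] = exp(x[i])
    end
    return y
end

# Multi-threaded variant (formerly @avxt).
function exp_tturbo!(y, x)
    @tturbo for i in eachindex(y, x)
        y[i] = exp(x[i])
    end
    return y
end

# The same thing, spelled with the keyword argument.
function exp_thread_kw!(y, x)
    @turbo thread=true for i in eachindex(y, x)
        y[i] = exp(x[i])
    end
    return y
end

x = rand(1000)
y = exp_tturbo!(similar(x), x)
```

All three should agree with exp.(x) up to floating-point tolerance; only the threading behaviour differs.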

The easy change would be to make the keyword here turbo=true etc.

I believe the threading uses Polyester.jl (https://github.com/JuliaSIMD/Polyester.jl), which has lower overhead when launching threads than Threads.@spawn. But if I understand right, using both together can cause problems, e.g. https://github.com/JuliaSIMD/LoopVectorization.jl/issues/221 or https://github.com/JuliaSIMD/ThreadingUtilities.jl/issues/25. To allow but not require its use, the questions are:

Should this just mean calling @tturbo on the whole iteration space (as is done for KernelAbstractions now) or should it also/only be possible to use these threads within Tullio's recursive threads-then-blocks algorithm?
Is there a non-confusing interface for this? Since @tullio aims to be concise it's nice not to need 5 keyword options every time.

Jun 30 '21 16:06 mcabbott

> Tullio's recursive threads-then-blocks algorithm?

An additional consideration is that I haven't implemented anything like this in LoopVectorization yet, so Tullio's current implementation will get better performance beyond a certain size:

[benchmark plot]

(Also, I've made LV ramp up thread use more slowly since creating this plot, so I should probably rerun this benchmark to see how it looks now.) I'll implement this eventually, but it'll be a while.

Jun 30 '21 16:06 chriselrod

That's a nice graph. You can see that Tullio turns on threading too early (around 64 IIRC) on your machine -- the overhead of @spawn isn't paying for itself.

OK, so it sounds like the goal is to figure out how to use ThreadingUtilities or Polyester in place of @spawn.
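To make the comparison concrete, here is a rough sketch of the same elementwise loop threaded two ways: once with Threads.@spawn over manually chosen chunks, and once with Polyester's @batch. This is not how Tullio divides work internally; the function names and the chunking scheme are assumptions for illustration, and the inputs are assumed to be ordinary 1-based Vectors of equal length.

```julia
using Polyester

# Task-based version: split the index range into chunks, spawn one task each.
function exp_spawn!(y, x; nchunks = Threads.nthreads())
    n = length(x)
    chunksize = max(1, cld(n, nchunks))
    tasks = map(Iterators.partition(1:n, chunksize)) do idx
        Threads.@spawn for i in idx
            @inbounds y[i] = exp(x[i])
        end
    end
    foreach(wait, tasks)
    return y
end

# Polyester version: @batch hands the iterations to its own worker threads.
function exp_batch!(y, x)
    n = length(x)
    @batch for i in 1:n
        @inbounds y[i] = exp(x[i])
    end
    return y
end

x = rand(10_000)
y1 = exp_spawn!(similar(x), x)
y2 = exp_batch!(similar(x), x)
```

Timing both on small arrays (say with BenchmarkTools) is one way to see where the launch overhead starts to matter on a given machine.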

One possible interface would be something like @tullio A[i] := exp(B[i]) threads=Polyester. There's already grad=Base / Dual / false. And if this choice is orthogonal to whether to use @turbo, then perhaps it shouldn't share a keyword.
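For context, a short sketch of keyword options that @tullio already accepts, as far as I know. The proposed threads=Polyester is not among them and is deliberately not shown as working code; the output names A1 to A4 are arbitrary.

```julia
using Tullio

B = rand(1000)

@tullio A1[i] := exp(B[i])                 # defaults: threading may switch on above a size threshold
@tullio A2[i] := exp(B[i]) threads=false   # never use multi-threading
@tullio A3[i] := exp(B[i]) threads=64      # set the threading threshold by hand
@tullio A4[i] := exp(B[i]) grad=false      # skip generating gradient code
```

Each option is an independent keyword, which is why a separate keyword for the threading backend would fit the existing pattern.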

Jun 30 '21 17:06 mcabbott
