Tullio.jl

Use LoopVectorization.jl's threads, sometimes?

Open mcabbott opened this issue 4 years ago • 2 comments

LoopVectorization has changed two things since its interaction with Tullio was thought out:

  1. a name change, @avx -> @turbo, and
  2. a multi-threading macro, @avxt (now @tturbo), which is equivalent to @turbo thread=true.

The easy change would be to rename the keyword here to match, i.e. turbo=true etc.
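For reference, a minimal sketch of the two LoopVectorization macros mentioned above, applied to a plain loop. The function names `exp_turbo!` and `exp_tturbo!` are made up for illustration, and this assumes a LoopVectorization version that exports both @turbo and @tturbo.

```julia
using LoopVectorization

# Single-threaded SIMD loop; @turbo is the new name for @avx.
function exp_turbo!(A, B)
    @turbo for i in eachindex(A, B)
        A[i] = exp(B[i])
    end
    return A
end

# Same loop, but LoopVectorization may also split it across its own threads.
function exp_tturbo!(A, B)
    @tturbo for i in eachindex(A, B)
        A[i] = exp(B[i])
    end
    return A
end

B = rand(1000)
A1 = exp_turbo!(similar(B), B)
A2 = exp_tturbo!(similar(B), B)
```

Both calls should produce the same values as `exp.(B)`; only the way the work is scheduled differs.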

I believe the threading uses https://github.com/JuliaSIMD/Polyester.jl, and has lower overhead to launch threads than Threads.@spawn. But if I understand right, using both together can cause problems, e.g. https://github.com/JuliaSIMD/LoopVectorization.jl/issues/221 or https://github.com/JuliaSIMD/ThreadingUtilities.jl/issues/25. To allow but not require use of this, the questions are:

  • Should this just mean calling @tturbo on the whole iteration space (as is done for KernelAbstractions now) or should it also/only be possible to use these threads within Tullio's recursive threads-then-blocks algorithm?
  • Is there a non-confusing interface for this? Since @tullio aims to be concise it's nice not to need 5 keyword options every time.
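To make the first question concrete, here is a rough sketch of a recursive "split the range across tasks, then run a block" scheme, using only Base. This is not Tullio's actual code: `threaded_blocks!`, `exp_kernel!` and the `block` threshold are hypothetical names chosen for illustration. The open question is whether the spawning step, or the inner kernel, could use LoopVectorization's threads instead of Threads.@spawn.

```julia
# Hypothetical sketch: halve the range until it is no longer than `block`
# (assumed >= 1), spawning a task for one half at each level.
function threaded_blocks!(kernel!, A, B, range::UnitRange{Int}, block::Int)
    if length(range) <= block
        kernel!(A, B, range)    # small enough: run this block serially
    else
        mid = (first(range) + last(range)) ÷ 2
        left = Threads.@spawn threaded_blocks!(kernel!, A, B, first(range):mid, block)
        threaded_blocks!(kernel!, A, B, (mid + 1):last(range), block)
        wait(left)              # make sure the spawned half has finished
    end
    return A
end

# Serial kernel for one block; this is where @turbo could be applied.
function exp_kernel!(A, B, range)
    @inbounds for i in range
        A[i] = exp(B[i])
    end
    return nothing
end

B = rand(10_000)
A = threaded_blocks!(exp_kernel!, similar(B), B, 1:length(B), 1_000)
```

In this shape, calling @tturbo on the whole iteration space would replace the entire function, whereas using LoopVectorization's threads inside the algorithm would only replace the `Threads.@spawn` / `wait` pair.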

mcabbott, Jun 30 '21 16:06

> Tullio's recursive threads-then-blocks algorithm?

An additional consideration is that I haven't implemented anything like this in LoopVectorization yet, so Tullio's current implementation will get better performance beyond a certain size: [image: benchmark plot] (Also, I have made LV ramp up thread use more slowly since creating this plot, so I should probably rerun this benchmark to see how it looks now.) I'll implement this eventually, but it'll be a while.

chriselrod, Jun 30 '21 16:06

That's a nice graph. You can see that Tullio turns on threading too early (around 64 IIRC) on your machine -- the overhead of @spawn isn't paying for itself.

OK, so it sounds like the goal is to figure out how to use ThreadingUtilities or Polyester in place of @spawn.

One possible interface is like @tullio A[i] := exp(B[i]) threads=Polyester. There's already grad=Base / Dual / false. And if it's an orthogonal choice to whether to use @turbo then perhaps it shouldn't share a keyword.
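For comparison, a small sketch of how the keyword options look today, assuming the `threads=false` and `avx=false` options described in Tullio's README at the time of writing. The last line is only the proposal from this comment and is left commented out because it is not implemented.

```julia
using Tullio

B = rand(100)

@tullio A[i] := exp(B[i])                  # defaults: Tullio decides about threading
@tullio C[i] := exp(B[i]) threads=false    # never multi-thread
@tullio D[i] := exp(B[i]) avx=false        # do not use LoopVectorization, even if loaded

# Proposed above, not an existing option:
# @tullio E[i] := exp(B[i]) threads=Polyester
```

Keeping the thread backend under `threads=` and the SIMD choice under its own keyword would let the two be combined freely, which is the orthogonality point made above.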

mcabbott, Jun 30 '21 17:06