LoopVectorization.jl
Boil down vmap! variants, extend to reduce & mapreduce
There's a whole menagerie of vmap-related functions exported, covering mutating vs. non-mutating, temporal vs. non-temporal stores, and single-threaded vs. multithreaded execution. Eight (2 × 2 × 2) versions of the same function are maintainable, but adding one more boolean switch would double that to sixteen, which could get unwieldy. Wrapper types might allow for a more Julian interface, something like:
```julia
map!(Threaded(+), Nontemporal(out), a, b)
```
to replace
```julia
vmapntt!(+, out, a, b)
```
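A minimal sketch of how such a wrapper could work, using only Base threading. Threaded here is hypothetical (not an existing LoopVectorization type), the sketch handles only the two-input case, and it uses ordinary stores, so no Nontemporal wrapper is shown. The idea it illustrates is that each option becomes a method selected by dispatch on a wrapper type rather than a separately named function.

```julia
using Base.Threads: @threads

# Hypothetical wrapper marking a function for multithreaded execution
# (not an existing LoopVectorization type).
struct Threaded{F}
    f::F
end

# Two-input map! that splits the iterations across Julia threads.
# Dispatching on Threaded selects this method instead of Base's serial map!.
function Base.map!(t::Threaded, dest::AbstractArray, a::AbstractArray, b::AbstractArray)
    f = t.f
    # eachindex(dest, a, b) throws a DimensionMismatch if the indices disagree.
    @threads for i in eachindex(dest, a, b)
        @inbounds dest[i] = f(a[i], b[i])
    end
    return dest
end

a = rand(1000); b = rand(1000); out = similar(a)
map!(Threaded(+), out, a, b)   # same result as map!(+, out, a, b)
```

A further wrapper (e.g. around the destination) would add one more method rather than doubling the number of exported names.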
The Threaded(f) wrapper could also be extended to reduce and mapreduce, giving a simple interface for multithreaded reductions.
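A sketch of what a Threaded reduction could look like, assuming the same hypothetical Threaded wrapper and a plain chunk-and-combine strategy with Base tasks (not LoopVectorization's implementation). The reduction operator must be associative; because chunking changes the order of operations, floating-point results may differ slightly from a serial reduction.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical wrapper; redefining an identical struct is allowed in Julia.
struct Threaded{F}
    f::F
end

# Split A into contiguous chunks, reduce each chunk on its own task with
# Base's serial mapreduce, then combine the partial results with op.
function Base.mapreduce(f, t::Threaded, A::AbstractVector)
    op = t.f
    n = length(A)
    n == 0 && throw(ArgumentError("empty collections are not supported in this sketch"))
    nchunks = min(nthreads(), n)   # every chunk is non-empty
    i0 = firstindex(A)
    tasks = map(1:nchunks) do c
        lo = i0 + div((c - 1) * n, nchunks)
        hi = i0 + div(c * n, nchunks) - 1
        @spawn mapreduce(f, op, view(A, lo:hi))
    end
    return mapreduce(fetch, op, tasks)
end

# reduce is mapreduce with identity as the mapped function.
Base.reduce(t::Threaded, A::AbstractVector) = mapreduce(identity, t, A)

A = collect(1:10_000)
mapreduce(abs2, Threaded(+), A)   # equals sum(abs2, A)
reduce(Threaded(max), A)          # equals maximum(A)
```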
It should probably still be vmap!, as someone may want a non-threaded temporal version?
Otherwise, sounds good to me.
On second thought, using Threaded(f) would prevent the use of do-block syntax. Maybe it'd be better to move this under the @avx umbrella as a special case? For example:
```julia
@avx threads=true map!(C, A, B) do a, b
    (a + b) / (a - b)
end
```
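For context, a do-block is syntax for passing an anonymous function as the first positional argument, so the two plain Base calls below (no @avx) are equivalent. Since the function never appears as an expression at the call site, there is nothing to wrap in Threaded(...). Two arguments are written do a, b; do (a, b) would instead declare a single argument destructured as a tuple.

```julia
A = [3.0, 5.0]; B = [1.0, 2.0]
C1 = similar(A); C2 = similar(A)

# do-block form: the block becomes a two-argument anonymous function,
# passed to map! ahead of C1, A, B.
map!(C1, A, B) do a, b
    (a + b) / (a - b)
end

# Equivalent explicit form.
map!((a, b) -> (a + b) / (a - b), C2, A, B)

C1 == C2  # true
```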
Related issue:
https://github.com/chriselrod/LoopVectorization.jl/issues/102
I think that's a reasonable long-term plan: have @avx read calls to functions like map, mapreduce, dot, etc. as the loops they represent, and let them combine with other such expressions.
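To illustrate what reading these functions as loops means, here are hand-written loops equivalent to the map! call above and to dot. Loops of this shape are what @avx is applied to today; the sketch uses @inbounds instead so it runs without LoopVectorization installed. The function names are hypothetical.

```julia
# Loop form of map!((a, b) -> (a + b) / (a - b), C, A, B).
function map_as_loop!(C, A, B)
    # With LoopVectorization loaded, @avx could be placed in front of this loop.
    @inbounds for i in eachindex(C, A, B)
        C[i] = (A[i] + B[i]) / (A[i] - B[i])
    end
    return C
end

# Loop form of dot(a, b) for real vectors: one accumulator reduced with +.
function dot_as_loop(a, b)
    s = zero(promote_type(eltype(a), eltype(b)))
    @inbounds for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end
```

Once both are loops, a macro could in principle fuse them with neighboring loops over the same indices, which is the kind of mixing described above.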
Currently, @avx doesn't do non-temporal stores or threading itself, but both (especially threading) would be good additions.
For now, the nice thing about the vmap code is that it is very simple.
Because it doesn't interact with the rest of the library, the map, mapreduce, and vfilter code may all be better off in a separate package.
A short-term change could be to just use keyword arguments instead of separate function names.
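A minimal sketch of the keyword-argument approach, assuming a hypothetical vmap_kw! name and only a threaded keyword. A nontemporal keyword would be added the same way, but it needs non-temporal store intrinsics that plain Base stores cannot express, so it is omitted here. The Bool is converted to a Val so each backend is a separately compiled method. Unlike the Threaded(f) wrapper, keywords leave the function argument free, so do-block syntax keeps working.

```julia
using Base.Threads: @threads

# Hypothetical keyword front end: one public name instead of separate
# vmap!/vmapt! variants. Converting the Bool to a Val costs one dynamic
# dispatch, after which each backend is compiled separately.
function vmap_kw!(f, dest::AbstractArray, a::AbstractArray, b::AbstractArray;
                  threaded::Bool = false)
    return _vmap_kw!(Val(threaded), f, dest, a, b)
end

# Single-threaded backend.
function _vmap_kw!(::Val{false}, f, dest, a, b)
    @inbounds for i in eachindex(dest, a, b)
        dest[i] = f(a[i], b[i])
    end
    return dest
end

# Multithreaded backend: iterations are split across Julia threads.
function _vmap_kw!(::Val{true}, f, dest, a, b)
    @threads for i in eachindex(dest, a, b)
        @inbounds dest[i] = f(a[i], b[i])
    end
    return dest
end

a = collect(1.0:4.0); b = fill(2.0, 4); out = similar(a)

# Keywords leave the first argument free, so do-blocks still work.
vmap_kw!(out, a, b; threaded = true) do x, y
    x * y
end
```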