ParallelStencil.jl
Include `@tturbo` as loop vectorisation possibility for the CPU backend
Something to consider as an alternative or supplement to the current `Threads.@threads` option. The `@tturbo` macro, provided by the LoopVectorization package, combines multithreading with SIMD (e.g. AVX) vectorisation of loops. See https://github.com/luraess/parallel-gpu-workshop-JuliaCon21#parallel-cpu-implementation for an example. There may be some restrictions on handling `if` conditions inside the loop.
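To make the comparison concrete, here is a minimal sketch of the same loop nest written once with `Threads.@threads` and once with `@tturbo`. The kernel (a 2D diffusion-style update) and the names `diffusion_threads!`, `diffusion_tturbo!`, `T`, `T2`, `D`, `dt`, `dx`, `dy` are hypothetical and chosen only for illustration. This is not what ParallelStencil generates, and it assumes LoopVectorization is installed.

```julia
using LoopVectorization  # external package providing @turbo / @tturbo

# Hypothetical kernel: explicit 2D diffusion step on the interior points.
# Variant 1: multithreading over the outer loop only.
function diffusion_threads!(T2, T, D, dt, dx, dy)
    nx, ny = size(T)
    _dx2, _dy2 = 1 / dx^2, 1 / dy^2          # hoisted loop invariants
    Threads.@threads for iy in 2:ny-1
        for ix in 2:nx-1
            @inbounds T2[ix, iy] = T[ix, iy] + dt * D * (
                (T[ix+1, iy] - 2 * T[ix, iy] + T[ix-1, iy]) * _dx2 +
                (T[ix, iy+1] - 2 * T[ix, iy] + T[ix, iy-1]) * _dy2)
        end
    end
    return T2
end

# Variant 2: same loop body, with @tturbo handling both threading and
# SIMD vectorisation. @tturbo does not bounds-check, so the index ranges
# must stay inside the arrays.
function diffusion_tturbo!(T2, T, D, dt, dx, dy)
    nx, ny = size(T)
    _dx2, _dy2 = 1 / dx^2, 1 / dy^2
    @tturbo for iy in 2:ny-1
        for ix in 2:nx-1
            T2[ix, iy] = T[ix, iy] + dt * D * (
                (T[ix+1, iy] - 2 * T[ix, iy] + T[ix-1, iy]) * _dx2 +
                (T[ix, iy+1] - 2 * T[ix, iy] + T[ix, iy-1]) * _dy2)
        end
    end
    return T2
end

# Both variants should agree up to floating-point rounding.
T = rand(64, 64)
A = diffusion_threads!(copy(T), T, 1.0, 1e-4, 0.1, 0.1)
B = diffusion_tturbo!(copy(T), T, 1.0, 1e-4, 0.1, 0.1)
println(A ≈ B)
```

The loop bodies are identical, so the difference between the two variants is confined to the macro on the outer loop. Regarding the `if` restriction mentioned above: branches inside a `@tturbo` loop would likely have to be rewritten in a branch-free form, for example with `ifelse`, which is an assumption to verify against the LoopVectorization documentation.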
Reopened, as the foreseen GPU optimizations should also make the use of LoopVectorization feasible with little or no divergence between CPU and GPU code generation.