LoopVectorization.jl
Macro(s) for vectorizing loops.
I'm seeing segmentation faults for operations with `PermutedDimsArray` with `ndims > 8`:
```julia
using LoopVectorization
function permutedims_lv(A, p)
    Ap = PermutedDimsArray(A, p)
    B = similar(Ap)
    @turbo B .= Ap
    return...
```
This didn't appear to help compile time performance in the case I was interested in.
I get the following error (on Julia v1.7.2):
```julia
julia> using Pkg; Pkg.activate(temp=true); Pkg.add("LoopVectorization")
...
  [bdcacae8] + LoopVectorization v0.12.119
...
  [aedffcd0] + Static v0.7.4
...
julia> function foo_plain!(dst, src)
           for...
```
```julia
using LoopVectorization, Static
M,K,N = 3,4,5
A = view(fill(NaN,100,100), 11:10+M,11:10+K+1); A .= rand.();
B = view(fill(NaN,100,100), 11:10+N,11:10+K); B .= rand.();
C0 = view(zeros(100,100), 11:10+N, 11:10+M); C1 = deepcopy(C0);
function...
```
```
using Pkg
Pkg.add("VectorizationBase")
Pkg.add("Hwloc")
using VectorizationBase,Hwloc
print(VectorizationBase.num_cores())
print(num_physical_cores())
print(num_virtual_cores())
```
```
$srun julia --threads 24 debug.jl
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes...
```
Does `LoopVectorization` support complex vectors? E.g.
```julia
using LoopVectorization
function test_avx!(x::Vector, y::Vector, beta)
    @avx for n=1:length(x)
        x[n] += conj(beta)*y[n]
    end
end
# set up
T = ComplexF32
N = 1024...
```
LoopVectorization compiled into sysimage like this
```
import PackageCompiler
@time PackageCompiler.create_sysimage(["LoopVectorization"], replace_default = true)
```
causes a crash when Octavian is loaded
```
julia> using Octavian
Please submit a bug...
```
I have written a macro that generates a fast loop for interpolations. Without `@tturbo` and with `Threads.@threads` the code runs fine, but it fails with `@tturbo` which does not do...
In some cases, `@turbo` is far slower on BitArrays than on anything else, and in other cases `check_args` fails and the vectorization doesn't happen. I can't replicate the check_args...
Function `f()`, which goes over the data once, is more than ten times slower than function `f2()`, which goes over the data twice. Is this some kind of bug? ```julia...