LoopVectorization.jl

Macro(s) for vectorizing loops.

Results 126 LoopVectorization.jl issues

I'm seeing segmentation faults for operations with `PermutedDimsArray` with `ndims > 8`:

```julia
using LoopVectorization

function permutedims_lv(A, p)
    Ap = PermutedDimsArray(A, p)
    B = similar(Ap)
    @turbo B .= Ap
    return...
```

This didn't appear to help compile time performance in the case I was interested in.

I get the following error (on Julia v1.7.2):

```julia
julia> using Pkg; Pkg.activate(temp=true); Pkg.add("LoopVectorization")
...
[bdcacae8] + LoopVectorization v0.12.119
...
[aedffcd0] + Static v0.7.4
...
julia> function foo_plain!(dst, src)
           for...
```

```julia
using LoopVectorization, Static
M,K,N = 3,4,5
A = view(fill(NaN,100,100), 11:10+M,11:10+K+1); A .= rand.();
B = view(fill(NaN,100,100), 11:10+N,11:10+K); B .= rand.();
C0 = view(zeros(100,100), 11:10+N, 11:10+M); C1 = deepcopy(C0);
function...
```

```
using Pkg
Pkg.add("VectorizationBase")
Pkg.add("Hwloc")
using VectorizationBase,Hwloc
print(VectorizationBase.num_cores())
print(num_physical_cores())
print(num_virtual_cores())
```

```
$srun julia --threads 24 debug.jl
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
No Changes to `~/.julia/environments/v1.7/Project.toml`
No Changes...
```

Does `LoopVectorization` support complex vectors? E.g.

```julia
using LoopVectorization

function test_avx!(x::Vector, y::Vector, beta)
    @avx for n=1:length(x)
        x[n] += conj(beta)*y[n]
    end
end

# set up
T = ComplexF32
N = 1024...
```

enhancement

LoopVectorization compiled into sysimage like this

```
import PackageCompiler
@time PackageCompiler.create_sysimage(["LoopVectorization"], replace_default = true)
```

causes a crash when Octavian is loaded

```
julia> using Octavian
Please submit a bug...
```

I have written a macro that generates a fast loop for interpolations. Without `@tturbo` and with `Threads.@threads` the code runs fine, but it fails with `@tturbo` which does not do...

In some cases, `@turbo` is far slower on BitArrays than on anything else, and in other cases `check_args` fails and the vectorization doesn't happen. I can't replicate the `check_args`...

Function `f()`, which goes over the data once, is more than ten times slower than function `f2()`, which goes over the data twice. Is this some kind of bug? ```julia...