Chris Elrod

832 comments by Chris Elrod

@MasonProtter, here is how to rewrite the loops to get them to work:
```julia
using LoopVectorization, LinearAlgebra
function matmul!(u::AbstractVector{T}, A::Tridiagonal{T}, v::AbstractVector{T}) where {T}
    @assert length(u) == size(A,1) == size(A,2) ==...
```
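The rewritten loop in that comment is cut off, so it is not reproduced here. To show roughly what a `Tridiagonal` matrix-vector product computes, here is a plain-Julia sketch. The name `tridiag_mul!` is hypothetical, and the sketch uses ordinary loops with `@inbounds` rather than LoopVectorization's macros. It reads the three stored diagonals `A.dl` (sub), `A.d` (main) and `A.du` (super) directly instead of indexing `A[i, j]`.

```julia
using LinearAlgebra

# Hypothetical sketch: u = A * v for a Tridiagonal A, using its stored diagonals.
# Row i of A has A[i,i-1] == dl[i-1], A[i,i] == d[i], A[i,i+1] == du[i].
function tridiag_mul!(u::AbstractVector{T}, A::Tridiagonal{T}, v::AbstractVector{T}) where {T}
    n = length(v)
    @assert length(u) == size(A, 1) == size(A, 2) == n
    dl, d, du = A.dl, A.d, A.du
    n == 0 && return u
    if n == 1                      # no off-diagonal entries
        u[1] = d[1] * v[1]
        return u
    end
    u[1] = d[1] * v[1] + du[1] * v[2]            # first row: no sub-diagonal term
    @inbounds for i in 2:n-1                     # interior rows: all three terms
        u[i] = dl[i-1] * v[i-1] + d[i] * v[i] + du[i] * v[i+1]
    end
    u[n] = dl[n-1] * v[n-1] + d[n] * v[n]        # last row: no super-diagonal term
    return u
end
```

For example, with `A = Tridiagonal(rand(4), rand(5), rand(4))`, `tridiag_mul!(zeros(5), A, v)` should agree with `A * v` up to rounding.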

> Issue 61 seems to be related. Did I do something dumb here?

You didn't do anything dumb. It's the same sort of issue. Some methods weren't defined, causing the...

Thanks for the benchmarks! Notice that `gemv` requires O(N^2) operations (i.e., `size(A,1)` dot products of length `size(A,2)`). This means going from 1_000 x 1_000 to 10_000 x 10_000 should take 100x...
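To make the scaling concrete, here is a minimal sketch of a `gemv` written literally as `size(A,1)` dot products of length `size(A,2)`, plus a count of its multiply-adds. The names `gemv_naive!` and `gemv_ops` are hypothetical and not from the benchmark being discussed. Scaling both dimensions by 10 multiplies the work by 10 × 10 = 100. The loop walks along rows of a column-major array, so it is a readable illustration, not a fast implementation.

```julia
# Hypothetical naive gemv: c = A * b as size(A,1) dot products,
# each of length size(A,2), for size(A,1) * size(A,2) multiply-adds in total.
function gemv_naive!(c::AbstractVector, A::AbstractMatrix, b::AbstractVector)
    @assert length(c) == size(A, 1) && length(b) == size(A, 2)
    for i in axes(A, 1)
        acc = zero(eltype(c))
        for j in axes(A, 2)          # dot product of row i with b
            acc += A[i, j] * b[j]
        end
        c[i] = acc
    end
    return c
end

# Number of multiply-add operations for an m x n gemv.
gemv_ops(m, n) = m * n

A = rand(100, 80); b = rand(80); c = zeros(100)
gemv_naive!(c, A, b)
gemv_ops(10_000, 10_000) ÷ gemv_ops(1_000, 1_000)  # 100
```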

I fixed the error, but there seems to have been a severe performance regression here:
```julia
julia> @benchmark gemv_tile!($c1, $A, $b)
BenchmarkTools.Trial:
  memory estimate: 0 bytes
  allocs estimate: 0
  --------------...
```

The easiest way to improve performance may be to define a custom `BitMatrix`/`BitArray` type that is padded to have its leading axis contain a multiple of 8 elements, so that...
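Here is a minimal sketch of what such a padded type could look like. It assumes the type simply wraps a `BitMatrix` whose row count is rounded up to the next multiple of 8. `PaddedBitMatrix` and `padded_rows` are hypothetical names. Because `BitArray` packs its bits in column-major order, padding the leading axis this way makes every column start on a byte boundary. The wrapper reports only the logical size and bounds-checks against it, so the padding rows are never exposed.

```julia
# Hypothetical padded bit matrix: the storage's leading (row) axis is rounded
# up to a multiple of 8, so each column starts on a byte boundary in the packed bits.
struct PaddedBitMatrix <: AbstractMatrix{Bool}
    data::BitMatrix   # storage of size (padded rows, columns)
    nrows::Int        # logical number of rows
end

# Round m up to the next multiple of 8.
padded_rows(m::Integer) = cld(m, 8) * 8

PaddedBitMatrix(m::Integer, n::Integer) = PaddedBitMatrix(falses(padded_rows(m), n), m)

# Expose only the logical size; bounds checks use it, so padding rows are unreachable.
Base.size(B::PaddedBitMatrix) = (B.nrows, size(B.data, 2))

function Base.getindex(B::PaddedBitMatrix, i::Int, j::Int)
    checkbounds(B, i, j)
    return B.data[i, j]
end

function Base.setindex!(B::PaddedBitMatrix, v, i::Int, j::Int)
    checkbounds(B, i, j)
    B.data[i, j] = v
    return B
end
```

For example, `PaddedBitMatrix(9, 3)` has logical size `(9, 3)` but stores a `16 × 3` `BitMatrix`.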

Hmm, are you running with `--check-bounds=no`? I tried your example and got:
```julia
julia> main(9,2)
ERROR: BoundsError: attempt to access 9-element Vector{Bool} at index [1099512283137]
Stacktrace:
 [1] getindex
   @ ./essentials.jl:13...
```
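By default Julia checks array bounds and throws a `BoundsError` on a bad index. Starting Julia with `--check-bounds=no`, or wrapping accesses in `@inbounds`, removes those checks, and an out-of-bounds access then becomes undefined behavior that may crash or silently read unrelated memory. The sketch below shows one common pattern: validate all indices once with `checkbounds`, then use `@inbounds` in the hot loop. The function name `sum_at` is hypothetical. `checkbounds(Bool, A, I...)` returns `false` instead of throwing, which is handy for testing an index like the one in the error above.

```julia
# Hypothetical helper: sum v[i] for every i in idx.
function sum_at(v::AbstractVector, idx::AbstractVector{<:Integer})
    # Validate every index up front; throws BoundsError if any is out of range.
    checkbounds(v, idx)
    s = zero(eltype(v))
    # Per-access checks can be skipped because all indices were validated above.
    @inbounds for i in idx
        s += v[i]
    end
    return s
end

flags = fill(false, 9)
checkbounds(Bool, flags, 9)              # true
checkbounds(Bool, flags, 1099512283137)  # false
sum_at([1, 2, 3], [1, 3])                # 4
```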

Interesting. Of course, it shouldn't be throwing at all. But throwing is a lot nicer than crashing.

This is unfortunately a difficult issue, somewhat central to LoopVectorization's current design. It should be largely addressed in the rewrite/redesign, but I can only work on that as my free...

Yes, that is essentially it. The LoopVectorization rewrite is being implemented as an LLVM pass, so it operates much lower in the stack. Aside from the compilation benefits of running at the correct place,...

Thanks, I'll have to update the README. cscherrer posted similar results using his 2950X on Discourse. On the master branch of LoopVectorization (a few things have improved, but dot product...