LoopVectorization.jl
LoopVectorization.jl copied to clipboard
Cannot vectorize a loop with inner hidden loop (findfirst)
I have a simple algorithm that I want to vectorize. Works with @simd but for @avx throws an error that x is not defined.
function check_simd(regex_list, input)
input_len=length(input)
icon_indices = Vector{Int64}(undef, input_len)
@avx for j=1:input_len
icon_indices[j] = findfirst(x-> match(x, input[j]) !== nothing, regex_list)
end
return icon_indices
end
MWE here: https://gist.github.com/aminya/4b6e6db1147e63727cffc781633a8e0b
A few problems:
- [ ] No handling of anonymous functions (ie,
x -> match(x, input[j])). - [ ] No SIMD implementation of
findfirst, ormatch, especially for strings and regexes. - [ ] No SIMD string support.
I haven't seen ExprTools before, but it may be worth taking a look at more generally.
The simplest approach for handling anonymous functions is just moving the definition in front of the loops, and pretending it is any old opaque function. This will prevent many optimizations like unrolling and blocking, because opaque functions are assumed to be expensive enough so that it isn't worth it. Note that functions like log and sin are not opaque, but are already considered expensive enough to prevent these optimizations, making this sensible.
Alternatively, I could try and estimate their cost.
Currently, LoopVectorization does not recognize the loop-y nature of findfirst at all.
A great idea for an improvement would be recognizing these functions (like map and friends), parsing them as another nested loop. If I do this, it'll be worth interpreting anonymous functions as the body of that inner loop.
To keep you off the edge of your seat, I want to point out that SIMD implementations of match and SIMD string and regex support are unlikely to come any time soon.
I get the same problem with the following function
function rpca_ga_1(Xnorms,U::AbstractMatrix{T},w; tol=1e-7, iters=1000, verbose=false) where {T}
d,N = size(U)
q = randn(d)
q ./= norm(q)
qold = copy(q)
for i = 1:iters
@avx for n = eachindex(Xnorms)
w[n] = sign(U[:,n]'q)*Xnorms[n]
end
μᵢ = μ!(q,w,U)
q .= μᵢ./norm(μᵢ)
dq = sqrt(sum(((x,y),)->abs2(x-y), zip(q,qold)))
verbose && @info "Change at iteration $i: $dq"
if dq < tol
verbose && @info "Converged after $i iterations"
break
end
qold .= q
i == iters && @warn "Reached maximum number of iterations"
end
q
end