Chris Elrod
That was just supposed to be an implementation of [_mmX_madd_epi16](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_madd&expand=3511):
```
FOR j := 0 to 15
    i := j*32
    dst[i+31:i] := a[i+31:i+16]*b[i+31:i+16] + a[i+15:i]*b[i+15:i]
ENDFOR
dst[MAX:512] := 0
```
...
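As a cross-check on the pseudocode above, here is a minimal scalar sketch in plain Julia. The name `madd_epi16_ref` is hypothetical and is not part of any package mentioned here; it only mirrors the pairwise multiply-and-add semantics, with no SIMD.

```julia
# Scalar reference for the pseudocode above: multiply adjacent Int16 pairs,
# widen each product to Int32, and sum the two products of every pair.
# `madd_epi16_ref` is a made-up name used only for illustration.
function madd_epi16_ref(a::Vector{Int16}, b::Vector{Int16})
    length(a) == length(b) || throw(DimensionMismatch("a and b must have equal length"))
    iseven(length(a)) || throw(ArgumentError("length must be even"))
    dst = Vector{Int32}(undef, length(a) >>> 1)
    for j in eachindex(dst)
        lo = Int32(a[2j - 1]) * Int32(b[2j - 1])  # low 16-bit lane of the pair
        hi = Int32(a[2j]) * Int32(b[2j])          # high 16-bit lane of the pair
        dst[j] = lo + hi  # Int32 addition wraps on overflow
    end
    return dst
end
```

A scalar version like this can be handy for testing a vectorized implementation on random inputs.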
```julia
julia> using VectorizationBase: REGISTER_SIZE

julia> @generated function vpmaddwd(a::NTuple{W,Core.VecElement{Int16}}, b::NTuple{W,Core.VecElement{Int16}}) where {W}
           Wh = W >>> 1
           @assert 2Wh == W
           @assert (REGISTER_SIZE >> 1) ≥ W
           S = W...
```
A difficulty is that it isn't obvious what we should be doing with the accumulator. Normally, if a loop is U-fold unrolled and vectorized with width W,...
A few problems:
- [ ] No handling of anonymous functions (i.e., `x -> match(x, input[j])`).
- [ ] No SIMD implementation of `findfirst`, or `match`, especially for strings and...
I haven't seen [ExprTools](https://github.com/invenia/ExprTools.jl) before, but it may be worth taking a look at more generally. The simplest approach for handling anonymous functions is just moving the definition in front...
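A minimal sketch of that idea, assuming "in front" means in front of the loop: bind the anonymous function to a name first, then call it by name in the loop body. The names `g` and `apply_hoisted!` are hypothetical, and the loop is left unannotated so the snippet runs without any packages.

```julia
# Hypothetical anonymous function, defined before the loop instead of inline.
g = x -> 2x + 1

# The loop body now contains only a plain call `f(xs[i])`,
# rather than an anonymous function expression.
function apply_hoisted!(out, f, xs)
    for i in eachindex(out, xs)
        out[i] = f(xs[i])
    end
    return out
end
```

Passing the function as an argument also keeps the call type-stable, since `f` is not an untyped global inside the loop.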
The [README](https://github.com/chriselrod/LoopVectorization.jl#dealing-with-structs) also has an example with complex numbers. Mind posting all the code for a reproducible example? Allocations suggest there may be a type instability somewhere. Or perhaps a...
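One way to look for a type instability is `Test.@inferred` from the standard library, which throws when the actual return type differs from the inferred one. The sketch below uses two hypothetical functions, `unstable` and `stable`, purely for illustration; they are unrelated to the code under discussion.

```julia
using Test  # standard library; provides @inferred

# Hypothetical functions, for illustration only.
unstable(x) = x > 0 ? 1 : 1.0      # returns an Int or a Float64 depending on the value
stable(x)   = x > 0 ? 1.0 : -1.0   # always returns a Float64

@inferred stable(2.0)              # passes: the return type is inferred concretely

# @inferred throws for the unstable version, so record that in a flag.
is_unstable = try
    @inferred unstable(2.0)
    false
catch
    true
end
```

`@code_warntype` on the suspect call gives a more detailed view of where inference loses the concrete type.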
Looking at [those benchmarks](https://github.com/mcabbott/Tullio.jl/blob/coronavirusrewrite/benchmarks/grads01.jl), there are a lot of allocations when using `@avx`, but not otherwise. Could you by any chance try and find out where they're coming from? >...
Yes, that's definitely a problem with `@avx`: ```julia julia> @macroexpand @avx for i = 📏i for j = 📏j ℛℰ𝒮 = (A[i] + 𝜀A) * log((B[i, j] + 𝜀B) /...
Ah, checking out `coronavirusrewrite` solves that problem.

> Is `getfield` expected to work?

Not yet. PRs are welcome; otherwise I may get around to it, but probably not until after...
If you find basic `SVec` definitions missing that sound like they should be there, please file an issue (or pull request) at [SIMDPirates](https://github.com/chriselrod/SIMDPirates.jl) or [SLEEFPirates](https://github.com/chriselrod/SLEEFPirates.jl). SIMDPirates is full of `llvmcall`s,...