Chris Elrod

Results 832 comments of Chris Elrod

> I think (5.) is "Broadcasting an Array A when size(A,1) == 1". That was `4.` above. In that case, what's 4? Maybe module-specific toggles, but I'm not a fan...
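For reference, the case named in the quote can be reproduced with plain Base broadcasting. This is only a minimal sketch of the semantics (a dimension of size 1 is reused along that axis); the array names and values are made up for illustration and say nothing about how LoopVectorization handles the case.

```julia
# Hypothetical data: A has a single row, so size(A, 1) == 1.
A = reshape([10.0, 20.0, 30.0], 1, 3)
B = ones(2, 3)

# Base broadcasting reuses the single row of A for every row of B.
C = A .+ B

println(size(A, 1))  # first dimension of A
println(C)
```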

I haven't had the time to look at this. The free time I've been spending on loop optimization and SIMD has been dedicated to rewriting LoopVectorization, so my plan for...

To copy what I said on Discourse: Regarding performance on different architectures, [here](https://uops.info/html-instr/VGATHERQPD_YMM_VSIB_YMM_YMM.html) is a table giving the performance of 256-bit gathers using 4 x `Int64` indices and loading 4...
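As a rough illustration of what a gather does, the sketch below loads 4 elements from non-contiguous positions using 4 `Int64` indices. It is scalar Base Julia indexing that mirrors the semantics of the operation, not the instruction itself, and the array contents and index values are hypothetical.

```julia
# Hypothetical source array and indices, chosen only for illustration.
x = collect(1.0:16.0)          # 16 Float64 values: 1.0, 2.0, ..., 16.0
idx = Int64[3, 9, 1, 14]       # 4 arbitrary (non-contiguous) positions

# A gather reads x[idx[1]], x[idx[2]], ... in one logical operation.
gathered = x[idx]

# A contiguous load of the same width, for contrast.
contiguous = x[1:4]

println(gathered)
println(contiguous)
```

Whether the compiler emits a hardware gather for code like this depends on the surrounding loop and the target architecture.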

I'm reopening this. We can close it once LV gets good performance across the architectures.

Regarding the `vrem_fast`, that's a VectorizationBase bug.
```julia
julia> vxi = Vec(ntuple(Int,VectorizationBase.pick_vector_width(Int))...)
Vec{8, Int64}
julia> vxi2 = Vec(ntuple(_ -> rand(Int),VectorizationBase.pick_vector_width(Int))...)
Vec{8, Int64}
julia> vxi2 % vxi
Vec{8, Int64}
julia> @fastmath...
```

However, I don't think LoopVectorization will do the right thing here, and it may be that `gcc` is more clever.
```julia
function energy(spin_conf)
    (Nx, Ny) = size(spin_conf)
    res = 0...
```

This library uses a lot of `llvmcall` under the hood, therefore it needs the types it uses to map to LLVM. For some reason Julia doesn't map `NTuple{N,Core.VecElement{Int128}}` to LLVM...
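To make the type-mapping point concrete, here is a minimal sketch of `Base.llvmcall` with a tuple of `Core.VecElement{Float64}`, which Julia passes to LLVM as `<4 x double>`. The names `V4` and `vadd` are hypothetical helpers written for this illustration; they are not taken from the library being discussed.

```julia
# Hypothetical alias: a 4-wide vector of Float64, seen by LLVM as <4 x double>.
const V4 = NTuple{4,Core.VecElement{Float64}}

# Hypothetical helper: element-wise add written directly in LLVM IR.
# The arguments are available inside the IR as %0 and %1.
vadd(a::V4, b::V4) = Base.llvmcall("""
    %res = fadd <4 x double> %0, %1
    ret <4 x double> %res
    """, V4, Tuple{V4,V4}, a, b)

a = ntuple(i -> Core.VecElement(Float64(i)), 4)  # (1.0, 2.0, 3.0, 4.0)
b = ntuple(_ -> Core.VecElement(10.0), 4)        # (10.0, 10.0, 10.0, 10.0)
c = vadd(a, b)

println(map(v -> v.value, c))
```

The IR string only type-checks because the Julia tuple type and the LLVM vector type agree, which is why an element type without such a mapping cannot be used this way.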

Mind giving me an example I can copy/paste to run?

If all we want to do is add code that doesn't otherwise mix with LoopVectorization, I think it should go into its own standalone library. Currently, on some architecture &...

This package uses `SLEEFPirates`, which includes a mix of functions ported from SLEEF 2, `llvmcall`s of LLVM bitcode emitted by compiling SLEEF 3 with the option to emit bitcode, and sometimes GLIBC. More...