Chris Elrod
LinuxPerf comparison:
```julia
julia> using LinuxPerf

julia> foreachf(f::F, N::Int, arg1::A, args::Vararg{Any,K}) where {F,A,K} = foreach(_ -> f(arg1, args...), 1:N)
foreachf (generic function with 1 method)

julia> @pstats "(cpu-cycles,task-clock),(instructions,branch-instructions,branch-misses), (L1-dcache-load-misses, L1-dcache-loads,...
```
Just benchmarked on Haswell (AVX2, FMA, but no AVX512; relatively slow gather & no scatter), and I see the same basic pattern as on Tiger Lake.
> On the agenda is to look into HostCPUFeatures and see if I can get it from there.

Names could be a bit more informative, but --
* HostCPUFeatures.jl is...
That list is (aside from LoopVectorization itself):
https://github.com/SciML/ArrayInterface.jl
https://github.com/chriselrod/VectorizationBase.jl
https://github.com/chriselrod/SLEEFPirates.jl
> It makes me think that the issue is machine/OS related.

I'm assuming it only shows up on systems with AVX512.
```julia
julia> dx1 .- dx2 |> extrema
(-4.304807428601304, 7.271464924157296)
```
You should be able to replicate results like that with `@avx unroll=(1,4) for...`
> Awesome, would be curious to see how you did it. But no rush, obviously.

By making it a lot more conservative with respect to non-aliasing (e.g., stop it from...
Currently, the `@avx` macro won't work either. `vmap` and `@avx` call `VectorizationBase.vectorizable` and `VectorizationBase.stridedpointer` on each of the arrays, respectively. These return structs holding the pointer (and in the case...
It is worth watching [ArrayInterface.jl](https://github.com/JuliaDiffEq/ArrayInterface.jl). That may be very useful for supporting StaticVectors, as well as other array types like struct of arrays and arrays of structs.
VegaLite sorts all the categories alphabetically, and then uses the color scale with respect to that alphabetical ordering. As a workaround, I renamed them "01. LoopVectorization", "02. Julia", etc, so...
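
The renaming workaround can be sketched in plain Julia. This is only an illustration: the category names after the first two and the variable names `ordered` and `prefixed` are hypothetical, and no VegaLite call is involved. Each name gets a zero-padded index as a prefix, so that alphabetical order matches the intended order.

```julia
# Category names in the order they should appear in the legend.
# The last two entries are hypothetical placeholders.
ordered = ["LoopVectorization", "Julia", "Clang", "GFortran"]

# Prefix each name with a zero-padded index ("01. ", "02. ", ...),
# so that sorting alphabetically reproduces the intended order.
prefixed = [string(lpad(i, 2, '0'), ". ", name) for (i, name) in enumerate(ordered)]

# Alphabetical sorting now leaves the order unchanged.
@assert sort(prefixed) == prefixed
```

Zero-padding matters once there are ten or more categories: without it, "10. ..." would sort before "2. ...".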