Chris Elrod
> and somehow, even though I moved the temporary array C out and made every dim static, this function still allocates... ```julia julia> @benchmark subspace_mul_generic!($(copy(S)), indices, C, U, subspace) BenchmarkTools.Trial: memory...
@Roger-luo If you have the FMA instruction set, this should be fast: ```julia julia> @time using StaticArrays @time using Stride 2.432350 seconds (3.14 M allocations: 218.145 MiB, 2.46% gc time)...
[Zulip discussion/explanation here](https://julialang.zulipchat.com/#narrow/stream/137791-general/topic/Working.20with.20Complex.20Numbers.20in.20LoopVectorization/near/247284302). AVX512 benchmarks: ```julia julia> @benchmark $(Ref(As4x4))[] * $(Ref(Bs4x4))[] BenchmarkTools.Trial: 10000 samples with 989 evaluations. Range (min … max): 49.047 ns … 92.030 ns ┊ GC (min …...
Note, however, that this implementation is much slower on CPUs without the FMA instruction set (specifically FMA, since that is what provides the `vfmaddsub` instruction). I'll...
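To make the role of `vfmaddsub` concrete: for complex numbers stored interleaved as (re, im) pairs, one product needs `ar*br - ai*bi` in one lane and `ar*bi + ai*br` in the next. That alternating subtract/add is exactly what `vfmaddsub` does in a single instruction (even lanes subtract, odd lanes add). The sketch below emulates those lane semantics in plain scalar Julia with `fma`; the names `fmaddsub_emulated` and `complex_mul_interleaved` are hypothetical, and this is not the implementation from the linked discussion.

```julia
# Hypothetical helper (not from any package): scalar emulation of x86 `vfmaddsub`.
# 0-based even lanes compute x*y - z, odd lanes compute x*y + z.
# Julia tuples are 1-based, so 0-based lane 0 is index 1 here.
function fmaddsub_emulated(x::NTuple{N,Float64}, y::NTuple{N,Float64},
                           z::NTuple{N,Float64}) where {N}
    ntuple(N) do i
        isodd(i) ? fma(x[i], y[i], -z[i]) : fma(x[i], y[i], z[i])
    end
end

# Hypothetical helper: multiply two complex numbers stored as (re, im) tuples.
function complex_mul_interleaved(a::NTuple{2,Float64}, b::NTuple{2,Float64})
    ar, ai = a
    br, bi = b
    t = (ai * bi, ai * br)                    # imag(a) times b with its lanes swapped
    fmaddsub_emulated((ar, ar), (br, bi), t)  # (ar*br - ai*bi, ar*bi + ai*br)
end

c = complex_mul_interleaved((1.0, 2.0), (3.0, -4.0))  # (11.0, 2.0), i.e. (1+2im)*(3-4im)
```

With FMA available, a SIMD version can apply this alternating step across a whole vector register in one instruction; without it, the same step presumably has to be assembled from separate multiplies, adds/subtracts, and blends, which is consistent with the slowdown described above.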
> thanks! this is amazing! is there a way to detect FMA instruction set as a global option during precompile then? I think I could also just put this to...
> I use Apple M1. I installed the macOS Julia, I assume that is Intel based? Somehow it works for me.

Yes. You can build Julia 1.7 and 1.8 from...
On the Apple M1 under Rosetta, this is what I get: ```julia julia> @benchmark jgemvavx!($y, $A, $x) BenchmarkTools.Trial: 10000 samples with 199 evaluations. Range (min … max): 414.990 ns … 465.241...
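`jgemvavx!` is not defined in this excerpt; judging by its name and the `$y, $A, $x` arguments, it presumably computes the matrix-vector product `y = A * x` using LoopVectorization's `@avx` macro (an assumption). As a package-free baseline, here is a sketch of a hypothetical `jgemv_plain!` written with Base Julia only, with the inner loop running down the columns of `A`.

```julia
# Hypothetical Base-only gemv: y = A * x.
# The inner loop walks down column j of A, which is contiguous in memory
# because Julia arrays are column-major.
function jgemv_plain!(y::Vector{T}, A::Matrix{T}, x::Vector{T}) where {T<:Number}
    size(A, 1) == length(y) || throw(DimensionMismatch("size(A, 1) must equal length(y)"))
    size(A, 2) == length(x) || throw(DimensionMismatch("size(A, 2) must equal length(x)"))
    fill!(y, zero(T))
    @inbounds for j in axes(A, 2)
        xj = x[j]
        @simd for i in axes(A, 1)
            y[i] = muladd(A[i, j], xj, y[i])  # accumulate column j scaled by x[j]
        end
    end
    return y
end

A = rand(4, 3); x = rand(3); y = zeros(4)
jgemv_plain!(y, A, x)  # y now holds A * x (up to rounding)
```

It can be timed the same way, e.g. `@benchmark jgemv_plain!($y, $A, $x)` with BenchmarkTools, for a rough comparison against the explicitly vectorized version.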
> Thanks @chriselrod, that explains everything regarding the current status of the tooling. I realize the M1 might not be ideal yet for Julia. I really like it for...
> In Julia, the last index is continuous?

The first index is contiguous, like Fortran (column-major).

> The issue is that I can't tell from the plot if the performance is already the...
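A quick way to see the column-major layout is to inspect `strides`, `vec`, and `LinearIndices` on a small matrix. The helper `colmajor_sum` is a hypothetical illustration of looping with the first index innermost, which walks memory in order.

```julia
A = reshape(collect(1:6), 2, 3)  # filled in memory order:
                                 # A == [1 3 5;
                                 #       2 4 6]

strides(A)              # (1, 2): next row is 1 element away, next column is size(A, 1) away
vec(A)                  # [1, 2, 3, 4, 5, 6]: columns stacked one after another
LinearIndices(A)[2, 1]  # 2: A[2, 1] sits right after A[1, 1] in memory
LinearIndices(A)[1, 2]  # 3: A[1, 2] sits one full column later

# Hypothetical helper: in a multi-range `for`, the last range (i) varies fastest,
# so this visits elements in memory order for a column-major array.
function colmajor_sum(A::AbstractMatrix)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)
        s += A[i, j]
    end
    return s
end
```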
> I use the M1 MacBook Pro. The fan never turns on. The 8-core M1 laptop is about as fast for C++ compilation as my beefy 48-core Intel...