Chris Elrod
Everything default, not even an `-O3`:

```julia
> ../julia-1.6.1/bin/julia --project=~/Documents/progwork/julia/env/diffeqold/
...
```
@shipengcheng1230 Could you profile Julia 1.6 and 1.5, and find out where 1.6 is taking a lot more time for you? As none of us can reproduce your problem, there's...
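One way to do this is with the `Profile` standard library. Below is a minimal sketch in which a hypothetical `workload` function stands in for the real code. Run the same script under both Julia versions in a fresh session and compare where the samples land. Because nothing is warmed up first, the profile also captures inference and compilation, which matters if the slowdown is latency rather than run time.

```julia
using Profile

# Hypothetical stand-in for the real workload (replace with the code that regressed).
function workload(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i) * sin(i)
    end
    return s
end

Profile.clear()                 # discard samples from earlier runs
@profile for _ in 1:50          # no warm-up call, so compilation is sampled too
    workload(1_000_000)
end

# Flat listing sorted by sample count: the hottest frames end up at the bottom.
Profile.print(format = :flat, sortedby = :count)
```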
> You may have to `@nospecialize` the function you're wrapping _and_ all of its callees (or at least those without predictable or precompiled types). The point of function wrappers is...
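Here is a minimal sketch of the `@nospecialize` part, using only Base. The wrapper name `call_nospec` is hypothetical. `@nospecialize` is a hint asking the compiler not to compile a fresh specialization for every function type passed in. The type assertion on the result keeps the wrapper's return type concrete for its callers.

```julia
# Hypothetical wrapper: `@nospecialize` asks the compiler not to specialize
# `call_nospec` on the concrete type of `f` (it is a hint, not a guarantee).
function call_nospec(@nospecialize(f), x::Float64)
    # `f(x)` may be dynamically dispatched here; the assertion gives callers
    # a concrete `Float64` return type, so they stay type-stable.
    return f(x)::Float64
end

call_nospec(sin, 1.0)
call_nospec(x -> 2x, 1.5)
```

As the quoted comment notes, this only helps if the callees of `f` are not themselves forced into many new specializations.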
```julia
julia> using SnoopCompile

julia> using OrdinaryDiffEq

julia> function f(du, u, p, t)
           du[1] = 0.2u[1]
           du[2] = 0.4u[2]
       end
f (generic function with 1 method)

julia> u0 = ones(2)...
```
I tried this (requires LoopVectorization 0.6.24 or newer):

```julia
using LoopVectorization

function gemm_accurate_kernel!(C, A, B)
    @avx for n in 1:size(C,2), m in 1:size(C,1)
        Cmn_hi = zero(eltype(C))
        Cmn_lo = zero(eltype(C))
        for...
```
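For comparison, here is a plain-Julia sketch of the same hi/lo idea without `@avx`. It is not the kernel from the comment, whose body is truncated above. Each output entry accumulates a high part plus a low (error) part, in the style of the compensated dot product (often called Dot2), built from TwoSum and an `fma`-based TwoProd. The names `two_sum`, `two_prod` and `gemm_compensated!` are hypothetical.

```julia
# TwoSum (Knuth): s = fl(a + b) and e = (a + b) - s exactly, with no ordering precondition.
@inline function two_sum(a, b)
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e
end

# TwoProd via fma: p = fl(a * b) and e = a * b - p exactly (barring underflow).
@inline function two_prod(a, b)
    p = a * b
    e = fma(a, b, -p)
    return p, e
end

# Hypothetical compensated matrix multiply: C = A * B with each dot product
# carried as hi + lo, where lo collects the rounding errors of every step.
function gemm_compensated!(C, A, B)
    M, N = size(C)
    K = size(A, 2)
    @assert size(A, 1) == M && size(B) == (K, N)
    T = eltype(C)
    for n in 1:N, m in 1:M
        hi = zero(T)
        lo = zero(T)
        for k in 1:K
            p, ep = two_prod(A[m, k], B[k, n])  # exact product split
            hi, es = two_sum(hi, p)             # exact sum split
            lo += ep + es                       # accumulate the errors
        end
        C[m, n] = hi + lo
    end
    return C
end
```

With `A = [1e16 1.0 -1e16]` and `B = ones(3, 1)`, a naive left-to-right sum loses the `1.0` to rounding, while the compensated version recovers it.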
I'm not sure what an EFT is, but from context I assume it means the individual operations in the functions you described. In which case I agree with ffevotte, in...
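Here we assume EFT stands for *error-free transformation*. That is our reading; the thread does not spell it out. Under that reading, a single such operation returns both the rounded result and its exact rounding error. Below is a minimal sketch of one of them, Dekker's Fast2Sum, with a `BigFloat` check that nothing is lost. The name `fast_two_sum` is hypothetical.

```julia
# Fast2Sum (Dekker): in binary floating point with round-to-nearest and |a| >= |b|,
# s = fl(a + b) and e = (a + b) - s exactly, so s + e represents a + b with no error.
function fast_two_sum(a::Float64, b::Float64)
    if abs(a) < abs(b)
        a, b = b, a          # enforce the |a| >= |b| precondition
    end
    s = a + b
    e = b - (s - a)          # exact rounding error of the addition above
    return s, e
end

s, e = fast_two_sum(0.1, 0.2)
# The pair (s, e) represents 0.1 + 0.2 exactly; check in extended precision.
@assert big(s) + big(e) == big(0.1) + big(0.2)
```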
What's the path forward on unityping, either through your branch or Unityper? This is, of course, a short-term improvement at best.
This problem actually hits five current limitations in `LoopVectorization`, which is quite impressive for such simple-looking loops. First of all,

```julia
julia> typeof.((st, indices, SU, y, subspace))
(Matrix{ComplexF64}, Vector{Int64},...
```
The new `reinterpret(reshape, ...)` is great for this:

```julia
julia> A = rand(ComplexF64, 3, 4)
3×4 Matrix{ComplexF64}:
 0.255597+0.833728im  0.538941+0.632544im  0.689762+0.322061im  0.0550905+0.186901im
 0.392467+0.609713im  0.408867+0.111692im  0.574269+0.602653im   0.760552+0.385366im
 0.221208+0.544642im   0.928497+0.13649im  0.782709+0.872457im   0.460958+0.809076im

julia>...
```
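Below is a small sketch of what that gives (Julia 1.6 or newer). Viewing a `ComplexF64` matrix as `Float64` with `reinterpret(reshape, ...)` adds a leading dimension of length 2 that holds the real and imaginary parts, and it does so without copying.

```julia
A = rand(ComplexF64, 3, 4)

# A 2×3×4 view of the same memory: R[1, i, j] == real(A[i, j]), R[2, i, j] == imag(A[i, j]).
R = reinterpret(reshape, Float64, A)
@assert size(R) == (2, 3, 4)
@assert R[1, :, :] == real.(A)
@assert R[2, :, :] == imag.(A)

# It is a view, not a copy: writing through R mutates A.
R[2, 1, 1] = 0.0
@assert imag(A[1, 1]) == 0.0

# The reverse direction folds the leading length-2 dimension back into complex numbers.
B = reinterpret(reshape, ComplexF64, R)
@assert size(B) == (3, 4) && B == A
```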
> In my benchmark, it seems Julia fails to allocate the temporaries on the stack. I tried to remove the subspace and comspace and use constant indices like the previous...