Chris Elrod
> or a term which is sufficiently difficult to compute so that it's not too memory bound, `u[i,j,1]^2`

Not sure if I follow here, but `u[i,j,1]` is easy to compute....
Ha, let me do that. Above it was just `matmul!`. Note that the image filtering example doesn't do boundaries...

> I mean, when you're looping over `u[i,j,2]` and need to...
@ChrisRackauckas here is the example I think you were looking for:
```julia
using StaticArrays, LoopVectorization, Static
const LAPLACE_KERNEL = @MMatrix([0.0 1.0 0.0; 1.0 -4.0 1.0; 0.0 1.0 0.0]);
function filter2davx!(out::AbstractMatrix,...
```
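The `filter2davx!` snippet is cut off, so here is a minimal sketch of the same idea using only Base: correlate a matrix with a 3×3 kernel holding the same values as `LAPLACE_KERNEL`, over the interior only (like the image filtering example, it doesn't do boundaries). The names `KERN` and `filter2d!` are hypothetical, the kernel is a plain `Matrix` rather than an `MMatrix`, and 1-based arrays are assumed. The original presumably puts `@turbo` on these loops; that is left out here so the sketch runs without LoopVectorization.

```julia
# Same values as LAPLACE_KERNEL, stored in an ordinary Matrix (assumption).
const KERN = [0.0 1.0 0.0; 1.0 -4.0 1.0; 0.0 1.0 0.0]

# Interior-only correlation: for a 3×3 kernel, out[i, j] is centered on
# A[i + 1, j + 1], so `out` is two rows and two columns smaller than `A`.
# Assumes 1-based indexing for `out`, `A`, and `kern`.
function filter2d!(out::AbstractMatrix, A::AbstractMatrix, kern::AbstractMatrix)
    kr, kc = size(kern)
    size(out) == (size(A, 1) - kr + 1, size(A, 2) - kc + 1) ||
        throw(DimensionMismatch("out must have size size(A) .- size(kern) .+ 1"))
    @inbounds for j in axes(out, 2), i in axes(out, 1)
        tmp = zero(eltype(out))
        for kj in 1:kc, ki in 1:kr
            tmp += A[i + ki - 1, j + kj - 1] * kern[ki, kj]
        end
        out[i, j] = tmp
    end
    return out
end
```

For a quick check, the discrete Laplacian of `i^2 + j^2` is exactly 4 at every interior point, so `filter2d!(zeros(3, 3), A, KERN)` on a 5×5 `A` built that way should be all `4.0`.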
Sure, but at that point why not go the rest of the way to `turbo_gm_fuse!`?
The one thing fusing inhibits is easy threading across the third axis. I tried that, and it didn't help: 8.2 ms solve time. I should add a version of `@spawn` to...
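To make the trade-off concrete, here is a sketch assuming a hypothetical two-component field `u[i, j, k]` with a five-point Laplacian applied to each component's interior. The unfused version makes one pass per component, which leaves a `k` loop to hand to `Threads.@threads`; the fused version handles both components in the same `(i, j)` iteration, so each grid point is visited once but there is no longer a `k` loop to split. The function names are illustrative, not from the thread.

```julia
# Unfused: one pass per component, so the k axis can be threaded.
# Boundary entries of `du` are left untouched.
function lap_unfused!(du::Array{Float64,3}, u::Array{Float64,3})
    n, m, nk = size(u)
    Threads.@threads for k in 1:nk
        @inbounds for j in 2:m-1, i in 2:n-1
            du[i, j, k] = u[i-1, j, k] + u[i+1, j, k] + u[i, j-1, k] +
                          u[i, j+1, k] - 4u[i, j, k]
        end
    end
    return du
end

# Fused: both components are updated in the same (i, j) iteration, so there is
# no k loop left to split across threads.
function lap_fused!(du::Array{Float64,3}, u::Array{Float64,3})
    n, m, nk = size(u)
    nk == 2 || throw(ArgumentError("this sketch assumes exactly two components"))
    @inbounds for j in 2:m-1, i in 2:n-1
        du[i, j, 1] = u[i-1, j, 1] + u[i+1, j, 1] + u[i, j-1, 1] +
                      u[i, j+1, 1] - 4u[i, j, 1]
        du[i, j, 2] = u[i-1, j, 2] + u[i+1, j, 2] + u[i, j-1, 2] +
                      u[i, j+1, 2] - 4u[i, j, 2]
    end
    return du
end
```

Both perform the same arithmetic per point, so they produce identical results; the difference is only in memory access pattern and available parallelism.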
Thanks, that's interesting. So the key to a stencil compiler is how it optimizes cache locality across time. That is, it cuts up the iteration space/dependencies in a way so...
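One common way to cut up the iteration space across time is overlapped (trapezoidal) tiling. The sketch below is an illustration of that general technique, not a description of what any particular stencil compiler does. It uses a hypothetical 1D three-point averaging stencil with fixed endpoints: each spatial tile is copied together with a halo of width `T`, advanced `T` steps while it stays in cache, and only the tile's own cells are written back. Stale values creep inward from the halo edge by one cell per step, so a halo of `T` cells is exactly enough. All names are illustrative.

```julia
# One step of a 1D three-point average; endpoints are fixed boundary values.
function jacobi_step!(dst::Vector{Float64}, src::Vector{Float64})
    n = length(src)
    dst[1] = src[1]; dst[n] = src[n]
    @inbounds for i in 2:n-1
        dst[i] = (src[i-1] + src[i] + src[i+1]) / 3
    end
    return dst
end

# Reference: sweep the whole array T times.
function jacobi_naive(u0::Vector{Float64}, T::Int)
    a, b = copy(u0), similar(u0)
    for _ in 1:T
        jacobi_step!(b, a)
        a, b = b, a
    end
    return a
end

# Overlapped temporal tiling: each tile does all T steps on a local buffer.
function jacobi_tiled(u0::Vector{Float64}, T::Int, tile::Int)
    n = length(u0)
    out = similar(u0)
    for lo in 1:tile:n
        hi = min(lo + tile - 1, n)
        a, b = max(1, lo - T), min(n, hi + T)  # tile plus a halo of width T
        buf = u0[a:b]
        tmp = similar(buf)
        for _ in 1:T
            # Buffer edges are only copied, so unless they are the true
            # boundary they go stale; the error spreads one cell per step.
            jacobi_step!(tmp, buf)
            buf, tmp = tmp, buf
        end
        out[lo:hi] .= buf[(lo - a + 1):(hi - a + 1)]
    end
    return out
end
```

The redundant work in the overlapping halos is the price paid for reading each tile from main memory once per `T` steps instead of once per step.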
Solve times, where `prob` uses `@turbo`, `prob_threads` uses `@tturbo`, and `prob_simdivdep` uses `@inbounds @fastmath @simd ivdep`:
```julia
julia> @time solve(prob, ROCK4(), reltol = 1e-8, abstol=1e-8, saveat=0.1);
 46.533658 seconds (14.03 k allocations:...
```
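The right-hand side behind `prob_simdivdep` isn't shown, so as an assumption here is a 2D five-point Laplacian written with the same annotations, to show where they go. `@simd` only applies to the innermost loop, and `ivdep` is a promise that iterations carry no dependencies through memory, which holds here only because `du` and `u` are distinct arrays. The function name is hypothetical.

```julia
# Hypothetical stencil kernel using the `prob_simdivdep` style annotations.
function lap_simdivdep!(du::Matrix{Float64}, u::Matrix{Float64})
    n, m = size(u)
    @inbounds for j in 2:m-1
        # `ivdep` is only valid because `du` does not alias `u`.
        @fastmath @simd ivdep for i in 2:n-1
            du[i, j] = u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1] - 4u[i, j]
        end
    end
    return du
end
```

Because `@fastmath` allows reassociation, results should be compared to a plain loop with `≈` rather than `==`.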
Oh, and benchmarks on the M1 (running on a native Julia 1.8 build):
```julia
julia> @time solve(prob, ROCK4(), reltol = 1e-8, abstol=1e-8, saveat=0.1);
 21.625047 seconds (14.09 k allocations: 35.655 GiB,...
```
Windows quitting silently is unexpected. That it doesn't yield great performance is expected, though, and should probably be documented.
> The `thread` argument in e.g. `lu` does not propagate to `schur_complement!` (which uses `@tturbo`), so the execution is still partially threaded when `thread=Val(false)`. Is that intended?

No, I missed...
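A sketch of the general pattern for propagating such a flag, using hypothetical names rather than the package's actual functions: the inner kernel dispatches on `Val{true}` / `Val{false}`, so the choice is resolved at compile time, and the caller forwards its own `thread` argument instead of letting the kernel decide. The threaded method uses `Threads.@threads` here as a stand-in for `@tturbo`.

```julia
# Hypothetical Schur-complement-style update: C .-= A * B.
# Serial method, selected when the caller passes Val(false).
function schur_update!(C, A, B, ::Val{false})
    @inbounds for j in axes(C, 2), k in axes(A, 2)
        b = B[k, j]
        for i in axes(C, 1)
            C[i, j] -= A[i, k] * b
        end
    end
    return C
end

# Threaded method: each task owns whole columns of C, so writes never overlap.
function schur_update!(C, A, B, ::Val{true})
    Threads.@threads for j in axes(C, 2)
        @inbounds for k in axes(A, 2)
            b = B[k, j]
            for i in axes(C, 1)
                C[i, j] -= A[i, k] * b
            end
        end
    end
    return C
end

# Hypothetical caller: forwards `thread` rather than hard-coding a choice.
function blocked_step!(C, A, B; thread::Val = Val(true))
    return schur_update!(C, A, B, thread)
end
```

With this structure, `blocked_step!(C, A, B; thread = Val(false))` never reaches the threaded method.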