Chris Elrod

816 comments by Chris Elrod

> Long story short, `->` is one of the most miserable operators to type out. This probably varies a lot by keyboard layout, but perhaps you could configure yours? Or...

Just built it successfully after applying [this patch](https://groups.google.com/d/msg/isl-development/W4hwfdkJylk/FWB-mYGJBAAJ). I assume Sven will add the patch upstream.

> IIRC it's just: `w = [0 1 0; 1 -4 1; 0 1 0]`, and then `conv(A,w)` would be the 2D laplacian. Then the 3D is just using a...

> Then the 3D is just using a 3D weight vector.

Do you mean something other than?
```julia
julia> [0;0;0;;0;1;0;;0;0;0;;;0;1;0;;1;-6;1;;0;1;0;;;0;0;0;;0;1;0;;0;0;0] # Julia 1.7 syntax
3×3×3 Array{Int64, 3}:
[:, :, 1]...
```
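For readers following along, here is a minimal sketch of what that 3×3×3 kernel encodes and how applying it without padding shrinks each dimension by 2 (the 128 to 126 in the benchmark description). The names `laplace_kernel3d` and `apply_stencil3d` are hypothetical and written in plain Base Julia; this is not the `laplace_sparse!` from the benchmarks, nor what `NNlib.conv!` does internally.

```julia
# Hypothetical helper names; plain Base Julia, no packages assumed.

# 7-point 3D Laplacian kernel: -6 at the center, 1 at the six face neighbors.
function laplace_kernel3d()
    w = zeros(Int, 3, 3, 3)
    w[2, 2, 2] = -6
    for d in (-1, 1)
        w[2 + d, 2, 2] = 1
        w[2, 2 + d, 2] = 1
        w[2, 2, 2 + d] = 1
    end
    return w
end

# "Valid" stencil application (no padding): the output is smaller than the
# input by size(w) - 1 along each dimension. The kernel is symmetric, so
# convolution and cross-correlation give the same result here.
# Assumes ordinary 1-based arrays.
function apply_stencil3d(A::AbstractArray{T,3}, w::AbstractArray{<:Any,3}) where {T}
    out = zeros(T, size(A) .- (size(w) .- 1))
    for I in CartesianIndices(out), K in CartesianIndices(w)
        out[I] += w[K] * A[I + K - oneunit(K)]
    end
    return out
end

A = rand(Float32, 16, 16, 16)
out = apply_stencil3d(A, laplace_kernel3d())
size(out)  # (14, 14, 14)
```

This loops over all 27 kernel entries even though 20 of them are zero, so it is only meant to show the arithmetic, not to be fast.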

Here are a couple benchmarks. First, 3d laplace on 128x128x128x1x1 -> 126x126x126x1x1: ```julia using LoopVectorization, CUDA, NNlib, NNlibCUDA, BenchmarkTools function laplace_sparse!( out::AbstractArray{ @benchmark CUDA.@sync NNlib.conv!($cuout3d, $cuimg3d, $culaplace_kern, $dcdlaplace) BechmarkTools.Trial: 10000...

```julia
using ParallelStencil, ParallelStencil.FiniteDifferences3D
@init_parallel_stencil(Threads, Float64, 3);
@parallel function diffusion3D_step!(T2, T)
    @inn(T2) = @d2_xi(T) + @d2_yi(T) + @d2_zi(T)
    return
end
out3d_pad = similar(img3d_squeezed);
@time @parallel diffusion3D_step!(out3d_pad, img3d_squeezed)
@time laplace_sparse!(out3d, img3d_squeezed);...
```

On the T4 GPU:
```julia
using ParallelStencil, ParallelStencil.FiniteDifferences3D
@init_parallel_stencil(CUDA, Float64, 3);
cuout3d_pad = CuArray(out3d_pad);
cuimg3d = CuArray(img3d_squeezed);
@time @parallel diffusion3D_step!(cuout3d_pad, cuimg3d)
out3d ≈ Array(view(cuout3d_pad, subaxes...))
@benchmark @parallel diffusion3D_step!($cuout3d_pad, $cuimg3d)
```
...

Having it optionally apply `@inbounds` would probably help on the CPU. But the branches may prevent SIMD anyway:
```julia
if var"##ix, ParallelStencil#264"
```
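To illustrate the distinction being drawn, here is a rough sketch contrasting a loop that tests every index inside the body with one that iterates over the interior only and opts out of bounds checks. The function names are hypothetical and this is not ParallelStencil's generated code; whether the compiler actually vectorizes either version depends on the Julia version and the hardware, so treat it as an assumption to check with `@code_llvm` or a benchmark.

```julia
# Hypothetical function names; not ParallelStencil's generated code.

# Branchy version: every index is tested and only interior points are written.
# Array accesses are bounds-checked as usual.
function laplace_branchy!(out, A)
    nx, ny, nz = size(A)
    for k in 1:nz, j in 1:ny, i in 1:nx
        if 1 < i < nx && 1 < j < ny && 1 < k < nz
            out[i, j, k] = A[i-1, j, k] + A[i+1, j, k] +
                           A[i, j-1, k] + A[i, j+1, k] +
                           A[i, j, k-1] + A[i, j, k+1] - 6 * A[i, j, k]
        end
    end
    return out
end

# Branch-free version: the loop ranges cover the interior only, so the body
# needs no index test, and `@inbounds` removes the bounds checks.
function laplace_interior!(out, A)
    @assert size(out) == size(A)  # makes the @inbounds below safe
    nx, ny, nz = size(A)
    @inbounds for k in 2:nz-1, j in 2:ny-1
        @simd for i in 2:nx-1
            out[i, j, k] = A[i-1, j, k] + A[i+1, j, k] +
                           A[i, j-1, k] + A[i, j+1, k] +
                           A[i, j, k-1] + A[i, j, k+1] - 6 * A[i, j, k]
        end
    end
    return out
end

A = rand(Float32, 32, 32, 32)
laplace_branchy!(zeros(Float32, size(A)), A) ≈ laplace_interior!(zeros(Float32, size(A)), A)
```

Both versions leave the boundary entries of `out` untouched, so they can be compared directly on the same input.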

> Thanks for reporting - interesting results. Looks like you did the ParallelStencil tests using `Float64` while you have `Float32` arrays for all other tests?

Oops. Wasn't paying attention...

I'm looking into why it's slower at the moment:
```julia
julia> @benchmark diffusion3D_step3!($B,$A)
BechmarkTools.Trial: 10000 samples with 1 evaluations.
 Range (min … max): 46.454 μs … 227.213 μs ┊ GC...
```