ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs

Results 36 ParallelStencil.jl issues

Clarify in the documentation that the hide communication feature is currently only active and supported with the CUDA backend, as flagged by [this Discourse post](https://discourse.julialang.org/t/poor-scaling-results-with-implicitglobalgrid-jl/65170/10).

documentation

I couldn't get either method for adjoint generation working over the `@parallel` stencil. For pure Zygote-based VJP calculations, ```julia @parallel function diffusion3D_step!(T2, T, Ci, lam, dx, dy, dz) @inn(T2) =...

I have several `@parallel` functions that each have several arguments (of type `Data.Array`). To make the code cleaner, I (naively) tried passing a single `struct` containing all of these arguments....

Hey there! Having a great time using this package, but I have started to include ParallelStencil in a package that I am writing, and that involves tagging some functions with...

Not sure if you want to avoid domain specific miniapps, but I am working towards modifying the acoustic 2D miniapp for ultrasound and photoacoustic modeling. Currently I just played around...

miniapps

Something to consider as an alternative or supplement to the current `Threads.@threads` option. The `@tturbo` macro allows for threaded AVX instructions, exposed by the [LoopVectorization](https://juliasimd.github.io/LoopVectorization.jl/stable/api/#LoopVectorization.@tturbo) package. See here https://github.com/luraess/parallel-gpu-workshop-JuliaCon21#parallel-cpu-implementation for an...

enhancement

I'm curious what GPU performance you get against something like the cuDNN wrappers of NNlibCUDA.jl. Those would be more appropriate comparisons than CuArray broadcasting.