Ludovic Räss
Clarify in the documentation that the hide communication feature is currently only active and supported with the CUDA backend, as flagged by [this Discourse post](https://discourse.julialang.org/t/poor-scaling-results-with-implicitglobalgrid-jl/65170/10).
Something to consider as an alternative or supplement to the current `Threads.@threads` option. The `@tturbo` macro, exposed by the [LoopVectorization](https://juliasimd.github.io/LoopVectorization.jl/stable/api/#LoopVectorization.@tturbo) package, allows for threaded AVX instructions. See here https://github.com/luraess/parallel-gpu-workshop-JuliaCon21#parallel-cpu-implementation for an...
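To make the comparison concrete, here is a minimal sketch of the same loop written once with `Threads.@threads` and once with `@tturbo`. The function names `axpy_threads!` and `axpy_tturbo!` are hypothetical and chosen only for illustration; the sketch assumes LoopVectorization.jl is installed and that Julia was started with more than one thread for any threading benefit to show.

```julia
using LoopVectorization

# Baseline: plain multithreading over the loop range.
function axpy_threads!(y, a, x)
    Threads.@threads for i in eachindex(y, x)
        @inbounds y[i] += a * x[i]
    end
    return y
end

# Candidate: @tturbo combines threading with SIMD vectorization of the loop body.
function axpy_tturbo!(y, a, x)
    @tturbo for i in eachindex(y, x)
        y[i] += a * x[i]
    end
    return y
end

x  = rand(1000)
y1 = rand(1000)
y2 = copy(y1)

axpy_threads!(y1, 2.0, x)
axpy_tturbo!(y2, 2.0, x)
```

Both variants should agree up to floating-point rounding, since `@tturbo` may reorder or fuse operations. Whether it is faster depends on the loop body and the hardware, so it would need benchmarking per kernel.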
Handle `MPI.Consts.MPI_PROC_NULL[]` in a similar fashion to `MPI_COMM_TYPE_SHARED` (related / by analogy to #584). In short, avoid exposing `MPI.Consts.XYZ[]` to the user. _Note: feel free to edit this issue...
Testing MPI v0.20-dev (master) branch on Linux (Ubuntu 20.04) with system MPI
```
julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation
│   libmpi = "libmpi"
│   version_string = "Open MPI v4.1.2, package:...
```
I observe strange behaviour using `MPI.jl#master` (0.20-dev) when running `MPI.use_system_binary()` in a project outside of the MPI.jl project. Namely, the system MPI implementation seems to be correctly selected...
Testing the MPI v0.20-dev (master) branch on a Linux server via SLURM, using `salloc` and launching Julia as `julia --project -e 'using Pkg; Pkg.test()'` via `srun -n1 ./launch_julia.sh`, hangs on https://github.com/JuliaParallel/MPI.jl/blob/9a1dd861213a312fb8c16c28147d90664995ae6f/test/runtests.jl#L14-L18 although...
Add a test for the `Mem.unsafe_copy3d!` function that performs a device-to-device copy asynchronously.
This would expose an API similar to CUDA.jl's `CUDA.Mem.unsafe_copy3d!`. Suggestion from @jpsamaroo: write a memcopy kernel and put it directly in AMDGPU.jl, exposing an argument to specify which queue to use.
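As a rough illustration of what such a 3D copy has to do, the following is a CPU-only reference sketch in plain Julia. It is not the AMDGPU.jl or CUDA.jl API: the name `copy3d_reference!` and the keyword arguments `dst_pos` and `src_pos` are hypothetical, and the queue argument mentioned above is deliberately left out because it has no CPU equivalent. A reference like this could serve to check the result of a device copy in a test.

```julia
# Hypothetical CPU reference for a 3D sub-block copy (illustration only).
# Copies a block of size `dims` from `src`, starting at `src_pos`,
# into `dst`, starting at `dst_pos` (1-based positions).
function copy3d_reference!(dst::AbstractArray{T,3}, src::AbstractArray{T,3},
                           dims::NTuple{3,Int};
                           dst_pos::NTuple{3,Int}=(1, 1, 1),
                           src_pos::NTuple{3,Int}=(1, 1, 1)) where {T}
    nx, ny, nz = dims
    # The innermost loop runs over the first index, matching Julia's column-major layout.
    for k in 0:nz-1, j in 0:ny-1, i in 0:nx-1
        dst[dst_pos[1]+i, dst_pos[2]+j, dst_pos[3]+k] =
            src[src_pos[1]+i, src_pos[2]+j, src_pos[3]+k]
    end
    return dst
end

src = reshape(collect(1.0:64.0), 4, 4, 4)
dst = zeros(4, 4, 4)

# Copy the 2x2x2 corner block of `src` into the interior of `dst`.
copy3d_reference!(dst, src, (2, 2, 2); dst_pos=(2, 2, 2), src_pos=(1, 1, 1))
```

A device kernel would map the three loop indices to work-item indices instead of looping, but the index arithmetic would stay the same.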
Testing the `AMDGPU.Mem.unsafe_copy3d!` function (#220) may hang the GPU in the BuildKite CI. No issue is observed outside of CI. A current workaround is to add an operation (tested `sleep`,...