David--Cléris Timothée
It does seem related to #12849. Basically, a pointer used for communication between two GPUs gets registered for IPC, and the IPC handle is never released, which prevents the memory from...
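To make the mechanism concrete, here is a minimal sketch of the CUDA IPC lifecycle being described, assuming the plain CUDA runtime API; the helper names (`export_buffer`, `import_buffer`, `release_buffer`) are illustrative and not taken from the actual code. The key step is the final `cudaIpcCloseMemHandle`: while a peer still holds an open mapping, the exporter's `cudaFree` cannot actually return the pages to the driver.

```cpp
// Minimal sketch of the CUDA IPC lifecycle (illustrative helper names, not the
// reproducer's code): the exporter's allocation cannot really be released
// while a peer process still holds an open IPC mapping on it.
#include <cuda_runtime.h>

// Exporting process: allocate a buffer and publish an IPC handle for it.
void export_buffer(void** dptr, cudaIpcMemHandle_t* handle, size_t bytes) {
    cudaMalloc(dptr, bytes);
    cudaIpcGetMemHandle(handle, *dptr);   // handle is shipped to the peer (e.g. via MPI)
}

// Importing process: map the exporter's buffer into this process.
void* import_buffer(cudaIpcMemHandle_t handle) {
    void* peer_ptr = nullptr;
    cudaIpcOpenMemHandle(&peer_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
    return peer_ptr;                      // communication happens through peer_ptr
}

// The step that is reportedly skipped: without this, the mapping stays alive
// and the exporter's cudaFree cannot hand the memory back to the driver.
void release_buffer(void* peer_ptr) {
    cudaIpcCloseMemHandle(peer_ptr);
}
```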
Any update on this issue? This affects production quite significantly...
I agree that this shouldn't be an issue in principle; however, when I check with nvidia-smi, the actual memory usage grows by an amount similar to the active cuIpc handles,...
I managed to put together something of a reproducer (in SYCL, though that is transparent to CUDA). Here is the end of the output: clearly the program's own memory is unchanged,...
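For reference, a condensed sketch of what such a reproducer can look like, assuming a CUDA-aware MPI build and one GPU per rank; this is not the original SYCL code, just the same pattern in plain CUDA/MPI: repeatedly allocate a device buffer, exchange it between two ranks, free it, and watch the driver-reported usage keep climbing even though every allocation is matched by a free.

```cpp
// Condensed reproducer sketch (not the original SYCL code): two ranks keep
// exchanging freshly allocated device buffers through a CUDA-aware MPI, free
// them, and print how much device memory the driver still reports as in use.
// If peer IPC mappings are never released, that number keeps growing even
// though every cudaMalloc is matched by a cudaFree.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 22;                           // ~16 MiB of floats per message
    for (int it = 0; it < 100; ++it) {
        float* buf = nullptr;
        cudaMalloc(&buf, n * sizeof(float));         // fresh allocation every iteration

        if (rank == 0)
            MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(buf);                               // freed on our side...

        size_t free_b = 0, total_b = 0;              // ...yet the reported usage grows
        cudaMemGetInfo(&free_b, &total_b);
        std::printf("rank %d iter %3d: device memory in use: %zu MiB\n",
                    rank, it, (total_b - free_b) >> 20);
    }

    MPI_Finalize();
    return 0;
}
```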
Any update on this issue? This affects production quite significantly...
Have you also changed the read and write parts of the scan? The issue was mostly that the scan does not read and write at the same slot for...
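If I read the truncated comment right, this is about keeping the read and write locations of a scan step separate. A generic sketch of that pattern, a Hillis-Steele style inclusive scan that ping-pongs between two buffers, shown here as plain sequential C++ rather than the project's actual kernel, could look like this:

```cpp
// Generic sketch of the read/write separation the comment seems to refer to:
// a Hillis-Steele style inclusive scan that ping-pongs between two buffers so
// that no step ever reads a slot that the same step is writing. Illustration
// of the pattern only, not Shamrock's actual kernel.
#include <cstddef>
#include <utility>
#include <vector>

std::vector<int> inclusive_scan_ping_pong(std::vector<int> in) {
    std::vector<int> out(in.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            // Read only from `in`, write only to `out`: the slots touched by a
            // given step never overlap, so this loop body could run in parallel.
            out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
        }
        std::swap(in, out);   // ping-pong buffers for the next step
    }
    return in;                // after the final swap the result lives in `in`
}
```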
I just tried with

```bash
#!/bin/bash -l
#PBS -A Shamrock
#PBS -N scale_256_hybrid
#PBS -l walltime=0:15:00
#PBS -l select=256
#PBS -l place=scatter
#PBS -l filesystems=home:flare
#PBS -q prod
#PBS -k...
```
Hi, I haven't been able to continue on this over the last few months. I will try to put together a reproducer to make tracking this down easier.
Hi, I think I'm currently encountering this exact issue on a workstation. Basically, using MPI communication on CUDA-allocated memory results in memory leaks. What is the current status of...
> ....
> I'm able to reproduce the issue. The cuda-ipc transport in UCX caches peer mappings, and a free call on peer-mapped memory is not guaranteed to release that memory....
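A conceptual illustration of that caching behaviour (this is not UCX source code; `PeerMappingCache` is a made-up name): the importer keeps opened IPC mappings in a cache keyed by the exporter's pointer, and a `cudaFree` on the exporter's side never reaches that cache, so the pages stay mapped until the cache itself is purged.

```cpp
// Conceptual illustration (not UCX source) of why a cached peer mapping keeps
// memory alive: opened IPC mappings are cached on the importer side, and
// nothing in the exporter's cudaFree path evicts those entries.
#include <cuda_runtime.h>
#include <unordered_map>

struct PeerMappingCache {
    // remote (exporter) device pointer -> locally mapped pointer
    std::unordered_map<void*, void*> opened;

    void* get_or_open(void* remote_ptr, cudaIpcMemHandle_t handle) {
        auto it = opened.find(remote_ptr);
        if (it != opened.end())
            return it->second;                       // reuse the cached mapping
        void* local = nullptr;
        cudaIpcOpenMemHandle(&local, handle, cudaIpcMemLazyEnablePeerAccess);
        opened.emplace(remote_ptr, local);
        return local;
    }

    // Only an explicit purge closes the mappings; a cudaFree on the exporter's
    // side never reaches this code, which matches the behaviour quoted above.
    void purge() {
        for (auto& kv : opened)
            cudaIpcCloseMemHandle(kv.second);
        opened.clear();
    }
};
```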