David--Cléris Timothée
It does seem related to #12849. Basically, a pointer used for communication between two GPUs gets registered for IPC, and the IPC handle is never released, which prevents the memory from...
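To make the mechanism concrete, here is a minimal sketch of the CUDA IPC lifecycle being described, assuming the plain CUDA runtime API; the helper names (`export_buffer`, `import_buffer`, `release_buffer`) are illustrative and not taken from the actual code. The key step is the final `cudaIpcCloseMemHandle`: while a peer still holds an open mapping, the exporter's `cudaFree` cannot actually return the pages to the driver.

```cpp
// Minimal sketch of the CUDA IPC lifecycle (illustrative helper names, not the
// reproducer's code): the exporter's allocation cannot really be released
// while a peer process still holds an open IPC mapping on it.
#include <cuda_runtime.h>

// Exporting process: allocate a buffer and publish an IPC handle for it.
void export_buffer(void** dptr, cudaIpcMemHandle_t* handle, size_t bytes) {
    cudaMalloc(dptr, bytes);
    cudaIpcGetMemHandle(handle, *dptr);   // handle is shipped to the peer (e.g. via MPI)
}

// Importing process: map the exporter's buffer into this process.
void* import_buffer(cudaIpcMemHandle_t handle) {
    void* peer_ptr = nullptr;
    cudaIpcOpenMemHandle(&peer_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
    return peer_ptr;                      // communication happens through peer_ptr
}

// The step that is reportedly skipped: without this, the mapping stays alive
// and the exporter's cudaFree cannot hand the memory back to the driver.
void release_buffer(void* peer_ptr) {
    cudaIpcCloseMemHandle(peer_ptr);
}
```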
Any update on this issue? This affects production quite significantly...
I agree that this shouldn't be an issue in principle; however, when I check with nvidia-smi, the actual memory usage grows by an amount similar to the active cuIpc handles,...
I managed to put together something of a reproducer (in SYCL, though that is transparent to CUDA). Here is the end of the output: clearly the program's own memory is unchanged,...
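For reference, a condensed sketch of what such a reproducer can look like, assuming a CUDA-aware MPI build and one GPU per rank; this is not the original SYCL code, just the same pattern in plain CUDA/MPI: repeatedly allocate a device buffer, exchange it between two ranks, free it, and watch the driver-reported usage keep climbing even though every allocation is matched by a free.

```cpp
// Condensed reproducer sketch (not the original SYCL code): two ranks keep
// exchanging freshly allocated device buffers through a CUDA-aware MPI, free
// them, and print how much device memory the driver still reports as in use.
// If peer IPC mappings are never released, that number keeps growing even
// though every cudaMalloc is matched by a cudaFree.
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 22;                           // ~16 MiB of floats per message
    for (int it = 0; it < 100; ++it) {
        float* buf = nullptr;
        cudaMalloc(&buf, n * sizeof(float));         // fresh allocation every iteration

        if (rank == 0)
            MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(buf);                               // freed on our side...

        size_t free_b = 0, total_b = 0;              // ...yet the reported usage grows
        cudaMemGetInfo(&free_b, &total_b);
        std::printf("rank %d iter %3d: device memory in use: %zu MiB\n",
                    rank, it, (total_b - free_b) >> 20);
    }

    MPI_Finalize();
    return 0;
}
```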
Any update on this issue? This affects production quite significantly...
Have you also changed the read and write parts of the scan? The issue was mostly that the scan does not read and write at the same slot for...
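If I read the truncated comment right, this is about keeping the read and write locations of a scan step separate. A generic sketch of that pattern, a Hillis-Steele style inclusive scan that ping-pongs between two buffers, shown here as plain sequential C++ rather than the project's actual kernel, could look like this:

```cpp
// Generic sketch of the read/write separation the comment seems to refer to:
// a Hillis-Steele style inclusive scan that ping-pongs between two buffers so
// that no step ever reads a slot that the same step is writing. Illustration
// of the pattern only, not Shamrock's actual kernel.
#include <cstddef>
#include <utility>
#include <vector>

std::vector<int> inclusive_scan_ping_pong(std::vector<int> in) {
    std::vector<int> out(in.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            // Read only from `in`, write only to `out`: the slots touched by a
            // given step never overlap, so this loop body could run in parallel.
            out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
        }
        std::swap(in, out);   // ping-pong buffers for the next step
    }
    return in;                // after the final swap the result lives in `in`
}
```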
I just tried with

```bash
#!/bin/bash -l
#PBS -A Shamrock
#PBS -N scale_256_hybrid
#PBS -l walltime=0:15:00
#PBS -l select=256
#PBS -l place=scatter
#PBS -l filesystems=home:flare
#PBS -q prod
#PBS -k...
```
Hi, I haven't been able to continue on this over the last few months. I will try to put together a reproducer to make tracking this down easier.
Hi, I think I'm currently encountering this exact issue on a workstation. Basically, using MPI communication on CUDA-allocated memory results in memory leaks. What is the current status of...
> ....
> I'm able to reproduce the issue. The cuda-ipc transport in UCX caches peer mappings, and a free call on peer-mapped memory is not guaranteed to release that memory....
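A conceptual illustration of that caching behaviour (this is not UCX source code; `PeerMappingCache` is a made-up name): the importer keeps opened IPC mappings in a cache keyed by the exporter's pointer, and a `cudaFree` on the exporter's side never reaches that cache, so the pages stay mapped until the cache itself is purged.

```cpp
// Conceptual illustration (not UCX source) of why a cached peer mapping keeps
// memory alive: opened IPC mappings are cached on the importer side, and
// nothing in the exporter's cudaFree path evicts those entries.
#include <cuda_runtime.h>
#include <unordered_map>

struct PeerMappingCache {
    // remote (exporter) device pointer -> locally mapped pointer
    std::unordered_map<void*, void*> opened;

    void* get_or_open(void* remote_ptr, cudaIpcMemHandle_t handle) {
        auto it = opened.find(remote_ptr);
        if (it != opened.end())
            return it->second;                       // reuse the cached mapping
        void* local = nullptr;
        cudaIpcOpenMemHandle(&local, handle, cudaIpcMemLazyEnablePeerAccess);
        opened.emplace(remote_ptr, local);
        return local;
    }

    // Only an explicit purge closes the mappings; a cudaFree on the exporter's
    // side never reaches this code, which matches the behaviour quoted above.
    void purge() {
        for (auto& kv : opened)
            cudaIpcCloseMemHandle(kv.second);
        opened.clear();
    }
};
```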