ucx
UCM/CUDA/TEST: Install memory hooks for async Cuda allocations
Why
As discussed in #7194 and #7110, we need to add memory hooks support for CUDA async allocations. Without these hooks, applications using such allocations may fail to detect CUDA memory and run into a segfault or access error.
@Akshay-Venkatesh WDYT?
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
@yosefe forgot to bring up the lack of sync memops support on MallocAsync memory, which may become a problem because of this PR. With this PR, the IB or cuda-ipc UCTs would likely be used to move memory allocated through MallocAsync, but the following sequence could lead to stale data being transferred:
cudaMallocAsync(&x, length1, stream1);
cudaStreamSynchronize(stream1);
...
cudaMemcpy(x, y, length2, cudaMemcpyHostToDevice); // potentially non-blocking wrt CPU and copy to destination x may still be in flight
ucp_tag_send_nbx(x, ...); // region pointed by x is not valid yet because previous memcpy is still in flight
Setting the sync memops attribute on x would synchronize all outstanding memory operations on it, but the attribute is not supported on MallocAsync memory. This could lead to data validation issues regardless of whether the transfer uses zcopy through ib/cuda_ipc or goes through pipeline protocols.
Any update on this?
@simonbyrne SYNC_MEMOPS is not yet supported with the MallocAsync API. We plan to support such memory once that support becomes available.
Replaced by #8623