cudf
[QST] Can cuDF copy DataFrame from one GPU to another without going through CPU and memory?
@infzo do you want to do this between separate processes, or within a single process using multiple threads? More information about your use case would be helpful.
Assuming these are two separate processes, you can use ucx-py to send and receive data over NVLink or GPUDirect RDMA. Dask/Dask-cuDF makes this kind of data transfer very easy, and I would recommend that over constructing it manually. Still, doing it by hand is not terrible:
- Set up a connection between the two processes
- Serialize a cuDF dataframe:
cdf.serialize()
- Receive the dataframe and reconstruct it
The ucx-py test follows roughly these steps: https://github.com/rapidsai/ucx-py/blob/branch-0.27/debug-tests/test_send_recv_many_workers.py
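The shape of that exchange can be sketched with standard-library stand-ins only. This is an illustration under loud assumptions, not the ucx-py or cuDF implementation: `socket.socketpair()` plays the role of the connection between the two processes, and a hand-written `header`/`frames` pair stands in for what `cdf.serialize()` would produce (assumed here to be metadata plus a list of data buffers). The helper names `send_frames` and `recv_frames` are hypothetical. Because it uses ordinary sockets and `bytes`, this sketch moves data through host memory; it only shows the framing, not a GPU-to-GPU path.

```python
import pickle
import socket
import struct


def send_msg(sock, payload):
    """Send one message prefixed with its length as an 8-byte integer."""
    sock.sendall(struct.pack("!Q", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, raising if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock):
    """Receive one length-prefixed message."""
    (size,) = struct.unpack("!Q", recv_exact(sock, 8))
    return recv_exact(sock, size)


def send_frames(sock, header, frames):
    """Hypothetical helper: send the metadata first, then each data frame."""
    send_msg(sock, pickle.dumps({"header": header, "nframes": len(frames)}))
    for frame in frames:
        send_msg(sock, bytes(frame))


def recv_frames(sock):
    """Hypothetical helper: receive the metadata, then the announced frames."""
    meta = pickle.loads(recv_msg(sock))
    frames = [recv_msg(sock) for _ in range(meta["nframes"])]
    return meta["header"], frames


# Step 1: a connected pair of endpoints (stand-in for the real connection).
sender, receiver = socket.socketpair()

# Step 2: stand-in for `header, frames = cdf.serialize()` (assumed shape).
header = {"columns": ["a", "b"]}
frames = [b"\x01\x02\x03", b"\x04\x05\x06"]
send_frames(sender, header, frames)

# Step 3: receive the pair; the receiver would pass it to the matching
# deserialize call to rebuild the dataframe.
got_header, got_frames = recv_frames(receiver)

sender.close()
receiver.close()
```

In the linked test the socket pair is replaced by UCX endpoints, and the frames are whatever the serializer returns rather than `bytes` copies.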
cc @pentschev in case he has additional thoughts.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
@infzo was the proposed UCX solution sufficient for your needs?
Please feel free to re-open if you ran into any trouble with UCX!
cuda_obj = cuda_obj_generator()
msg = {"data": to_serialize(cuda_obj)}
According to the cuDF docstring below, serialization copies the device data to host memory. If that is the case, a GPU-to-GPU transfer done this way would still go through host memory.
class Serializable:
    """A serializable object composed of device memory buffers.

    This base class defines a standard serialization protocol for objects
    encapsulating device memory buffers. Serialization proceeds by copying
    device data onto the host, then returning it along with suitable metadata
    for reconstruction of the object. Deserialization performs the reverse
    process, copying the serialized data from the host to new device buffers.

    Subclasses must define the abstract methods :meth:`~.serialize` and
    :meth:`~.deserialize`. The former defines the conversion of the object
    into a representative collection of metadata and data buffers, while the
    latter converts back from that representation into an equivalent object.
    """
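To make the metadata-plus-buffers split in that docstring concrete, here is a toy class that follows the same `serialize`/`deserialize` pair. `HostColumn` is a hypothetical stand-in that lives entirely in host memory and uses only the standard library. It is not cuDF's implementation and says nothing about where cuDF's own buffers reside; it only illustrates the protocol's shape.

```python
from array import array


class HostColumn:
    """Hypothetical toy column of float64 values stored in host memory."""

    def __init__(self, values):
        self.values = array("d", values)

    def serialize(self):
        # Metadata describes how to rebuild the object; the frames carry
        # the raw data buffers.
        header = {"type": "HostColumn", "count": len(self.values)}
        frames = [self.values.tobytes()]
        return header, frames

    @classmethod
    def deserialize(cls, header, frames):
        # Reverse process: rebuild the values from the single data frame.
        values = array("d")
        values.frombytes(frames[0])
        if len(values) != header["count"]:
            raise ValueError("frame length does not match header")
        return cls(values)


col = HostColumn([1.0, 2.5, 3.0])
header, frames = col.serialize()
restored = HostColumn.deserialize(header, frames)
```

The header stays small and cheap to send, while the frames hold the bulk of the data, which is why the two are kept separate.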