
How to effectively test whether GDRCopy is enabled using a real-world ML workload?

Open · pandyamarut opened this issue 1 year ago · 2 comments

I have successfully installed GDRCopy on my host and its tests pass. I then launched a container running my language-model application, focusing on profiling how the model is loaded from local disk. How can I confirm whether GDRCopy is active while my application is running? I am new to this, so I would appreciate any guidance on verifying that GDRCopy is actually being used in this setup.

pandyamarut, Jan 31 '24 23:01

Hi @pandyamarut, based on your question, my guess is that your application does not use GDRCopy directly. Probably you want to confirm that a library (e.g., UCX or NCCL) is properly using GDRCopy? One way to check is to export the environment variables below and rerun your application. If GDRCopy is being used, you should see some output lines from GDRCopy.

export GDRCOPY_ENABLE_LOGGING=1
export GDRCOPY_LOG_LEVEL=1
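A minimal Python sketch of the check above, assuming a Linux host. It launches a command with the two environment variables set, captures stdout and stderr together, and keeps lines containing "gdr" (case-insensitive). That filter is a heuristic assumption, since the exact format of GDRCopy's log lines is not shown in this thread. As a secondary signal, `has_gdrdrv_open` inspects `/proc/<pid>/fd` for an open handle to `/dev/gdrdrv`, assuming the default device node created by the gdrdrv kernel module. All function names are hypothetical helpers, not part of GDRCopy.

```python
import os
import subprocess

# Environment variables from the answer above; they must be present in the
# environment before the application (and the library using GDRCopy) starts.
GDRCOPY_LOG_ENV = {"GDRCOPY_ENABLE_LOGGING": "1", "GDRCOPY_LOG_LEVEL": "1"}


def build_env(base=None):
    """Return a copy of `base` (default: os.environ) with GDRCopy logging enabled."""
    env = dict(os.environ if base is None else base)
    env.update(GDRCOPY_LOG_ENV)
    return env


def filter_gdrcopy_lines(output):
    """Heuristic (assumption): keep lines that mention 'gdr' in any case."""
    return [line for line in output.splitlines() if "gdr" in line.lower()]


def run_with_gdrcopy_logging(cmd):
    """Run `cmd` (a list of args) with logging enabled.

    Returns (exit code, list of output lines that look GDRCopy-related).
    """
    proc = subprocess.run(
        cmd,
        env=build_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams so messages on either are captured
        text=True,
    )
    return proc.returncode, filter_gdrcopy_lines(proc.stdout)


def has_gdrdrv_open(pid="self"):
    """Linux-only: True if process `pid` holds an open fd to /dev/gdrdrv.

    Assumes the default device path; reading another user's process may
    raise PermissionError. Only meaningful while the process is running.
    """
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            if os.readlink(os.path.join(fd_dir, name)) == "/dev/gdrdrv":
                return True
        except OSError:
            continue  # fd was closed between listdir and readlink
    return False
```

For example, `run_with_gdrcopy_logging(["python", "serve_model.py"])` (hypothetical script name) returns the exit code plus any matching lines. While the application is running, `has_gdrdrv_open(<its PID>)` can be called from another shell. An empty result from both checks suggests, but does not prove, that GDRCopy was not used.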

pakmarkthub, Feb 05 '24 01:02

@pandyamarut, were you able to verify whether your application is using it?

dhayanesh, Apr 23 '24 07:04