mgpusim

How do multiple GPUs communicate with each other?

Liujiaqi-jlu opened this issue 1 year ago · 2 comments

Hello, I would like to know how to explicitly observe the communication process between multiple GPUs and how they exchange memory information. I noticed that the Distribution function can map physical memory to different GPUs. Currently, my research focuses on GPU interconnect communication, so I would like to seek your advice on this. Thank you!

Liujiaqi-jlu avatar Oct 23 '24 10:10 Liujiaqi-jlu

I would not say the distribution is about GPU-GPU communication but about how the memory is allocated to GPUs.

May I know what you want to understand about GPU-GPU communication? Two places you can examine:

1. The RDMA engine, which performs cache-line-level memory access across GPUs: https://github.com/sarchlab/mgpusim/tree/v3/timing/rdma
2. The Endpoint, a network component that gathers all outgoing/incoming communication of a device: https://github.com/sarchlab/akita/blob/v3/noc/networking/switching/endpoint.go
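
To make the idea concrete, here is a rough, self-contained Go sketch of the concept only. It is not the mgpusim/akita API; every type, function, and constant below is made up for illustration. It shows how a distribution table that maps physical address ranges to GPUs can be combined with per-GPU RDMA-like engines, so that a cache-line access to an address owned by another GPU gets forwarded across the interconnect:

```go
package main

import "fmt"

// cacheLineSize mirrors the granularity at which the RDMA engine moves
// data between GPUs (hypothetical constant for this sketch).
const cacheLineSize = 64

// Distribution maps contiguous physical address ranges to GPU IDs,
// loosely modeling how memory is distributed across GPUs.
type Distribution struct {
	rangeEnds []uint64 // rangeEnds[i] is the first address NOT owned by GPU i
}

// OwnerOf returns the ID of the GPU that owns the given address.
func (d *Distribution) OwnerOf(addr uint64) int {
	for gpu, end := range d.rangeEnds {
		if addr < end {
			return gpu
		}
	}
	panic("address outside distributed range")
}

// RDMAEngine is a toy stand-in for the per-GPU component that serves
// cache-line accesses arriving from remote GPUs.
type RDMAEngine struct {
	gpuID int
	mem   map[uint64][cacheLineSize]byte // local memory, keyed by line address
}

// ReadLine serves a cache-line read against this GPU's local memory.
func (e *RDMAEngine) ReadLine(addr uint64) [cacheLineSize]byte {
	line := addr &^ uint64(cacheLineSize-1)
	fmt.Printf("GPU %d: RDMA read of line 0x%x served locally\n", e.gpuID, line)
	return e.mem[line]
}

// remoteRead models what happens when GPU src touches an address: the
// distribution decides which GPU owns it, and if that is a different
// GPU, the request is forwarded to that GPU's RDMA engine.
func remoteRead(src int, addr uint64, dist *Distribution, engines []*RDMAEngine) [cacheLineSize]byte {
	owner := dist.OwnerOf(addr)
	if owner == src {
		return engines[src].ReadLine(addr)
	}
	fmt.Printf("GPU %d: address 0x%x owned by GPU %d, forwarding over interconnect\n",
		src, addr, owner)
	return engines[owner].ReadLine(addr)
}

func main() {
	// Two GPUs, each owning 1 GiB of the physical address space.
	dist := &Distribution{rangeEnds: []uint64{1 << 30, 2 << 30}}
	engines := []*RDMAEngine{
		{gpuID: 0, mem: map[uint64][cacheLineSize]byte{}},
		{gpuID: 1, mem: map[uint64][cacheLineSize]byte{}},
	}

	// GPU 0 reads an address that the distribution placed on GPU 1.
	remoteRead(0, (1<<30)+0x40, dist, engines)
}
```

In the real simulator this is event-driven and the forwarded request travels through the network components (including the Endpoint linked above) rather than a direct function call; the sketch only conveys the address-ownership and forwarding idea.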

syifan avatar Oct 24 '24 02:10 syifan

Thank you for your reply! I now understand where my problem was.

Liujiaqi-jlu avatar Oct 24 '24 08:10 Liujiaqi-jlu