MGG_OSDI23

The reproduced performance is not exactly the same as in the paper

Open · wei-mei opened this issue 2 years ago · 1 comment

Hello, I am reading your OSDI paper, MGG: Accelerating Graph Neural Networks with Fine-grained Intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms. I am using the repository you provided, but I cannot reach the performance reported in the paper, for example in the comparison with DGL on 8×A100 for GCN (Fig. 7a):

dataset                  speedup (vs. DGL)
Reddit_beg_pos           0.598862
enwiki-2013_beg_pos      0.980894
t-2004_beg_pos           2.319232
paper100M_beg_pos        3.729139
ogbn-products_beg_pos    2.551465
ogbn-proteins_beg_pos    0.655375
com-Orkut_beg_pos        5.647636

Tested on 8× SXM4 A100 (80 GB); point-to-point NVLink bandwidth = 600 GB/s.
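
For reference, a minimal CUDA sketch (not part of the MGG repository) for sanity-checking the GPU-to-GPU link: it times a unidirectional peer-to-peer copy between GPU 0 and GPU 1, which should land well below the 600 GB/s figure, since that number is the aggregate bidirectional NVLink bandwidth per A100.

```cuda
// p2p_bw.cu -- illustrative only, not from the MGG repo.
// Build: nvcc -O2 p2p_bw.cu -o p2p_bw
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call)                                                    \
  do {                                                                 \
    cudaError_t err_ = (call);                                         \
    if (err_ != cudaSuccess) {                                         \
      fprintf(stderr, "CUDA error %s at %s:%d\n",                      \
              cudaGetErrorString(err_), __FILE__, __LINE__);           \
      return 1;                                                        \
    }                                                                  \
  } while (0)

int main() {
  const size_t bytes = size_t(1) << 30;  // 1 GiB payload
  int can01 = 0, can10 = 0;
  CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
  CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
  if (!can01 || !can10) { printf("peer access not available\n"); return 0; }

  void *d0 = nullptr, *d1 = nullptr;
  CHECK(cudaSetDevice(0));
  CHECK(cudaDeviceEnablePeerAccess(1, 0));  // let GPU 0 access GPU 1 memory
  CHECK(cudaMalloc(&d0, bytes));
  CHECK(cudaSetDevice(1));
  CHECK(cudaDeviceEnablePeerAccess(0, 0));
  CHECK(cudaMalloc(&d1, bytes));

  CHECK(cudaSetDevice(0));
  cudaEvent_t start, stop;
  CHECK(cudaEventCreate(&start));
  CHECK(cudaEventCreate(&stop));

  CHECK(cudaMemcpyPeerAsync(d1, 1, d0, 0, bytes, 0));  // warm-up
  const int iters = 20;
  CHECK(cudaEventRecord(start, 0));
  for (int i = 0; i < iters; ++i)
    CHECK(cudaMemcpyPeerAsync(d1, 1, d0, 0, bytes, 0));
  CHECK(cudaEventRecord(stop, 0));
  CHECK(cudaEventSynchronize(stop));

  float ms = 0.f;
  CHECK(cudaEventElapsedTime(&ms, start, stop));
  printf("GPU0 -> GPU1: %.1f GB/s (unidirectional)\n",
         (double)bytes * iters / (ms / 1e3) / 1e9);
  return 0;
}
```

`nvidia-smi topo -m` additionally shows whether the two GPUs are connected through NVLink/NVSwitch on this machine.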

How should I adjust the configuration in your repository to reach the performance shown in the paper?

wei-mei · Aug 30 '23 06:08

Thanks for your interest.

  • As we mention in the paper's evaluation ("Platforms & Tools" paragraph), the main evaluation platform is 8×A100 (40 GB) GPUs, and we use an AWS p4d.24xlarge instance for evaluation.
  • For 8×A100 (80 GB), due to the difference in GPU global memory bandwidth (2,039 GB/s versus 1,555 GB/s on the A100 40 GB), we believe additional parameter tuning will be needed on the A100-80GB to achieve better performance. Other factors, such as the type and number of CPU cores on a DGX-A100-80GB versus a DGX-A100-40GB, also affect the performance of DGL, since it relies on zero-copy access with CPU involvement to fetch remote data resident on the host (a sketch of this access pattern follows this list).
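
To illustrate the zero-copy point, here is a minimal CUDA sketch (a generic illustration, not DGL's or MGG's actual code) of a kernel that gathers feature rows directly from pinned, mapped host memory. Because every gathered row is fetched from host DRAM at kernel time, host-side characteristics (CPU, host memory, PCIe topology of the particular DGX or cloud instance) show up directly in the measured GPU runtime.

```cuda
// zero_copy_gather.cu -- illustrative only; names and sizes are arbitrary.
// Build: nvcc -O2 zero_copy_gather.cu -o zero_copy_gather
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One block per gathered row: each thread strides across the feature dimension
// and reads straight from the mapped host buffer (zero-copy access over PCIe).
__global__ void gather(const float* __restrict__ host_feat,
                       const int* __restrict__ idx,
                       float* __restrict__ out, int dim) {
  int r = blockIdx.x;
  for (int c = threadIdx.x; c < dim; c += blockDim.x)
    out[(size_t)r * dim + c] = host_feat[(size_t)idx[r] * dim + c];
}

int main() {
  cudaSetDeviceFlags(cudaDeviceMapHost);  // allow mapped (zero-copy) host memory
  const int n = 1 << 20, dim = 128, num_rows = 1 << 14;

  // Pinned + mapped host allocation: the GPU can dereference it directly.
  // (Feature values are left uninitialized; only the access path matters here.)
  float* h_feat = nullptr;
  cudaHostAlloc(&h_feat, (size_t)n * dim * sizeof(float), cudaHostAllocMapped);
  float* feat_dev_view = nullptr;
  cudaHostGetDevicePointer(&feat_dev_view, h_feat, 0);

  // Arbitrary row indices to fetch, standing in for remote-neighbor IDs.
  std::vector<int> h_idx(num_rows);
  for (int i = 0; i < num_rows; ++i) h_idx[i] = (i * 37) % n;
  int* d_idx = nullptr;
  cudaMalloc(&d_idx, num_rows * sizeof(int));
  cudaMemcpy(d_idx, h_idx.data(), num_rows * sizeof(int), cudaMemcpyHostToDevice);

  float* d_out = nullptr;
  cudaMalloc(&d_out, (size_t)num_rows * dim * sizeof(float));

  gather<<<num_rows, 128>>>(feat_dev_view, d_idx, d_out, dim);
  cudaDeviceSynchronize();
  printf("gather: %s\n", cudaGetErrorString(cudaGetLastError()));

  cudaFree(d_out); cudaFree(d_idx); cudaFreeHost(h_feat);
  return 0;
}
```

MGG, by contrast, moves remote data GPU-to-GPU inside the kernel (via NVSHMEM), so these host-side differences mostly affect the DGL baseline.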

YukeWang96 · Aug 30 '23 20:08