                        The reproduced performance is not exactly the same as in the paper
Hello, I am reading your OSDI paper, MGG: Accelerating Graph Neural Networks with Fine-grained Intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms. I am using the Git repository you provided, but I cannot reproduce the performance reported in the paper, e.g., the comparison with DGL on 8xA100 for GCN (Fig. 7a):
| Dataset | Speedup (MGG over DGL) |
|---|---|
| Reddit_beg_pos | 0.598862 | 
| enwiki-2013_beg_pos | 0.980894 | 
| t-2004_beg_pos | 2.319232 | 
| paper100M_beg_pos | 3.729139 | 
| ogbn-products_beg_pos | 2.551465 | 
| ogbn-proteins_beg_pos | 0.655375 | 
| com-Orkut_beg_pos | 5.647636 | 
Tested on 8x SXM4 A100 (80 GB); point-to-point NVLink bandwidth = 600 GB/s.
How should I adjust the configurations in your repository to achieve the performance shown in the paper?
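(For clarity, I read the ratios above as DGL end-to-end time divided by MGG end-to-end time, so values below 1 mean MGG was slower than DGL in my run. A minimal sketch of that calculation is below; the timing values are purely illustrative placeholders, not my actual measurements.)

```python
# Sketch of how the speedup column is computed, assuming
# speedup = DGL end-to-end time / MGG end-to-end time.
# All numbers below are illustrative placeholders only.
dgl_ms = {"Reddit_beg_pos": 120.0, "com-Orkut_beg_pos": 95.0}   # hypothetical DGL epoch times (ms)
mgg_ms = {"Reddit_beg_pos": 200.4, "com-Orkut_beg_pos": 16.8}   # hypothetical MGG epoch times (ms)

for name in dgl_ms:
    speedup = dgl_ms[name] / mgg_ms[name]
    # A value > 1 means MGG is faster than DGL; < 1 means a slowdown.
    print(f"{name}: {speedup:.6f}x")
```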
Thanks for your interest.
- As mentioned in the evaluation section of our paper (the "Platforms & Tools" paragraph), the main evaluation platform is 8x A100 GPUs (40 GB), and we use an AWS P4dn.24xlarge instance for the evaluation.
- For 8x A100 (80 GB), the GPU global-memory bandwidth differs (2,039 GB/s versus 1,555 GB/s on the A100 40 GB), so we believe additional parameter-tuning effort will be needed on A100-80GB to reach better performance (a rough sketch of such a sweep is given below). Other factors, such as the type and number of CPU cores on a DGX-A100-80GB versus a DGX-A100-40GB node, would also affect DGL's performance, since DGL relies on zero-copy access with CPU involvement to fetch remote data resident on the host.
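As an illustration of the kind of tuning that might be involved, here is a hedged sketch of a small parameter sweep. The binary path, flag names (`--warpPerBlock`, `--dist`), dataset path, and output format below are hypothetical placeholders, not the repository's actual interface; substitute whatever knobs and logging your MGG build actually exposes.

```python
# Hypothetical parameter sweep for tuning MGG kernel settings on A100-80GB.
# Binary name, flags, value ranges, and output parsing are placeholders --
# adapt them to the actual command-line interface of your MGG build.
import itertools
import re
import subprocess

BINARY = "./build/MGG"        # hypothetical path to the MGG executable
DATASET = "graphs/Reddit"     # hypothetical dataset path

# Candidate values for (hypothetical) kernel parameters.
warps_per_block = [2, 4, 8, 16]
dist_factors = [1, 2, 4, 8]   # hypothetical workload-interleaving distance

best_time, best_cfg = float("inf"), None
for wpb, dist in itertools.product(warps_per_block, dist_factors):
    cmd = [BINARY, "--dataset", DATASET,
           "--warpPerBlock", str(wpb), "--dist", str(dist)]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # Assume the binary prints a line like "Time (ms): 12.34".
    match = re.search(r"Time \(ms\):\s*([\d.]+)", out)
    if match:
        t = float(match.group(1))
        print(f"wpb={wpb:2d} dist={dist:2d} -> {t:.3f} ms")
        if t < best_time:
            best_time, best_cfg = t, (wpb, dist)

print("Best config:", best_cfg, "at", best_time, "ms")
```

The idea is simply to re-search the kernel-configuration space on the 80 GB part rather than reusing settings tuned for the 40 GB part, since the higher HBM bandwidth shifts the best operating point.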