                        The reproduced performance is not exactly the same as in the paper
Hello, I am reading your OSDI paper, MGG: Accelerating Graph Neural Networks with Fine-grained Intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms. I am using the Git repository you provided, but I cannot reproduce the performance reported in the paper, e.g., the comparison with DGL on 8xA100 for GCN (Fig. 7a):
| Dataset | Speedup (MGG over DGL) |
|---|---|
| Reddit_beg_pos | 0.598862 | 
| enwiki-2013_beg_pos | 0.980894 | 
| t-2004_beg_pos | 2.319232 | 
| paper100M_beg_pos | 3.729139 | 
| ogbn-products_beg_pos | 2.551465 | 
| ogbn-proteins_beg_pos | 0.655375 | 
| com-Orkut_beg_pos | 5.647636 | 
Tested on 8x SXM4 A100 (80 GB); point-to-point NVLink bandwidth = 600 GB/s.
How should I adjust the configurations in your repository to achieve the performance shown in the paper?
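(For clarity, I read the ratios above as DGL end-to-end time divided by MGG end-to-end time, so values below 1 mean MGG was slower than DGL in my run. A minimal sketch of that calculation is below; the timing values are purely illustrative placeholders, not my actual measurements.)

```python
# Sketch of how the speedup column is computed, assuming
# speedup = DGL end-to-end time / MGG end-to-end time.
# All numbers below are illustrative placeholders only.
dgl_ms = {"Reddit_beg_pos": 120.0, "com-Orkut_beg_pos": 95.0}   # hypothetical DGL epoch times (ms)
mgg_ms = {"Reddit_beg_pos": 200.4, "com-Orkut_beg_pos": 16.8}   # hypothetical MGG epoch times (ms)

for name in dgl_ms:
    speedup = dgl_ms[name] / mgg_ms[name]
    # A value > 1 means MGG is faster than DGL; < 1 means a slowdown.
    print(f"{name}: {speedup:.6f}x")
```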
Thanks for your interest.
- As mentioned in the evaluation section of our paper (the "Platforms & Tools" paragraph), the main evaluation platform is 8x A100 GPUs (40 GB), and we use an AWS P4dn.24xlarge instance for the evaluation.
- For 8x A100 (80 GB), the GPU global-memory bandwidth differs (2,039 GB/s versus 1,555 GB/s on the A100 40 GB), so we believe additional parameter-tuning effort will be needed on A100-80GB to reach better performance (a rough sketch of such a sweep is given below). Other factors, such as the type and number of CPU cores on a DGX-A100-80GB versus a DGX-A100-40GB node, would also affect DGL's performance, since DGL relies on zero-copy access with CPU involvement to fetch remote data resident on the host.
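As an illustration of the kind of tuning that might be involved, here is a hedged sketch of a small parameter sweep. The binary path, flag names (`--warpPerBlock`, `--dist`), dataset path, and output format below are hypothetical placeholders, not the repository's actual interface; substitute whatever knobs and logging your MGG build actually exposes.

```python
# Hypothetical parameter sweep for tuning MGG kernel settings on A100-80GB.
# Binary name, flags, value ranges, and output parsing are placeholders --
# adapt them to the actual command-line interface of your MGG build.
import itertools
import re
import subprocess

BINARY = "./build/MGG"        # hypothetical path to the MGG executable
DATASET = "graphs/Reddit"     # hypothetical dataset path

# Candidate values for (hypothetical) kernel parameters.
warps_per_block = [2, 4, 8, 16]
dist_factors = [1, 2, 4, 8]   # hypothetical workload-interleaving distance

best_time, best_cfg = float("inf"), None
for wpb, dist in itertools.product(warps_per_block, dist_factors):
    cmd = [BINARY, "--dataset", DATASET,
           "--warpPerBlock", str(wpb), "--dist", str(dist)]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    # Assume the binary prints a line like "Time (ms): 12.34".
    match = re.search(r"Time \(ms\):\s*([\d.]+)", out)
    if match:
        t = float(match.group(1))
        print(f"wpb={wpb:2d} dist={dist:2d} -> {t:.3f} ms")
        if t < best_time:
            best_time, best_cfg = t, (wpb, dist)

print("Best config:", best_cfg, "at", best_time, "ms")
```

The idea is simply to re-search the kernel-configuration space on the 80 GB part rather than reusing settings tuned for the 40 GB part, since the higher HBM bandwidth shifts the best operating point.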