
About the training time

SpaceLearner opened this issue 3 years ago · 0 comments

Hi, thank you for your excellent work. However, when I run the code with the default settings to pretrain on OAG_CS, each epoch takes much longer than you reported: around 40 minutes per epoch, which works out to 40 * 400 / 60 = 266.7 hours for 400 epochs, far more than the 12 hours stated in the paper. My machine has a Tesla P100 GPU and 8 Xeon(R) E5-2690 v4 CPU cores. How can I solve this problem?
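For reference, here is the arithmetic behind that estimate as a minimal sketch; the numbers are the ones quoted above, not values read from the repo:

```python
# Rough projection of total pretraining time from the observed
# per-epoch time. All numbers are the ones quoted above.
minutes_per_epoch = 40            # measured on the Tesla P100
n_epochs = 400                    # pretraining epochs assumed above
total_hours = minutes_per_epoch * n_epochs / 60
print(f"projected pretraining time: {total_hours:.1f} h")  # -> 266.7 h
```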

The following is the log:

```
+--------------------+-------------------------------------------+
| Parameter          | Value                                     |
+--------------------+-------------------------------------------+
| attr_ratio         | 0.500                                     |
+--------------------+-------------------------------------------+
| attr_type          | text                                      |
+--------------------+-------------------------------------------+
| neg_samp_num       | 255                                       |
+--------------------+-------------------------------------------+
| queue_size         | 256                                       |
+--------------------+-------------------------------------------+
| w2v_dir            | /data/data0/gjy/dataset/OAG/w2v_all       |
+--------------------+-------------------------------------------+
| data_dir           | /data/data0/gjy/dataset/OAG/graph_CS.pk   |
+--------------------+-------------------------------------------+
| pretrain_model_dir | /data/data0/gjy/GPT-GNN/saved/OAG/gnn.pkl |
+--------------------+-------------------------------------------+
| cuda               | 7                                         |
+--------------------+-------------------------------------------+
| sample_depth       | 3                                         |
+--------------------+-------------------------------------------+
| sample_width       | 128                                       |
+--------------------+-------------------------------------------+
| conv_name          | hgt                                       |
+--------------------+-------------------------------------------+
| n_hid              | 400                                       |
+--------------------+-------------------------------------------+
| n_heads            | 8                                         |
+--------------------+-------------------------------------------+
| n_layers           | 3                                         |
+--------------------+-------------------------------------------+
| prev_norm          | 1                                         |
+--------------------+-------------------------------------------+
| last_norm          | 1                                         |
+--------------------+-------------------------------------------+
| dropout            | 0.200                                     |
+--------------------+-------------------------------------------+
| max_lr             | 0.001                                     |
+--------------------+-------------------------------------------+
| scheduler          | cycle                                     |
+--------------------+-------------------------------------------+
| n_epoch            | 200                                       |
+--------------------+-------------------------------------------+
| n_pool             | 8                                         |
+--------------------+-------------------------------------------+
| n_batch            | 32                                        |
+--------------------+-------------------------------------------+
| batch_size         | 256                                       |
+--------------------+-------------------------------------------+
| clip               | 0.500                                     |
+--------------------+-------------------------------------------+

cuda:7
Start Loading Graph Data...
Finish Loading Graph Data!
paper PP_cite
paper rev_PP_cite
venue rev_PV_Conference
venue rev_PV_Journal
field rev_PF_in_L3
field rev_PF_in_L1
field rev_PF_in_L2
field rev_PF_in_L4
author AP_write_last
author AP_write_other
author AP_write_first
Start Pretraining...
Data Preparation: 68.7s
Epoch: 1, (1 / 41) 45.3s  LR: 0.00005  Train Loss: (4.773, 9.771)  Valid Loss: (4.762, 8.815)  NDCG: 0.314  Norm: 20.012  queue: 1
UPDATE!!!
Data Preparation: 57.1s
Epoch: 1, (2 / 41) 40.3s  LR: 0.00005  Train Loss: (4.594, 8.514)  Valid Loss: (4.532, 7.968)  NDCG: 0.353  Norm: 20.025  queue: 1
UPDATE!!!
Data Preparation: 29.7s
Epoch: 1, (3 / 41) 38.4s  LR: 0.00006  Train Loss: (4.469, 7.768)  Valid Loss: (4.628, 7.167)  NDCG: 0.359  Norm: 20.035  queue: 1
UPDATE!!!
Data Preparation: 17.0s
Epoch: 1, (4 / 41) 36.8s  LR: 0.00006  Train Loss: (4.426, 7.283)  Valid Loss: (4.453, 6.991)  NDCG: 0.367  Norm: 20.043  queue: 1
UPDATE!!!
Data Preparation: 13.0s
Epoch: 1, (5 / 41) 36.8s  LR: 0.00007  Train Loss: (4.375, 7.060)  Valid Loss: (4.509, 6.793)  NDCG: 0.365  Norm: 20.047  queue: 1
UPDATE!!!
Data Preparation: 12.3s
```
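In case it helps with debugging, the per-epoch cost can also be recovered from the log itself: each of the 41 steps prints a "Data Preparation" time and a step time. Below is a minimal parsing sketch; the file name `pretrain.log` is hypothetical (point it at wherever stdout was captured), and the regexes only assume the line format shown above:

```python
import re

# Extract the "Data Preparation: Xs" and per-step "(i / 41) Ys" timings
# from the pasted log format above. "pretrain.log" is a hypothetical
# file name, not something the repo writes by itself.
with open("pretrain.log") as f:
    log = f.read()

prep  = [float(x) for x in re.findall(r"Data Preparation:\s*([\d.]+)s", log)]
steps = [float(x) for x in re.findall(r"\(\s*\d+\s*/\s*\d+\)\s*([\d.]+)s", log)]

# If batch preparation overlaps with training (the config above uses
# n_pool = 8 workers), summing both is an upper bound on wall time.
print(f"prep:  {sum(prep):.1f}s over {len(prep)} batches")
print(f"train: {sum(steps):.1f}s over {len(steps)} steps")
print(f"epoch estimate: {(sum(prep) + sum(steps)) / 60:.1f} min (upper bound)")
```

On the five steps pasted above, the step time alone is roughly 40 s, so 41 steps per epoch already accounts for most of the ~40 minutes observed.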

SpaceLearner · Jul 02 '21 04:07