Could you share the 96-step prediction results for all of the PEMS datasets?
Training on a single RTX 4090 with exactly the same training parameters as the script, I could not get anywhere near the results reported in the paper on the PEMS08 dataset, regardless of whether use_norm is modified.
And it is not just PEMS08: the other PEMS datasets show the same problem. Short-horizon predictions roughly match the paper, but the 48- and 96-step horizons are far off.
I ran into a similar problem, though it only seems to appear on the PEMS07 and PEMS08 datasets, and my input_len is longer than the one used in the paper.
Yes. I have been building on the paper's model, and I found that on most datasets and horizons the reported numbers can be surpassed; on some of them, training without norm (the same setting as in the source code) also beat the paper's results. Only the 48- and 96-step horizons on PEMS08 perform poorly, and when reproducing I could not obtain the reported results either, so I would like to ask the author. @FrankHo-Hwc
Yes. Earlier issues say that tuning the learning rate and the use_norm flag can make the results better. At the 96-step horizon this does help, but the numbers still fall short of the paper's, so a response from the author is still needed.
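For anyone who wants to scan these two knobs systematically, here is a minimal sweep sketch over use_norm, learning rate, and horizon. It assumes the run.py interface shown later in this thread; the grid values and the 170-sensor PEMS08 channel count are illustrative, not the paper's settings.

```python
import itertools
import subprocess

# Illustrative grid -- not the paper's settings.
horizons = [12, 24, 48, 96]        # prediction lengths discussed in this thread
norm_flags = [0, 1]                # toggle instance normalization on/off
learning_rates = ["0.001", "0.0005"]

for pred_len, use_norm, lr in itertools.product(horizons, norm_flags, learning_rates):
    subprocess.run([
        "python", "-u", "run.py",
        "--is_training", "1",
        "--root_path", "./dataset/PEMS/",
        "--data_path", "PEMS08.npz",
        "--model_id", f"PEMS08_96_{pred_len}_norm{use_norm}_lr{lr}",
        "--model", "iTransformer",
        "--data", "PEMS",
        "--features", "M",
        "--seq_len", "96",
        "--pred_len", str(pred_len),
        "--e_layers", "4",
        # PEMS08 is commonly reported as 170 sensors; adjust if your copy differs.
        "--enc_in", "170", "--dec_in", "170", "--c_out", "170",
        "--d_model", "512", "--d_ff", "512",
        "--learning_rate", lr,
        "--use_norm", str(use_norm),
        "--itr", "1",
        "--des", "Exp",
    ], check=True)
```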
@bigdata0 what settings did you use to get the results reported in their paper for each of the PEMS datasets? I ran their script and adjusted use_norm, but the values were still quite different, even on PEMS03 and PEMS04.
@JerayuT
- Check whether your itransformer.py comes from the Time Series Library or from this repository. In the version integrated into the Time Series Library, use_norm is enabled by default and cannot be changed from the command line; in this repository's version, use_norm can be modified.
- On the same dataset, it can help to set use_norm to 0 for one prediction horizon but not for another. For example, when using the previous 96 steps to predict 12 steps, leave use_norm enabled, but when predicting 96 steps, set it to 0 (see the sketch after this list). With this approach I can reproduce most of the results in the paper, but there are still some numbers that cannot be reproduced.
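For context, use_norm gates an instance normalization step in the model's forward pass: the lookback window is standardized per series before encoding, and the forecast is rescaled afterwards. Below is a minimal sketch of that logic, paraphrased from this repository's iTransformer model (names and details are illustrative):

```python
import torch

def normalize(x_enc: torch.Tensor):
    """Standardize each series over the time dimension.

    x_enc: [batch, seq_len, n_series]. Returns the normalized window plus
    the statistics needed to rescale the forecast afterwards.
    """
    means = x_enc.mean(1, keepdim=True).detach()
    x_enc = x_enc - means
    stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)
    return x_enc / stdev, means, stdev

def denormalize(dec_out: torch.Tensor, means, stdev, pred_len: int):
    """Map the forecast back to the original scale of each series."""
    dec_out = dec_out * stdev[:, 0, :].unsqueeze(1).repeat(1, pred_len, 1)
    return dec_out + means[:, 0, :].unsqueeze(1).repeat(1, pred_len, 1)
```

With use_norm set to 0, both steps are skipped and the model works on the raw traffic magnitudes, which may explain why the flag interacts so strongly with horizon length on the PEMS datasets.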
@bigdata0 I'm currently using this version of the repository. I was also wondering whether you adjusted the learning rate or anything else. When I ran the PEMS03 dataset with use_norm set to 0 for the prediction horizons {12, 24, 48, 96}, I got better results than in my previous runs, but for horizons 48 and 96 the values are still quite far from those in the paper.
@JerayuT Okay, you can refer to my configuration parameters.

```
python -u run.py --is_training 1 --root_path ./dataset/PEMS/ --data_path PEMS03.npz --model_id PEMS03_96_96 --model iTransformer --data PEMS --features M --seq_len 96 --pred_len 96 --e_layers 4 --enc_in 358 --dec_in 358 --c_out 358 --des 'Exp' --d_model 512 --d_ff 512 --learning_rate 0.001 --itr 1 --use_norm 0
```

```
Args in experiment:
Namespace(is_training=1, model_id='PEMS03_96_96', model='iTransformer', data='PEMS', root_path='./dataset/PEMS/', data_path='PEMS03.npz', features='M', target='OT', freq='h', checkpoints='./checkpoints/', seq_len=96, label_len=48, pred_len=96, enc_in=358, dec_in=358, c_out=358, d_model=512, n_heads=8, e_layers=4, d_layers=1, d_ff=512, moving_avg=25, factor=1, distil=True, dropout=0.1, embed='timeF', activation='gelu', output_attention=False, do_predict=False, num_workers=10, itr=1, train_epochs=10, batch_size=32, patience=3, learning_rate=0.001, des='Exp', loss='MSE', lradj='type1', use_amp=False, use_gpu=True, gpu=0, use_multi_gpu=False, devices='0,1,2,3', exp_name='MTSF', channel_independence=False, inverse=False, class_strategy='projection', target_root_path='./data/electricity/', target_data_path='electricity.csv', efficient_training=False, use_norm=0, partial_start_index=0)
Use GPU: cuda:0
start training : PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 15533
val 5051
test 5051
	iters: 100, epoch: 1 | loss: 0.2663582  speed: 0.0661s/iter; left time: 313.9474s
	iters: 200, epoch: 1 | loss: 0.2135554  speed: 0.0608s/iter; left time: 282.5925s
	iters: 300, epoch: 1 | loss: 0.2013143  speed: 0.0572s/iter; left time: 260.3726s
	iters: 400, epoch: 1 | loss: 0.1978480  speed: 0.0608s/iter; left time: 270.5316s
Epoch: 1 cost time: 30.010493993759155
Epoch: 1, Steps: 485 | Train Loss: 0.2256930 Vali Loss: 0.1850095 Test Loss: 0.2440917
Validation loss decreased (inf --> 0.185009). Saving model ...
Updating learning rate to 0.001
	iters: 100, epoch: 2 | loss: 0.1753062  speed: 1.6842s/iter; left time: 7184.8671s
	iters: 200, epoch: 2 | loss: 0.1824031  speed: 0.0648s/iter; left time: 269.8704s
	iters: 300, epoch: 2 | loss: 0.1627843  speed: 0.0622s/iter; left time: 253.0022s
	iters: 400, epoch: 2 | loss: 0.1522650  speed: 0.0618s/iter; left time: 244.9026s
Epoch: 2 cost time: 31.733307361602783
Epoch: 2, Steps: 485 | Train Loss: 0.1630596 Vali Loss: 0.1565045 Test Loss: 0.2239741
Validation loss decreased (0.185009 --> 0.156505). Saving model ...
Updating learning rate to 0.0005
	iters: 100, epoch: 3 | loss: 0.1193284  speed: 1.7949s/iter; left time: 6786.6307s
	iters: 200, epoch: 3 | loss: 0.1366769  speed: 0.0643s/iter; left time: 236.7496s
	iters: 300, epoch: 3 | loss: 0.1275335  speed: 0.0637s/iter; left time: 228.0470s
	iters: 400, epoch: 3 | loss: 0.1218285  speed: 0.0650s/iter; left time: 226.2647s
Epoch: 3 cost time: 32.51231002807617
Epoch: 3, Steps: 485 | Train Loss: 0.1272281 Vali Loss: 0.1349931 Test Loss: 0.2023265
Validation loss decreased (0.156505 --> 0.134993). Saving model ...
Updating learning rate to 0.00025
	iters: 100, epoch: 4 | loss: 0.1189363  speed: 1.8123s/iter; left time: 5973.2009s
	iters: 200, epoch: 4 | loss: 0.1217045  speed: 0.0607s/iter; left time: 194.1133s
	iters: 300, epoch: 4 | loss: 0.1013873  speed: 0.0603s/iter; left time: 186.7840s
	iters: 400, epoch: 4 | loss: 0.1246471  speed: 0.0626s/iter; left time: 187.6594s
Epoch: 4 cost time: 30.79427933692932
Epoch: 4, Steps: 485 | Train Loss: 0.1147907 Vali Loss: 0.1244477 Test Loss: 0.1876199
Validation loss decreased (0.134993 --> 0.124448). Saving model ...
Updating learning rate to 0.000125
	iters: 100, epoch: 5 | loss: 0.1103279  speed: 1.8241s/iter; left time: 5127.6685s
	iters: 200, epoch: 5 | loss: 0.1176323  speed: 0.0640s/iter; left time: 173.4984s
	iters: 300, epoch: 5 | loss: 0.1146321  speed: 0.0663s/iter; left time: 173.0208s
	iters: 400, epoch: 5 | loss: 0.1095776  speed: 0.0660s/iter; left time: 165.7033s
Epoch: 5 cost time: 32.98799157142639
Epoch: 5, Steps: 485 | Train Loss: 0.1096915 Vali Loss: 0.1192572 Test Loss: 0.1804807
Validation loss decreased (0.124448 --> 0.119257). Saving model ...
Updating learning rate to 6.25e-05
	iters: 100, epoch: 6 | loss: 0.1013640  speed: 1.7467s/iter; left time: 4062.9149s
	iters: 200, epoch: 6 | loss: 0.1459941  speed: 0.0680s/iter; left time: 151.3634s
	iters: 300, epoch: 6 | loss: 0.0991267  speed: 0.0720s/iter; left time: 153.0836s
	iters: 400, epoch: 6 | loss: 0.1169957  speed: 0.0729s/iter; left time: 147.6160s
Epoch: 6 cost time: 35.339478731155396
Epoch: 6, Steps: 485 | Train Loss: 0.1069848 Vali Loss: 0.1189177 Test Loss: 0.1792915
Validation loss decreased (0.119257 --> 0.118918). Saving model ...
Updating learning rate to 3.125e-05
	iters: 100, epoch: 7 | loss: 0.0993732  speed: 1.7634s/iter; left time: 3246.3705s
	iters: 200, epoch: 7 | loss: 0.1180835  speed: 0.0668s/iter; left time: 116.2247s
	iters: 300, epoch: 7 | loss: 0.0905823  speed: 0.0642s/iter; left time: 105.3868s
	iters: 400, epoch: 7 | loss: 0.1043306  speed: 0.0647s/iter; left time: 99.7038s
Epoch: 7 cost time: 32.913817405700684
Epoch: 7, Steps: 485 | Train Loss: 0.1055459 Vali Loss: 0.1175509 Test Loss: 0.1766107
Validation loss decreased (0.118918 --> 0.117551). Saving model ...
Updating learning rate to 1.5625e-05
	iters: 100, epoch: 8 | loss: 0.1248059  speed: 1.7733s/iter; left time: 2404.6006s
	iters: 200, epoch: 8 | loss: 0.1046225  speed: 0.0637s/iter; left time: 79.9484s
	iters: 300, epoch: 8 | loss: 0.1021659  speed: 0.0630s/iter; left time: 72.8100s
	iters: 400, epoch: 8 | loss: 0.1225355  speed: 0.0610s/iter; left time: 64.4164s
Epoch: 8 cost time: 31.724972248077393
Epoch: 8, Steps: 485 | Train Loss: 0.1047085 Vali Loss: 0.1169733 Test Loss: 0.1760689
Validation loss decreased (0.117551 --> 0.116973). Saving model ...
Updating learning rate to 7.8125e-06
	iters: 100, epoch: 9 | loss: 0.1084197  speed: 1.7494s/iter; left time: 1523.6881s
	iters: 200, epoch: 9 | loss: 0.1012328  speed: 0.0658s/iter; left time: 50.7213s
	iters: 300, epoch: 9 | loss: 0.1137820  speed: 0.0622s/iter; left time: 41.7486s
	iters: 400, epoch: 9 | loss: 0.1025800  speed: 0.0630s/iter; left time: 35.9713s
Epoch: 9 cost time: 32.4756760597229
Epoch: 9, Steps: 485 | Train Loss: 0.1042602 Vali Loss: 0.1166037 Test Loss: 0.1761539
Validation loss decreased (0.116973 --> 0.116604). Saving model ...
Updating learning rate to 3.90625e-06
	iters: 100, epoch: 10 | loss: 0.1130980  speed: 1.7666s/iter; left time: 681.9241s
	iters: 200, epoch: 10 | loss: 0.1034796  speed: 0.0580s/iter; left time: 16.5877s
	iters: 300, epoch: 10 | loss: 0.0804347  speed: 0.0600s/iter; left time: 11.1661s
	iters: 400, epoch: 10 | loss: 0.0938171  speed: 0.0621s/iter; left time: 5.3409s
Epoch: 10 cost time: 30.659968376159668
Epoch: 10, Steps: 485 | Train Loss: 0.1040058 Vali Loss: 0.1168290 Test Loss: 0.1764464
EarlyStopping counter: 1 out of 3
Updating learning rate to 1.953125e-06
testing : PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 5051
test shape: (5051, 1, 96, 358) (5051, 1, 96, 358)
test shape: (5051, 96, 358) (5051, 96, 358)
mse:0.17615370452404022, mae:0.2849079966545105
```
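As a side note, the "Updating learning rate to …" lines follow lradj='type1'. A minimal sketch that reproduces the printed sequence (inferred from the log above, not copied from the repository's tools.py):

```python
def type1_lr(base_lr: float, epoch: int) -> float:
    """Learning rate used in epoch `epoch` (1-indexed), as printed above:
    epochs 1-2 run at base_lr, then the rate halves once per epoch
    (0.001, 0.001, 0.0005, ..., ~3.9e-6 at epoch 10; the final printed
    1.953125e-06 would apply to an 11th epoch).
    """
    return base_lr * 0.5 ** max(epoch - 2, 0)
```

Most of the optimization therefore happens in the first few epochs; by epoch 10 the step size is tiny, which matches the nearly flat Vali Loss after epoch 6 in the log.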