mvts_transformer
Very high loss when finetuning
Dear Author,
I am running your commands and find that the pretraining goes well, while the finetuning looks odd. The pretraining loss is just 0.140160306.
The commands I ran are:
CUDA_VISIBLE_DEVICES=4 python src/main.py --output_dir experiments --comment "pretraining through imputation" --name BeijingPM25Quality_pretrained --records_file Imputation_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --batch_size 32 --pos_encoding learnable --d_model 128
CUDA_VISIBLE_DEVICES=1 python src/main.py --output_dir experiments --comment "finetune for regression" --name BeijingPM25Quality_finetuned --records_file Regression_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_pattern TEST --epochs 200 --lr 0.001 --optimizer RAdam --pos_encoding learnable --d_model 128 --load_model /home/xzhoubi/paperreading/mvts_transformer/experiments/BeijingPM25Quality_pretrained_2022-07-19_10-27-28_tlB/checkpoints/model_best.pth --task regression --change_output --batch_size 128
Could you please help me check this? The finetuning log looks like this:
2022-07-19 17:42:53,244 | INFO : Epoch 85 Training Summary: epoch: 85.000000 | loss: 1024.587302 |
2022-07-19 17:42:53,244 | INFO : Epoch runtime: 0.0 hours, 0.0 minutes, 4.77277946472168 seconds
2022-07-19 17:42:53,244 | INFO : Avg epoch train. time: 0.0 hours, 0.0 minutes, 4.6006609103258915 seconds
2022-07-19 17:42:53,245 | INFO : Avg batch train. time: 0.048943201173679694 seconds
2022-07-19 17:42:53,245 | INFO : Avg sample train. time: 0.00038602625527151295 seconds
Training Epoch: 42%|████████████████████▊ | 85/200 [07:40<10:05, 5.26s/it]
Training Epoch 86 0.0% | batch: 0 of 94 | loss: 566.886
Training Epoch 86 1.1% | batch: 1 of 94 | loss: 686.58
Training Epoch 86 2.1% | batch: 2 of 94 | loss: 1297.63
Training Epoch 86 3.2% | batch: 3 of 94 | loss: 976.956
Training Epoch 86 4.3% | batch: 4 of 94 | loss: 565.19
Training Epoch 86 5.3% | batch: 5 of 94 | loss: 809.262
Training Epoch 86 6.4% | batch: 6 of 94 | loss: 1095.96
Training Epoch 86 7.4% | batch: 7 of 94 | loss: 1047.49
Training Epoch 86 8.5% | batch: 8 of 94 | loss: 782.682
Training Epoch 86 9.6% | batch: 9 of 94 | loss: 697.767
Training Epoch 86 10.6% | batch: 10 of 94 | loss: 900.141
Training Epoch 86 11.7% | batch: 11 of 94 | loss: 919.351
Training Epoch 86 12.8% | batch: 12 of 94 | loss: 782.872
Training Epoch 86 13.8% | batch: 13 of 94 | loss: 1082.41
Training Epoch 86 14.9% | batch: 14 of 94 | loss: 1004.29
Training Epoch 86 16.0% | batch: 15 of 94 | loss: 960.513
Training Epoch 86 17.0% | batch: 16 of 94 | loss: 776.499
Training Epoch 86 18.1% | batch: 17 of 94 | loss: 995.985
Training Epoch 86 19.1% | batch: 18 of 94 | loss: 655.607
Training Epoch 86 20.2% | batch: 19 of 94 | loss: 733.846
Training Epoch 86 21.3% | batch: 20 of 94 | loss: 1190.87
Training Epoch 86 22.3% | batch: 21 of 94 | loss: 698.143
Training Epoch 86 23.4% | batch: 22 of 94 | loss: 992.943
Training Epoch 86 24.5% | batch: 23 of 94 | loss: 1017.47
Training Epoch 86 25.5% | batch: 24 of 94 | loss: 696.403
Training Epoch 86 26.6% | batch: 25 of 94 | loss: 822.942
Training Epoch 86 27.7% | batch: 26 of 94 | loss: 935.869
Training Epoch 86 28.7% | batch: 27 of 94 | loss: 1040.06
Training Epoch 86 29.8% | batch: 28 of 94 | loss: 904.523
Training Epoch 86 30.9% | batch: 29 of 94 | loss: 882.923
Training Epoch 86 31.9% | batch: 30 of 94 | loss: 805.928
Training Epoch 86 33.0% | batch: 31 of 94 | loss: 803.492
Training Epoch 86 34.0% | batch: 32 of 94 | loss: 1720.69
Training Epoch 86 35.1% | batch: 33 of 94 | loss: 778.216
Training Epoch 86 36.2% | batch: 34 of 94 | loss: 729.644
Training Epoch 86 37.2% | batch: 35 of 94 | loss: 1233.58
Training Epoch 86 38.3% | batch: 36 of 94 | loss: 960.826
Training Epoch 86 39.4% | batch: 37 of 94 | loss: 986.129
Training Epoch 86 40.4% | batch: 38 of 94 | loss: 1316.68
Training Epoch 86 41.5% | batch: 39 of 94 | loss: 1351.79
Training Epoch 86 42.6% | batch: 40 of 94 | loss: 1661.48
Training Epoch 86 43.6% | batch: 41 of 94 | loss: 956.305
Training Epoch 86 44.7% | batch: 42 of 94 | loss: 1017.96
Training Epoch 86 45.7% | batch: 43 of 94 | loss: 851.958
Training Epoch 86 46.8% | batch: 44 of 94 | loss: 816.494
Training Epoch 86 47.9% | batch: 45 of 94 | loss: 603.491
Training Epoch 86 48.9% | batch: 46 of 94 | loss: 710.572
Training Epoch 86 50.0% | batch: 47 of 94 | loss: 1318.47
Training Epoch 86 51.1% | batch: 48 of 94 | loss: 905.094
Training Epoch 86 52.1% | batch: 49 of 94 | loss: 662.117
Training Epoch 86 53.2% | batch: 50 of 94 | loss: 850.853
Training Epoch 86 54.3% | batch: 51 of 94 | loss: 1007.81
Training Epoch 86 55.3% | batch: 52 of 94 | loss: 1236.99
Training Epoch 86 56.4% | batch: 53 of 94 | loss: 809.194
Training Epoch 86 57.4% | batch: 54 of 94 | loss: 1075.82
Training Epoch 86 58.5% | batch: 55 of 94 | loss: 859.909
Training Epoch 86 59.6% | batch: 56 of 94 | loss: 739.112
Training Epoch 86 60.6% | batch: 57 of 94 | loss: 992.518
Training Epoch 86 61.7% | batch: 58 of 94 | loss: 953.861
Training Epoch 86 62.8% | batch: 59 of 94 | loss: 881.18
Training Epoch 86 63.8% | batch: 60 of 94 | loss: 878.613
Training Epoch 86 64.9% | batch: 61 of 94 | loss: 1006.92
Training Epoch 86 66.0% | batch: 62 of 94 | loss: 728.144
Training Epoch 86 67.0% | batch: 63 of 94 | loss: 865.157
Training Epoch 86 68.1% | batch: 64 of 94 | loss: 895.809
Training Epoch 86 69.1% | batch: 65 of 94 | loss: 616.984
Training Epoch 86 70.2% | batch: 66 of 94 | loss: 893.007
Training Epoch 86 71.3% | batch: 67 of 94 | loss: 859.431
Training Epoch 86 72.3% | batch: 68 of 94 | loss: 1648.19
Training Epoch 86 73.4% | batch: 69 of 94 | loss: 657.725
Training Epoch 86 74.5% | batch: 70 of 94 | loss: 960.164
Training Epoch 86 75.5% | batch: 71 of 94 | loss: 666.139
Training Epoch 86 76.6% | batch: 72 of 94 | loss: 3079.8
Training Epoch 86 77.7% | batch: 73 of 94 | loss: 802.407
Training Epoch 86 78.7% | batch: 74 of 94 | loss: 1103.64
Training Epoch 86 79.8% | batch: 75 of 94 | loss: 1029.07
Training Epoch 86 80.9% | batch: 76 of 94 | loss: 1488.64
Training Epoch 86 81.9% | batch: 77 of 94 | loss: 924.513
Training Epoch 86 83.0% | batch: 78 of 94 | loss: 909.587
Training Epoch 86 84.0% | batch: 79 of 94 | loss: 862.864
Training Epoch 86 85.1% | batch: 80 of 94 | loss: 607.052
Training Epoch 86 86.2% | batch: 81 of 94 | loss: 967.5
Training Epoch 86 87.2% | batch: 82 of 94 | loss: 942.684
Training Epoch 86 88.3% | batch: 83 of 94 | loss: 1217.01
Training Epoch 86 89.4% | batch: 84 of 94 | loss: 685.092
Training Epoch 86 90.4% | batch: 85 of 94 | loss: 949.638
Training Epoch 86 91.5% | batch: 86 of 94 | loss: 737.985
Training Epoch 86 92.6% | batch: 87 of 94 | loss: 1085.89
Training Epoch 86 93.6% | batch: 88 of 94 | loss: 936.676
Training Epoch 86 94.7% | batch: 89 of 94 | loss: 1203.51
Training Epoch 86 95.7% | batch: 90 of 94 | loss: 677.801
Training Epoch 86 96.8% | batch: 91 of 94 | loss: 2214.77
Training Epoch 86 97.9% | batch: 92 of 94 | loss: 1357.56
Training Epoch 86 98.9% | batch: 93 of 94 | loss: 1019.23
2022-07-19 17:42:57,306 | INFO : Epoch 86 Training Summary: epoch: 86.000000 | loss: 974.012262 |
2022-07-19 17:42:57,307 | INFO : Epoch runtime: 0.0 hours, 0.0 minutes, 3.9919965267181396 seconds
2022-07-19 17:42:57,307 | INFO : Avg epoch train. time: 0.0 hours, 0.0 minutes, 4.593583417493243 seconds
2022-07-19 17:42:57,307 | INFO : Avg batch train. time: 0.04886790869673663 seconds
2022-07-19 17:42:57,307 | INFO : Avg sample train. time: 0.00038543240623370055 seconds
2022-07-19 17:42:57,307 | INFO : Evaluating on validation set ...
Evaluating Epoch 86 0.0% | batch: 0 of 40 | loss: 7538.28
Evaluating Epoch 86 2.5% | batch: 1 of 40 | loss: 1100.53
Evaluating Epoch 86 5.0% | batch: 2 of 40 | loss: 2441.92
Evaluating Epoch 86 7.5% | batch: 3 of 40 | loss: 7944.98
Evaluating Epoch 86 10.0% | batch: 4 of 40 | loss: 2934.04
Evaluating Epoch 86 12.5% | batch: 5 of 40 | loss: 2394.65
Evaluating Epoch 86 15.0% | batch: 6 of 40 | loss: 8225.28
Evaluating Epoch 86 17.5% | batch: 7 of 40 | loss: 3071.4
Evaluating Epoch 86 20.0% | batch: 8 of 40 | loss: 3004.23
Evaluating Epoch 86 22.5% | batch: 9 of 40 | loss: 2549.05
Evaluating Epoch 86 25.0% | batch: 10 of 40 | loss: 5039.37
Evaluating Epoch 86 27.5% | batch: 11 of 40 | loss: 1271.33
Evaluating Epoch 86 30.0% | batch: 12 of 40 | loss: 7026.6
Evaluating Epoch 86 32.5% | batch: 13 of 40 | loss: 4039.62
Evaluating Epoch 86 35.0% | batch: 14 of 40 | loss: 1919.55
Evaluating Epoch 86 37.5% | batch: 15 of 40 | loss: 3505.34
Evaluating Epoch 86 40.0% | batch: 16 of 40 | loss: 5214.82
Evaluating Epoch 86 42.5% | batch: 17 of 40 | loss: 2959.36
Evaluating Epoch 86 45.0% | batch: 18 of 40 | loss: 2551.97
Evaluating Epoch 86 47.5% | batch: 19 of 40 | loss: 6823
Evaluating Epoch 86 50.0% | batch: 20 of 40 | loss: 4544.8
Evaluating Epoch 86 52.5% | batch: 21 of 40 | loss: 1190.93
Evaluating Epoch 86 55.0% | batch: 22 of 40 | loss: 3702.28
Evaluating Epoch 86 57.5% | batch: 23 of 40 | loss: 3874.76
Evaluating Epoch 86 60.0% | batch: 24 of 40 | loss: 1572.05
Evaluating Epoch 86 62.5% | batch: 25 of 40 | loss: 3755.92
Evaluating Epoch 86 65.0% | batch: 26 of 40 | loss: 10556.1
Evaluating Epoch 86 67.5% | batch: 27 of 40 | loss: 3082.73
Evaluating Epoch 86 70.0% | batch: 28 of 40 | loss: 1867.05
Evaluating Epoch 86 72.5% | batch: 29 of 40 | loss: 10148.6
Evaluating Epoch 86 75.0% | batch: 30 of 40 | loss: 1724.54
Evaluating Epoch 86 77.5% | batch: 31 of 40 | loss: 1341.73
Evaluating Epoch 86 80.0% | batch: 32 of 40 | loss: 7704.38
Evaluating Epoch 86 82.5% | batch: 33 of 40 | loss: 7095.86
Evaluating Epoch 86 85.0% | batch: 34 of 40 | loss: 1109.71
Evaluating Epoch 86 87.5% | batch: 35 of 40 | loss: 5296.75
Evaluating Epoch 86 90.0% | batch: 36 of 40 | loss: 6882.2
Evaluating Epoch 86 92.5% | batch: 37 of 40 | loss: 2588.44
Evaluating Epoch 86 95.0% | batch: 38 of 40 | loss: 3639.52
Evaluating Epoch 86 97.5% | batch: 39 of 40 | loss: 11084.6
2022-07-19 17:42:58,800 | INFO : Validation runtime: 0.0 hours, 0.0 minutes, 1.4921939373016357 seconds
2022-07-19 17:42:58,800 | INFO : Avg val. time: 0.0 hours, 0.0 minutes, 1.4729443497127956 seconds
2022-07-19 17:42:58,800 | INFO : Avg batch val. time: 0.03682360874281989 seconds
2022-07-19 17:42:58,800 | INFO : Avg sample val. time: 0.0002917877079462749 seconds
Hi, as written in the README, these values are the MSE, so you have to take the square root to get the RMSE.
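As a quick illustration (just a minimal sketch, not code from the repo), converting one of the logged MSE values to RMSE:

import math

# The "loss" values printed in the log are mean squared errors (MSE).
# The square root gives the RMSE, i.e. the error in the original units of the target.
epoch_86_train_mse = 974.012262          # "Epoch 86 Training Summary" loss above
print(math.sqrt(epoch_86_train_mse))     # ~31.2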
Also, as I note in the README, you should consult the tables of optimal hyperparameters in the paper to get the best performance. For example, for this dataset you should run the pretraining for at most 700 epochs with a batch size of 128, not 32. I have now added this value for this dataset explicitly to the README.
Finally, there is definitely some variance between runs, so you would generally have to run several iterations, but in expectation you should get something like MSE = 2870.
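For reference, this would be your own pretraining command with only the batch size changed to the value from the paper's hyperparameter table (everything else kept as you had it):

CUDA_VISIBLE_DEVICES=4 python src/main.py --output_dir experiments --comment "pretraining through imputation" --name BeijingPM25Quality_pretrained --records_file Imputation_records.xls --data_dir BeijingPM25Quality --data_class tsra --pattern TRAIN --val_ratio 0.2 --epochs 700 --lr 0.001 --optimizer RAdam --batch_size 128 --pos_encoding learnable --d_model 128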