fairseq
Training accuracy stays at about 20% when fine-tuning RoBERTa on Commonsense QA
🐛 Bug
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
- Run the fine-tuning command from examples/roberta/commonsense_qa/README.md
Training accuracy stays at about 20% (chance level for the five answer choices in CommonsenseQA), even after more than 2,800 updates across three epochs.
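One way to confirm the plateau is to average the `train_inner` metrics from a saved copy of the training log. The sketch below is illustrative only: `summarize_log` is a hypothetical helper, not part of fairseq, and it assumes the `key=value` line format shown in this issue and that fairseq reports `loss` in base 2. Under those assumptions, a model that guesses uniformly among five choices would show accuracy near 20% and loss near log2(5) ≈ 2.32. The helper also collects any `No existing checkpoint found` lines, since that message appears in this log for the `roberta.large/model.pt` path and may be worth checking.

```python
import math
import re
from statistics import mean

# Matches "loss=<number>" and "accuracy=<number>" on train_inner lines.
# The leading \b keeps "nll_loss=" from matching, because "_" is a word character.
METRIC_RE = re.compile(r"\b(loss|accuracy)=([0-9]+(?:\.[0-9]+)?)")


def summarize_log(lines, num_choices=5):
    """Summarize train_inner metrics from fairseq-style log lines.

    Hypothetical helper for illustration; num_choices=5 is an assumption
    based on CommonsenseQA having five answer options per question.
    """
    losses, accuracies, missing_checkpoint = [], [], []
    for line in lines:
        if "No existing checkpoint found" in line:
            missing_checkpoint.append(line.strip())
        if "| train_inner |" not in line:
            continue  # skip validation, epoch-summary, and setup lines
        metrics = dict(METRIC_RE.findall(line))
        if "loss" in metrics and "accuracy" in metrics:
            losses.append(float(metrics["loss"]))
            accuracies.append(float(metrics["accuracy"]))
    return {
        "num_points": len(accuracies),
        "mean_loss": mean(losses) if losses else None,
        "mean_accuracy": mean(accuracies) if accuracies else None,
        # Reference values for uniform guessing among num_choices options.
        "chance_accuracy": 100.0 / num_choices,
        "chance_loss_bits": math.log2(num_choices),
        "missing_checkpoint": missing_checkpoint,
    }
```

Passing the lines of a saved log (for example `summarize_log(open("train.log"))`, where `train.log` is a placeholder file name) gives mean values that can be compared directly against the two chance-level references.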
2023-10-27 21:16:54 | INFO | fairseq_cli.train | task: CommonsenseQATask
2023-10-27 21:16:54 | INFO | fairseq_cli.train | model: RobertaModel
2023-10-27 21:16:54 | INFO | fairseq_cli.train | criterion: SentenceRankingCriterion
2023-10-27 21:16:54 | INFO | fairseq_cli.train | num. shared model params: 356,461,658 (num. trained: 356,461,658)
2023-10-27 21:16:54 | INFO | fairseq_cli.train | num. expert model params: 0 (num. trained: 0)
| Loaded valid with 1221 samples
2023-10-27 21:17:05 | INFO | fairseq.trainer | detected shared parameter: encoder.sentence_encoder.embed_tokens.weight <- encoder.lm_head.weight
2023-10-27 21:17:05 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2023-10-27 21:17:05 | INFO | fairseq.utils | rank 0: capabilities = 8.6 ; total memory = 23.700 GB ; name = GeForce RTX 3090
2023-10-27 21:17:05 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2023-10-27 21:17:05 | INFO | fairseq_cli.train | training on 1 devices (GPUs/TPUs)
2023-10-27 21:17:05 | INFO | fairseq_cli.train | max tokens per device = None and max sentences per device = 8
2023-10-27 21:17:05 | INFO | fairseq.trainer | Preparing to load checkpoint /examples/roberta/roberta.large/model.pt
2023-10-27 21:17:05 | INFO | fairseq.trainer | No existing checkpoint found /examples/roberta/roberta.large/model.pt
2023-10-27 21:17:05 | INFO | fairseq.trainer | loading train data for epoch 1
| Loaded train with 9741 samples
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 21:17:28 | INFO | fairseq_cli.train | begin dry-run validation on "valid" subset
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 21:17:30 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1217
2023-10-27 21:17:30 | INFO | fairseq.trainer | begin training epoch 1
2023-10-27 21:17:30 | INFO | fairseq_cli.train | Start iterating over samples
2023-10-27 21:18:17 | INFO | train_inner | epoch 001: 25 / 1217 loss=2.345, nll_loss=0.099, accuracy=14.5, wps=132, ups=0.7, wpb=190.1, bsz=8, num_updates=25, lr=1.66667e-06, gnorm=11.976, loss_scale=128, train_wall=46, gb_free=15.6, wall=72
2023-10-27 21:18:53 | INFO | train_inner | epoch 001: 50 / 1217 loss=2.33, nll_loss=0.096, accuracy=18, wps=131.6, ups=0.68, wpb=193.4, bsz=8, num_updates=50, lr=3.33333e-06, gnorm=11.059, loss_scale=128, train_wall=36, gb_free=15.1, wall=109
2023-10-27 21:19:28 | INFO | train_inner | epoch 001: 75 / 1217 loss=2.348, nll_loss=0.099, accuracy=16.5, wps=139, ups=0.73, wpb=190.6, bsz=8, num_updates=75, lr=5e-06, gnorm=9.757, loss_scale=128, train_wall=34, gb_free=15.7, wall=143
2023-10-27 21:20:01 | INFO | train_inner | epoch 001: 100 / 1217 loss=2.33, nll_loss=0.095, accuracy=16, wps=146.9, ups=0.75, wpb=196.5, bsz=8, num_updates=100, lr=6.66667e-06, gnorm=7.484, loss_scale=128, train_wall=33, gb_free=15.7, wall=176
2023-10-27 21:20:38 | INFO | train_inner | epoch 001: 125 / 1217 loss=2.326, nll_loss=0.097, accuracy=23.5, wps=129.2, ups=0.67, wpb=192.3, bsz=8, num_updates=125, lr=8.33333e-06, gnorm=5.903, loss_scale=128, train_wall=37, gb_free=15.7, wall=214
2023-10-27 21:21:14 | INFO | train_inner | epoch 001: 150 / 1217 loss=2.323, nll_loss=0.096, accuracy=19, wps=136.2, ups=0.7, wpb=194.2, bsz=8, num_updates=150, lr=1e-05, gnorm=5.444, loss_scale=128, train_wall=35, gb_free=15.7, wall=249
2023-10-27 21:21:49 | INFO | train_inner | epoch 001: 175 / 1217 loss=2.324, nll_loss=0.096, accuracy=17.5, wps=138.7, ups=0.72, wpb=192.9, bsz=8, num_updates=175, lr=9.95726e-06, gnorm=4.921, loss_scale=128, train_wall=34, gb_free=15.7, wall=284
2023-10-27 21:22:05 | INFO | train_inner | epoch 001: 200 / 1217 loss=2.342, nll_loss=0.097, accuracy=17.5, wps=307.1, ups=1.59, wpb=192.6, bsz=8, num_updates=200, lr=9.91453e-06, gnorm=4.516, loss_scale=128, train_wall=16, gb_free=15.7, wall=300
2023-10-27 21:22:35 | INFO | train_inner | epoch 001: 225 / 1217 loss=2.329, nll_loss=0.099, accuracy=19.5, wps=157.5, ups=0.83, wpb=188.8, bsz=8, num_updates=225, lr=9.87179e-06, gnorm=4.041, loss_scale=128, train_wall=30, gb_free=15.5, wall=330
2023-10-27 21:23:09 | INFO | train_inner | epoch 001: 250 / 1217 loss=2.313, nll_loss=0.096, accuracy=25, wps=138.9, ups=0.72, wpb=193.4, bsz=8, num_updates=250, lr=9.82906e-06, gnorm=4.022, loss_scale=128, train_wall=34, gb_free=15.7, wall=364
2023-10-27 21:23:44 | INFO | train_inner | epoch 001: 275 / 1217 loss=2.341, nll_loss=0.097, accuracy=18, wps=141.1, ups=0.73, wpb=192.9, bsz=8, num_updates=275, lr=9.78632e-06, gnorm=3.977, loss_scale=128, train_wall=34, gb_free=15.7, wall=399
2023-10-27 21:24:17 | INFO | train_inner | epoch 001: 300 / 1217 loss=2.321, nll_loss=0.097, accuracy=24, wps=142.6, ups=0.75, wpb=190.7, bsz=8, num_updates=300, lr=9.74359e-06, gnorm=3.642, loss_scale=128, train_wall=33, gb_free=15.7, wall=432
2023-10-27 21:24:51 | INFO | train_inner | epoch 001: 325 / 1217 loss=2.322, nll_loss=0.099, accuracy=23.5, wps=138.2, ups=0.74, wpb=186.8, bsz=8, num_updates=325, lr=9.70085e-06, gnorm=3.695, loss_scale=128, train_wall=33, gb_free=15.7, wall=466
2023-10-27 21:25:23 | INFO | train_inner | epoch 001: 350 / 1217 loss=2.328, nll_loss=0.098, accuracy=24.5, wps=149.4, ups=0.79, wpb=190.2, bsz=8, num_updates=350, lr=9.65812e-06, gnorm=3.775, loss_scale=128, train_wall=31, gb_free=15.5, wall=498
2023-10-27 21:25:56 | INFO | train_inner | epoch 001: 375 / 1217 loss=2.34, nll_loss=0.099, accuracy=14.5, wps=140.7, ups=0.74, wpb=189.6, bsz=8, num_updates=375, lr=9.61538e-06, gnorm=3.727, loss_scale=128, train_wall=33, gb_free=15.7, wall=531
2023-10-27 21:26:30 | INFO | train_inner | epoch 001: 400 / 1217 loss=2.32, nll_loss=0.097, accuracy=24, wps=141.5, ups=0.74, wpb=190.5, bsz=8, num_updates=400, lr=9.57265e-06, gnorm=3.788, loss_scale=128, train_wall=33, gb_free=15.7, wall=565
2023-10-27 21:27:03 | INFO | train_inner | epoch 001: 425 / 1217 loss=2.331, nll_loss=0.096, accuracy=21, wps=145.5, ups=0.75, wpb=193.7, bsz=8, num_updates=425, lr=9.52991e-06, gnorm=3.777, loss_scale=128, train_wall=33, gb_free=15.4, wall=598
2023-10-27 21:27:37 | INFO | train_inner | epoch 001: 450 / 1217 loss=2.32, nll_loss=0.093, accuracy=22.5, wps=150.2, ups=0.75, wpb=200.4, bsz=8, num_updates=450, lr=9.48718e-06, gnorm=3.626, loss_scale=128, train_wall=33, gb_free=15.7, wall=632
2023-10-27 21:28:10 | INFO | train_inner | epoch 001: 475 / 1217 loss=2.313, nll_loss=0.096, accuracy=23.5, wps=141.5, ups=0.74, wpb=191.8, bsz=8, num_updates=475, lr=9.44444e-06, gnorm=3.635, loss_scale=128, train_wall=33, gb_free=15.7, wall=666
2023-10-27 21:28:45 | INFO | train_inner | epoch 001: 500 / 1217 loss=2.327, nll_loss=0.095, accuracy=23, wps=141.9, ups=0.73, wpb=195, bsz=8, num_updates=500, lr=9.40171e-06, gnorm=3.646, loss_scale=128, train_wall=34, gb_free=15.5, wall=700
2023-10-27 21:29:18 | INFO | train_inner | epoch 001: 525 / 1217 loss=2.326, nll_loss=0.097, accuracy=23, wps=145.2, ups=0.75, wpb=192.6, bsz=8, num_updates=525, lr=9.35897e-06, gnorm=3.726, loss_scale=128, train_wall=33, gb_free=15.7, wall=733
2023-10-27 21:29:53 | INFO | train_inner | epoch 001: 550 / 1217 loss=2.334, nll_loss=0.098, accuracy=17.5, wps=136.9, ups=0.72, wpb=191.1, bsz=8, num_updates=550, lr=9.31624e-06, gnorm=3.679, loss_scale=128, train_wall=34, gb_free=15.7, wall=768
2023-10-27 21:30:27 | INFO | train_inner | epoch 001: 575 / 1217 loss=2.331, nll_loss=0.093, accuracy=21, wps=147, ups=0.73, wpb=201, bsz=8, num_updates=575, lr=9.2735e-06, gnorm=3.621, loss_scale=128, train_wall=34, gb_free=15.7, wall=802
2023-10-27 21:31:00 | INFO | train_inner | epoch 001: 600 / 1217 loss=2.325, nll_loss=0.099, accuracy=21.5, wps=143.3, ups=0.76, wpb=187.6, bsz=8, num_updates=600, lr=9.23077e-06, gnorm=3.932, loss_scale=128, train_wall=32, gb_free=15.7, wall=835
2023-10-27 21:31:32 | INFO | train_inner | epoch 001: 625 / 1217 loss=2.331, nll_loss=0.098, accuracy=18, wps=150.2, ups=0.79, wpb=190.7, bsz=8, num_updates=625, lr=9.18803e-06, gnorm=3.786, loss_scale=128, train_wall=31, gb_free=15.7, wall=867
2023-10-27 21:32:05 | INFO | train_inner | epoch 001: 650 / 1217 loss=2.335, nll_loss=0.097, accuracy=18.5, wps=142.1, ups=0.74, wpb=191.6, bsz=8, num_updates=650, lr=9.1453e-06, gnorm=3.613, loss_scale=128, train_wall=33, gb_free=15.7, wall=900
2023-10-27 21:32:39 | INFO | train_inner | epoch 001: 675 / 1217 loss=2.324, nll_loss=0.094, accuracy=17.5, wps=146.5, ups=0.74, wpb=197.6, bsz=8, num_updates=675, lr=9.10256e-06, gnorm=3.37, loss_scale=128, train_wall=33, gb_free=15.4, wall=934
2023-10-27 21:33:10 | INFO | train_inner | epoch 001: 700 / 1217 loss=2.333, nll_loss=0.1, accuracy=19.5, wps=149.8, ups=0.8, wpb=186.9, bsz=8, num_updates=700, lr=9.05983e-06, gnorm=3.202, loss_scale=128, train_wall=31, gb_free=15.7, wall=965
2023-10-27 21:33:45 | INFO | train_inner | epoch 001: 725 / 1217 loss=2.33, nll_loss=0.099, accuracy=19, wps=136.6, ups=0.72, wpb=189, bsz=8, num_updates=725, lr=9.01709e-06, gnorm=3.108, loss_scale=128, train_wall=34, gb_free=15.7, wall=1000
2023-10-27 21:34:18 | INFO | train_inner | epoch 001: 750 / 1217 loss=2.33, nll_loss=0.097, accuracy=17.5, wps=144.7, ups=0.75, wpb=192.8, bsz=8, num_updates=750, lr=8.97436e-06, gnorm=2.985, loss_scale=128, train_wall=33, gb_free=15.7, wall=1033
2023-10-27 21:34:51 | INFO | train_inner | epoch 001: 775 / 1217 loss=2.32, nll_loss=0.099, accuracy=19, wps=144.8, ups=0.77, wpb=188.2, bsz=8, num_updates=775, lr=8.93162e-06, gnorm=2.999, loss_scale=128, train_wall=32, gb_free=15.7, wall=1066
2023-10-27 21:35:23 | INFO | train_inner | epoch 001: 800 / 1217 loss=2.327, nll_loss=0.096, accuracy=19, wps=149.8, ups=0.77, wpb=193.6, bsz=8, num_updates=800, lr=8.88889e-06, gnorm=3.009, loss_scale=128, train_wall=32, gb_free=15.7, wall=1098
2023-10-27 21:35:56 | INFO | train_inner | epoch 001: 825 / 1217 loss=2.334, nll_loss=0.097, accuracy=15.5, wps=143.5, ups=0.75, wpb=191.9, bsz=8, num_updates=825, lr=8.84615e-06, gnorm=2.958, loss_scale=128, train_wall=33, gb_free=15.7, wall=1131
2023-10-27 21:36:30 | INFO | train_inner | epoch 001: 850 / 1217 loss=2.316, nll_loss=0.096, accuracy=23, wps=142.4, ups=0.74, wpb=192.7, bsz=8, num_updates=850, lr=8.80342e-06, gnorm=2.937, loss_scale=128, train_wall=33, gb_free=15.7, wall=1165
2023-10-27 21:37:05 | INFO | train_inner | epoch 001: 875 / 1217 loss=2.322, nll_loss=0.094, accuracy=18, wps=144, ups=0.73, wpb=198.2, bsz=8, num_updates=875, lr=8.76068e-06, gnorm=2.957, loss_scale=128, train_wall=34, gb_free=15.7, wall=1200
2023-10-27 21:37:38 | INFO | train_inner | epoch 001: 900 / 1217 loss=2.34, nll_loss=0.096, accuracy=16.5, wps=147.5, ups=0.76, wpb=194.8, bsz=8, num_updates=900, lr=8.71795e-06, gnorm=2.984, loss_scale=128, train_wall=33, gb_free=15.7, wall=1233
2023-10-27 21:38:12 | INFO | train_inner | epoch 001: 925 / 1217 loss=2.338, nll_loss=0.097, accuracy=16.5, wps=137.9, ups=0.72, wpb=192.1, bsz=8, num_updates=925, lr=8.67521e-06, gnorm=2.919, loss_scale=128, train_wall=34, gb_free=15.7, wall=1268
2023-10-27 21:38:48 | INFO | train_inner | epoch 001: 950 / 1217 loss=2.319, nll_loss=0.096, accuracy=20.5, wps=134.5, ups=0.7, wpb=192.8, bsz=8, num_updates=950, lr=8.63248e-06, gnorm=2.824, loss_scale=128, train_wall=35, gb_free=15.7, wall=1303
2023-10-27 21:39:21 | INFO | train_inner | epoch 001: 975 / 1217 loss=2.313, nll_loss=0.092, accuracy=27.5, wps=151.9, ups=0.76, wpb=200.2, bsz=8, num_updates=975, lr=8.58974e-06, gnorm=2.884, loss_scale=128, train_wall=33, gb_free=15.7, wall=1336
2023-10-27 21:39:55 | INFO | train_inner | epoch 001: 1000 / 1217 loss=2.348, nll_loss=0.094, accuracy=12.5, wps=147.3, ups=0.74, wpb=198.9, bsz=8, num_updates=1000, lr=8.54701e-06, gnorm=2.858, loss_scale=128, train_wall=33, gb_free=15.7, wall=1370
2023-10-27 21:40:29 | INFO | train_inner | epoch 001: 1025 / 1217 loss=2.315, nll_loss=0.097, accuracy=20.5, wps=138.2, ups=0.73, wpb=190.2, bsz=8, num_updates=1025, lr=8.50427e-06, gnorm=2.81, loss_scale=128, train_wall=34, gb_free=15.7, wall=1405
2023-10-27 21:41:05 | INFO | train_inner | epoch 001: 1050 / 1217 loss=2.32, nll_loss=0.095, accuracy=18.5, wps=137.4, ups=0.7, wpb=196.3, bsz=8, num_updates=1050, lr=8.46154e-06, gnorm=2.863, loss_scale=128, train_wall=35, gb_free=15.7, wall=1440
2023-10-27 21:41:41 | INFO | train_inner | epoch 001: 1075 / 1217 loss=2.321, nll_loss=0.098, accuracy=20.5, wps=132.2, ups=0.7, wpb=189.5, bsz=8, num_updates=1075, lr=8.4188e-06, gnorm=2.86, loss_scale=128, train_wall=35, gb_free=15.7, wall=1476
2023-10-27 21:42:17 | INFO | train_inner | epoch 001: 1100 / 1217 loss=2.335, nll_loss=0.095, accuracy=19.5, wps=136.5, ups=0.69, wpb=197.6, bsz=8, num_updates=1100, lr=8.37607e-06, gnorm=2.823, loss_scale=128, train_wall=36, gb_free=15.7, wall=1512
2023-10-27 21:42:51 | INFO | train_inner | epoch 001: 1125 / 1217 loss=2.319, nll_loss=0.096, accuracy=21.5, wps=141.4, ups=0.73, wpb=192.6, bsz=8, num_updates=1125, lr=8.33333e-06, gnorm=2.722, loss_scale=128, train_wall=34, gb_free=15.7, wall=1546
2023-10-27 21:43:25 | INFO | train_inner | epoch 001: 1150 / 1217 loss=2.327, nll_loss=0.098, accuracy=15.5, wps=138.5, ups=0.73, wpb=190, bsz=8, num_updates=1150, lr=8.2906e-06, gnorm=2.709, loss_scale=128, train_wall=34, gb_free=15.7, wall=1581
2023-10-27 21:44:03 | INFO | train_inner | epoch 001: 1175 / 1217 loss=2.339, nll_loss=0.092, accuracy=17.5, wps=135.1, ups=0.66, wpb=204.3, bsz=8, num_updates=1175, lr=8.24786e-06, gnorm=2.722, loss_scale=128, train_wall=37, gb_free=15.7, wall=1618
2023-10-27 21:44:38 | INFO | train_inner | epoch 001: 1200 / 1217 loss=2.325, nll_loss=0.096, accuracy=17, wps=139.2, ups=0.72, wpb=193.4, bsz=8, num_updates=1200, lr=8.20513e-06, gnorm=2.656, loss_scale=128, train_wall=34, gb_free=15.7, wall=1653
2023-10-27 21:45:04 | INFO | fairseq_cli.train | begin validation on "valid" subset
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 21:46:07 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 2.322 | nll_loss 0.097 | accuracy 20.4 | wps 464.7 | wpb 191.6 | bsz 8 | num_updates 1217
2023-10-27 21:46:07 | INFO | fairseq.checkpoint_utils | Preparing to save checkpoint for epoch 1 @ 1217 updates
2023-10-27 21:46:07 | INFO | fairseq.trainer | Saving checkpoint to /mnt/data/lizuchao/gongrh/fairseq/checkpoints/checkpoint_best.pt
2023-10-27 21:46:25 | INFO | fairseq.trainer | Finished saving checkpoint to /mnt/data/lizuchao/gongrh/fairseq/checkpoints/checkpoint_best.pt
2023-10-27 21:46:25 | INFO | fairseq.checkpoint_utils | Saved checkpoint checkpoints/checkpoint_best.pt (epoch 1 @ 1217 updates, score 20.4) (writing took 17.92632083798526 seconds)
2023-10-27 21:46:25 | INFO | fairseq_cli.train | end of epoch 1 (average epoch stats below)
2023-10-27 21:46:25 | INFO | train | epoch 001 | loss 2.328 | nll_loss 0.096 | accuracy 19.5 | wps 136.3 | ups 0.71 | wpb 193 | bsz 8 | num_updates 1217 | lr 8.17607e-06 | gnorm 4 | loss_scale 128 | train_wall 1633 | gb_free 15.7 | wall 1760
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 2
2023-10-27 21:46:25 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1217
2023-10-27 21:46:25 | INFO | fairseq.trainer | begin training epoch 2
2023-10-27 21:46:25 | INFO | fairseq_cli.train | Start iterating over samples
2023-10-27 21:46:37 | INFO | train_inner | epoch 002: 8 / 1217 loss=2.325, nll_loss=0.095, accuracy=20.5, wps=41, ups=0.21, wpb=195.4, bsz=8, num_updates=1225, lr=8.16239e-06, gnorm=2.667, loss_scale=128, train_wall=37, gb_free=15.7, wall=1772
2023-10-27 21:47:11 | INFO | train_inner | epoch 002: 33 / 1217 loss=2.34, nll_loss=0.097, accuracy=14, wps=144.1, ups=0.75, wpb=193.4, bsz=8, num_updates=1250, lr=8.11966e-06, gnorm=2.662, loss_scale=128, train_wall=33, gb_free=15.7, wall=1806
2023-10-27 21:47:44 | INFO | train_inner | epoch 002: 58 / 1217 loss=2.328, nll_loss=0.098, accuracy=22.5, wps=142.5, ups=0.75, wpb=189.4, bsz=8, num_updates=1275, lr=8.07692e-06, gnorm=2.588, loss_scale=128, train_wall=33, gb_free=15.7, wall=1839
2023-10-27 21:48:18 | INFO | train_inner | epoch 002: 83 / 1217 loss=2.336, nll_loss=0.1, accuracy=20.5, wps=137, ups=0.73, wpb=187.6, bsz=8, num_updates=1300, lr=8.03419e-06, gnorm=2.657, loss_scale=128, train_wall=34, gb_free=15.5, wall=1873
2023-10-27 21:48:52 | INFO | train_inner | epoch 002: 108 / 1217 loss=2.313, nll_loss=0.096, accuracy=26, wps=143, ups=0.74, wpb=193.7, bsz=8, num_updates=1325, lr=7.99145e-06, gnorm=2.66, loss_scale=128, train_wall=33, gb_free=15.7, wall=1907
2023-10-27 21:49:30 | INFO | train_inner | epoch 002: 133 / 1217 loss=2.343, nll_loss=0.097, accuracy=15.5, wps=127.8, ups=0.66, wpb=194, bsz=8, num_updates=1350, lr=7.94872e-06, gnorm=2.635, loss_scale=128, train_wall=37, gb_free=15.6, wall=1945
2023-10-27 21:50:05 | INFO | train_inner | epoch 002: 158 / 1217 loss=2.332, nll_loss=0.097, accuracy=22.5, wps=139.2, ups=0.72, wpb=193, bsz=8, num_updates=1375, lr=7.90598e-06, gnorm=2.576, loss_scale=128, train_wall=34, gb_free=15.4, wall=1980
2023-10-27 21:50:40 | INFO | train_inner | epoch 002: 183 / 1217 loss=2.327, nll_loss=0.095, accuracy=22.5, wps=137, ups=0.7, wpb=196.3, bsz=8, num_updates=1400, lr=7.86325e-06, gnorm=2.538, loss_scale=128, train_wall=35, gb_free=15.2, wall=2016
2023-10-27 21:51:19 | INFO | train_inner | epoch 002: 208 / 1217 loss=2.309, nll_loss=0.097, accuracy=21.5, wps=123.7, ups=0.65, wpb=189.8, bsz=8, num_updates=1425, lr=7.82051e-06, gnorm=2.542, loss_scale=128, train_wall=38, gb_free=15.7, wall=2054
2023-10-27 21:51:53 | INFO | train_inner | epoch 002: 233 / 1217 loss=2.324, nll_loss=0.1, accuracy=20.5, wps=137, ups=0.74, wpb=186.2, bsz=8, num_updates=1450, lr=7.77778e-06, gnorm=2.572, loss_scale=128, train_wall=34, gb_free=15.7, wall=2088
2023-10-27 21:52:24 | INFO | train_inner | epoch 002: 258 / 1217 loss=2.327, nll_loss=0.096, accuracy=20, wps=154.3, ups=0.79, wpb=194.9, bsz=8, num_updates=1475, lr=7.73504e-06, gnorm=2.59, loss_scale=128, train_wall=31, gb_free=15.7, wall=2120
2023-10-27 21:52:58 | INFO | train_inner | epoch 002: 283 / 1217 loss=2.331, nll_loss=0.098, accuracy=15.5, wps=143.3, ups=0.75, wpb=190.8, bsz=8, num_updates=1500, lr=7.69231e-06, gnorm=2.564, loss_scale=128, train_wall=33, gb_free=15.7, wall=2153
2023-10-27 21:53:32 | INFO | train_inner | epoch 002: 308 / 1217 loss=2.328, nll_loss=0.093, accuracy=22, wps=145.3, ups=0.73, wpb=199.7, bsz=8, num_updates=1525, lr=7.64957e-06, gnorm=2.568, loss_scale=128, train_wall=34, gb_free=15.7, wall=2187
2023-10-27 21:54:05 | INFO | train_inner | epoch 002: 333 / 1217 loss=2.321, nll_loss=0.098, accuracy=19, wps=144, ups=0.76, wpb=188.8, bsz=8, num_updates=1550, lr=7.60684e-06, gnorm=2.606, loss_scale=128, train_wall=32, gb_free=15.7, wall=2220
2023-10-27 21:54:38 | INFO | train_inner | epoch 002: 358 / 1217 loss=2.328, nll_loss=0.097, accuracy=25.5, wps=142.5, ups=0.75, wpb=191.1, bsz=8, num_updates=1575, lr=7.5641e-06, gnorm=2.685, loss_scale=128, train_wall=33, gb_free=15.2, wall=2254
2023-10-27 21:55:10 | INFO | train_inner | epoch 002: 383 / 1217 loss=2.336, nll_loss=0.099, accuracy=20.5, wps=151, ups=0.8, wpb=188.7, bsz=8, num_updates=1600, lr=7.52137e-06, gnorm=2.755, loss_scale=128, train_wall=31, gb_free=15.7, wall=2285
2023-10-27 21:55:42 | INFO | train_inner | epoch 002: 408 / 1217 loss=2.325, nll_loss=0.098, accuracy=20.5, wps=145.9, ups=0.77, wpb=190.6, bsz=8, num_updates=1625, lr=7.47863e-06, gnorm=2.694, loss_scale=128, train_wall=32, gb_free=15.7, wall=2317
2023-10-27 21:56:15 | INFO | train_inner | epoch 002: 433 / 1217 loss=2.322, nll_loss=0.095, accuracy=19, wps=148.9, ups=0.76, wpb=194.9, bsz=8, num_updates=1650, lr=7.4359e-06, gnorm=2.713, loss_scale=128, train_wall=32, gb_free=15.4, wall=2350
2023-10-27 21:56:48 | INFO | train_inner | epoch 002: 458 / 1217 loss=2.328, nll_loss=0.094, accuracy=18, wps=150.7, ups=0.76, wpb=197.3, bsz=8, num_updates=1675, lr=7.39316e-06, gnorm=2.65, loss_scale=128, train_wall=32, gb_free=15.3, wall=2383
2023-10-27 21:57:21 | INFO | train_inner | epoch 002: 483 / 1217 loss=2.319, nll_loss=0.096, accuracy=16, wps=143.4, ups=0.74, wpb=193.3, bsz=8, num_updates=1700, lr=7.35043e-06, gnorm=2.613, loss_scale=128, train_wall=33, gb_free=15.7, wall=2417
2023-10-27 21:57:55 | INFO | train_inner | epoch 002: 508 / 1217 loss=2.349, nll_loss=0.093, accuracy=15.5, wps=152.3, ups=0.75, wpb=202.3, bsz=8, num_updates=1725, lr=7.30769e-06, gnorm=2.75, loss_scale=128, train_wall=33, gb_free=15.6, wall=2450
2023-10-27 21:58:27 | INFO | train_inner | epoch 002: 533 / 1217 loss=2.317, nll_loss=0.099, accuracy=23, wps=144.8, ups=0.77, wpb=188.1, bsz=8, num_updates=1750, lr=7.26496e-06, gnorm=3.164, loss_scale=128, train_wall=32, gb_free=15.7, wall=2482
2023-10-27 21:59:00 | INFO | train_inner | epoch 002: 558 / 1217 loss=2.336, nll_loss=0.098, accuracy=21.5, wps=143.6, ups=0.75, wpb=190.5, bsz=8, num_updates=1775, lr=7.22222e-06, gnorm=2.83, loss_scale=128, train_wall=33, gb_free=15.5, wall=2515
2023-10-27 21:59:35 | INFO | train_inner | epoch 002: 583 / 1217 loss=2.335, nll_loss=0.101, accuracy=18.5, wps=133.2, ups=0.72, wpb=185.5, bsz=8, num_updates=1800, lr=7.17949e-06, gnorm=2.794, loss_scale=128, train_wall=34, gb_free=15.7, wall=2550
2023-10-27 22:00:09 | INFO | train_inner | epoch 002: 608 / 1217 loss=2.333, nll_loss=0.097, accuracy=14.5, wps=140.5, ups=0.73, wpb=192.4, bsz=8, num_updates=1825, lr=7.13675e-06, gnorm=2.895, loss_scale=128, train_wall=34, gb_free=15.7, wall=2585
2023-10-27 22:00:43 | INFO | train_inner | epoch 002: 633 / 1217 loss=2.33, nll_loss=0.096, accuracy=17.5, wps=143.5, ups=0.74, wpb=194.2, bsz=8, num_updates=1850, lr=7.09402e-06, gnorm=3.265, loss_scale=128, train_wall=33, gb_free=15.7, wall=2618
2023-10-27 22:01:17 | INFO | train_inner | epoch 002: 658 / 1217 loss=2.331, nll_loss=0.096, accuracy=18.5, wps=145.7, ups=0.75, wpb=195.2, bsz=8, num_updates=1875, lr=7.05128e-06, gnorm=2.786, loss_scale=128, train_wall=33, gb_free=15.7, wall=2652
2023-10-27 22:01:51 | INFO | train_inner | epoch 002: 683 / 1217 loss=2.325, nll_loss=0.097, accuracy=21, wps=138.3, ups=0.72, wpb=192, bsz=8, num_updates=1900, lr=7.00855e-06, gnorm=2.908, loss_scale=128, train_wall=34, gb_free=15.7, wall=2687
2023-10-27 22:02:24 | INFO | train_inner | epoch 002: 708 / 1217 loss=2.322, nll_loss=0.096, accuracy=20.5, wps=148.3, ups=0.77, wpb=193.2, bsz=8, num_updates=1925, lr=6.96581e-06, gnorm=3.026, loss_scale=128, train_wall=32, gb_free=15.7, wall=2719
2023-10-27 22:02:57 | INFO | train_inner | epoch 002: 733 / 1217 loss=2.326, nll_loss=0.098, accuracy=21, wps=143.2, ups=0.75, wpb=190, bsz=8, num_updates=1950, lr=6.92308e-06, gnorm=3.382, loss_scale=128, train_wall=33, gb_free=15.7, wall=2752
2023-10-27 22:03:31 | INFO | train_inner | epoch 002: 758 / 1217 loss=2.335, nll_loss=0.096, accuracy=18.5, wps=143.1, ups=0.74, wpb=193.6, bsz=8, num_updates=1975, lr=6.88034e-06, gnorm=3.273, loss_scale=128, train_wall=33, gb_free=15.7, wall=2786
2023-10-27 22:04:11 | INFO | train_inner | epoch 002: 783 / 1217 loss=2.316, nll_loss=0.098, accuracy=21, wps=116.6, ups=0.62, wpb=188.5, bsz=8, num_updates=2000, lr=6.83761e-06, gnorm=3.013, loss_scale=128, train_wall=40, gb_free=15.7, wall=2827
2023-10-27 22:04:45 | INFO | train_inner | epoch 002: 808 / 1217 loss=2.314, nll_loss=0.096, accuracy=23, wps=143.4, ups=0.75, wpb=192.3, bsz=8, num_updates=2025, lr=6.79487e-06, gnorm=2.988, loss_scale=128, train_wall=33, gb_free=15.7, wall=2860
2023-10-27 22:05:18 | INFO | train_inner | epoch 002: 833 / 1217 loss=2.347, nll_loss=0.097, accuracy=14, wps=148.7, ups=0.77, wpb=193.7, bsz=8, num_updates=2050, lr=6.75214e-06, gnorm=3.376, loss_scale=128, train_wall=32, gb_free=15.7, wall=2893
2023-10-27 22:05:54 | INFO | train_inner | epoch 002: 858 / 1217 loss=2.328, nll_loss=0.096, accuracy=21.5, wps=131.6, ups=0.68, wpb=193.1, bsz=8, num_updates=2075, lr=6.7094e-06, gnorm=3.317, loss_scale=128, train_wall=36, gb_free=15.7, wall=2929
2023-10-27 22:06:26 | INFO | train_inner | epoch 002: 883 / 1217 loss=2.33, nll_loss=0.095, accuracy=21, wps=153.2, ups=0.78, wpb=196.8, bsz=8, num_updates=2100, lr=6.66667e-06, gnorm=3.072, loss_scale=128, train_wall=32, gb_free=15.6, wall=2961
2023-10-27 22:07:01 | INFO | train_inner | epoch 002: 908 / 1217 loss=2.326, nll_loss=0.093, accuracy=20.5, wps=142.9, ups=0.72, wpb=199.6, bsz=8, num_updates=2125, lr=6.62393e-06, gnorm=2.949, loss_scale=128, train_wall=35, gb_free=15.6, wall=2996
2023-10-27 22:07:35 | INFO | train_inner | epoch 002: 933 / 1217 loss=2.322, nll_loss=0.094, accuracy=21, wps=147.6, ups=0.75, wpb=197.7, bsz=8, num_updates=2150, lr=6.5812e-06, gnorm=3.133, loss_scale=128, train_wall=33, gb_free=15.7, wall=3030
2023-10-27 22:08:09 | INFO | train_inner | epoch 002: 958 / 1217 loss=2.328, nll_loss=0.093, accuracy=20, wps=144.4, ups=0.72, wpb=199.9, bsz=8, num_updates=2175, lr=6.53846e-06, gnorm=3.201, loss_scale=128, train_wall=34, gb_free=15.7, wall=3065
2023-10-27 22:08:43 | INFO | train_inner | epoch 002: 983 / 1217 loss=2.321, nll_loss=0.097, accuracy=23, wps=140.5, ups=0.74, wpb=190.9, bsz=8, num_updates=2200, lr=6.49573e-06, gnorm=3.08, loss_scale=128, train_wall=34, gb_free=15.7, wall=3098
2023-10-27 22:09:16 | INFO | train_inner | epoch 002: 1008 / 1217 loss=2.331, nll_loss=0.097, accuracy=17.5, wps=147, ups=0.77, wpb=191.4, bsz=8, num_updates=2225, lr=6.45299e-06, gnorm=2.92, loss_scale=128, train_wall=32, gb_free=15.3, wall=3131
2023-10-27 22:09:49 | INFO | train_inner | epoch 002: 1033 / 1217 loss=2.324, nll_loss=0.096, accuracy=19.5, wps=148.8, ups=0.77, wpb=194.5, bsz=8, num_updates=2250, lr=6.41026e-06, gnorm=2.963, loss_scale=128, train_wall=32, gb_free=15.7, wall=3164
2023-10-27 22:10:21 | INFO | train_inner | epoch 002: 1058 / 1217 loss=2.316, nll_loss=0.096, accuracy=25, wps=147.8, ups=0.77, wpb=193.1, bsz=8, num_updates=2275, lr=6.36752e-06, gnorm=2.86, loss_scale=128, train_wall=32, gb_free=15.7, wall=3196
2023-10-27 22:10:54 | INFO | train_inner | epoch 002: 1083 / 1217 loss=2.312, nll_loss=0.098, accuracy=27, wps=142.5, ups=0.75, wpb=189.4, bsz=8, num_updates=2300, lr=6.32479e-06, gnorm=2.9, loss_scale=128, train_wall=33, gb_free=15.7, wall=3230
2023-10-27 22:11:26 | INFO | train_inner | epoch 002: 1108 / 1217 loss=2.327, nll_loss=0.097, accuracy=22, wps=150.8, ups=0.79, wpb=191.5, bsz=8, num_updates=2325, lr=6.28205e-06, gnorm=2.962, loss_scale=128, train_wall=31, gb_free=15.7, wall=3261
2023-10-27 22:12:03 | INFO | train_inner | epoch 002: 1133 / 1217 loss=2.327, nll_loss=0.099, accuracy=20, wps=128.5, ups=0.68, wpb=188.2, bsz=8, num_updates=2350, lr=6.23932e-06, gnorm=2.935, loss_scale=128, train_wall=36, gb_free=15.7, wall=3298
2023-10-27 22:12:39 | INFO | train_inner | epoch 002: 1158 / 1217 loss=2.313, nll_loss=0.092, accuracy=21.5, wps=138, ups=0.69, wpb=201.3, bsz=8, num_updates=2375, lr=6.19658e-06, gnorm=3.045, loss_scale=128, train_wall=36, gb_free=15.7, wall=3334
2023-10-27 22:13:15 | INFO | train_inner | epoch 002: 1183 / 1217 loss=2.338, nll_loss=0.095, accuracy=15, wps=137.5, ups=0.7, wpb=197, bsz=8, num_updates=2400, lr=6.15385e-06, gnorm=3.044, loss_scale=128, train_wall=35, gb_free=15.7, wall=3370
2023-10-27 22:13:46 | INFO | train_inner | epoch 002: 1208 / 1217 loss=2.333, nll_loss=0.096, accuracy=18, wps=157.3, ups=0.81, wpb=195.1, bsz=8, num_updates=2425, lr=6.11111e-06, gnorm=3.039, loss_scale=128, train_wall=31, gb_free=15.7, wall=3401
2023-10-27 22:13:59 | INFO | fairseq_cli.train | begin validation on "valid" subset
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 22:15:17 | INFO | valid | epoch 002 | valid on 'valid' subset | loss 2.322 | nll_loss 0.097 | accuracy 18.5 | wps 377.3 | wpb 191.6 | bsz 8 | num_updates 2434 | best_accuracy 20.4
2023-10-27 22:15:17 | INFO | fairseq.checkpoint_utils | Preparing to save checkpoint for epoch 2 @ 2434 updates
2023-10-27 22:15:17 | INFO | fairseq_cli.train | end of epoch 2 (average epoch stats below)
2023-10-27 22:15:17 | INFO | train | epoch 002 | loss 2.327 | nll_loss 0.096 | accuracy 20.1 | wps 135.6 | ups 0.7 | wpb 193 | bsz 8 | num_updates 2434 | lr 6.09573e-06 | gnorm 2.87 | loss_scale 128 | train_wall 1633 | gb_free 15.7 | wall 3492
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 3
2023-10-27 22:15:17 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1217
2023-10-27 22:15:17 | INFO | fairseq.trainer | begin training epoch 3
2023-10-27 22:15:17 | INFO | fairseq_cli.train | Start iterating over samples
2023-10-27 22:15:43 | INFO | train_inner | epoch 003: 16 / 1217 loss=2.32, nll_loss=0.096, accuracy=19.5, wps=41.3, ups=0.21, wpb=193, bsz=8, num_updates=2450, lr=6.06838e-06, gnorm=3.035, loss_scale=128, train_wall=38, gb_free=15.7, wall=3518
2023-10-27 22:16:17 | INFO | train_inner | epoch 003: 41 / 1217 loss=2.307, nll_loss=0.094, accuracy=19, wps=143, ups=0.73, wpb=196.2, bsz=8, num_updates=2475, lr=6.02564e-06, gnorm=2.932, loss_scale=128, train_wall=34, gb_free=15.7, wall=3552
2023-10-27 22:16:44 | INFO | train_inner | epoch 003: 66 / 1217 loss=2.333, nll_loss=0.098, accuracy=20.5, wps=175.4, ups=0.92, wpb=190.2, bsz=8, num_updates=2500, lr=5.98291e-06, gnorm=2.979, loss_scale=128, train_wall=27, gb_free=15.7, wall=3580
2023-10-27 22:17:01 | INFO | train_inner | epoch 003: 91 / 1217 loss=2.325, nll_loss=0.098, accuracy=18.5, wps=289.8, ups=1.53, wpb=189.8, bsz=8, num_updates=2525, lr=5.94017e-06, gnorm=2.91, loss_scale=128, train_wall=16, gb_free=15.7, wall=3596
2023-10-27 22:17:38 | INFO | train_inner | epoch 003: 116 / 1217 loss=2.32, nll_loss=0.095, accuracy=19.5, wps=131.8, ups=0.67, wpb=195.9, bsz=8, num_updates=2550, lr=5.89744e-06, gnorm=2.873, loss_scale=128, train_wall=37, gb_free=15.4, wall=3633
2023-10-27 22:18:13 | INFO | train_inner | epoch 003: 141 / 1217 loss=2.336, nll_loss=0.098, accuracy=17, wps=137.5, ups=0.72, wpb=191.3, bsz=8, num_updates=2575, lr=5.8547e-06, gnorm=2.88, loss_scale=128, train_wall=34, gb_free=15.6, wall=3668
2023-10-27 22:18:47 | INFO | train_inner | epoch 003: 166 / 1217 loss=2.322, nll_loss=0.095, accuracy=22.5, wps=140.9, ups=0.72, wpb=194.8, bsz=8, num_updates=2600, lr=5.81197e-06, gnorm=2.834, loss_scale=128, train_wall=34, gb_free=15.7, wall=3703
2023-10-27 22:19:09 | INFO | train_inner | epoch 003: 191 / 1217 loss=2.318, nll_loss=0.098, accuracy=20.5, wps=216.1, ups=1.15, wpb=188.6, bsz=8, num_updates=2625, lr=5.76923e-06, gnorm=2.909, loss_scale=128, train_wall=22, gb_free=15.4, wall=3724
2023-10-27 22:19:43 | INFO | train_inner | epoch 003: 216 / 1217 loss=2.329, nll_loss=0.094, accuracy=20, wps=144.6, ups=0.73, wpb=197.3, bsz=8, num_updates=2650, lr=5.7265e-06, gnorm=2.928, loss_scale=128, train_wall=34, gb_free=15.7, wall=3758
2023-10-27 22:20:19 | INFO | train_inner | epoch 003: 241 / 1217 loss=2.334, nll_loss=0.096, accuracy=20, wps=136, ups=0.7, wpb=193.8, bsz=8, num_updates=2675, lr=5.68376e-06, gnorm=2.895, loss_scale=128, train_wall=35, gb_free=15.7, wall=3794
2023-10-27 22:20:53 | INFO | train_inner | epoch 003: 266 / 1217 loss=2.317, nll_loss=0.094, accuracy=23, wps=145.2, ups=0.73, wpb=198.2, bsz=8, num_updates=2700, lr=5.64103e-06, gnorm=2.855, loss_scale=128, train_wall=34, gb_free=15.7, wall=3828
2023-10-27 22:21:27 | INFO | train_inner | epoch 003: 291 / 1217 loss=2.32, nll_loss=0.096, accuracy=25, wps=142.2, ups=0.74, wpb=192.6, bsz=8, num_updates=2725, lr=5.59829e-06, gnorm=2.838, loss_scale=128, train_wall=33, gb_free=15.7, wall=3862
2023-10-27 22:22:01 | INFO | train_inner | epoch 003: 316 / 1217 loss=2.326, nll_loss=0.095, accuracy=20.5, wps=143.7, ups=0.74, wpb=195.5, bsz=8, num_updates=2750, lr=5.55556e-06, gnorm=2.839, loss_scale=128, train_wall=34, gb_free=15.7, wall=3896
2023-10-27 22:22:34 | INFO | train_inner | epoch 003: 341 / 1217 loss=2.339, nll_loss=0.095, accuracy=12.5, wps=149.8, ups=0.76, wpb=197.8, bsz=8, num_updates=2775, lr=5.51282e-06, gnorm=2.837, loss_scale=128, train_wall=33, gb_free=15.7, wall=3929
2023-10-27 22:23:06 | INFO | train_inner | epoch 003: 366 / 1217 loss=2.33, nll_loss=0.095, accuracy=18.5, wps=151.4, ups=0.77, wpb=195.6, bsz=8, num_updates=2800, lr=5.47009e-06, gnorm=2.781, loss_scale=128, train_wall=32, gb_free=15.4, wall=3961
2023-10-27 22:23:40 | INFO | train_inner | epoch 003: 391 / 1217 loss=2.328, nll_loss=0.097, accuracy=22.5, wps=141.2, ups=0.73, wpb=193, bsz=8, num_updates=2825, lr=5.42735e-06, gnorm=2.726, loss_scale=128, train_wall=34, gb_free=15.7, wall=3996
2023-10-27 22:24:17 | INFO | train_inner | epoch 003: 416 / 1217 loss=2.315, nll_loss=0.098, accuracy=22, wps=130.1, ups=0.69, wpb=188.1, bsz=8, num_updates=2850, lr=5.38462e-06, gnorm=2.714, loss_scale=128, train_wall=36, gb_free=15.7, wall=4032
2023-10-27 22:24:52 | INFO | train_inner | epoch 003: 441 / 1217 loss=2.32, nll_loss=0.094, accuracy=22, wps=140.6, ups=0.71, wpb=197.4, bsz=8, num_updates=2875, lr=5.34188e-06, gnorm=2.741, loss_scale=128, train_wall=35, gb_free=15.7, wall=4067
2023-10-27 22:25:27 | INFO | train_inner | epoch 003: 466 / 1217 loss=2.32, nll_loss=0.097, accuracy=25, wps=137.2, ups=0.71, wpb=192, bsz=8, num_updates=2900, lr=5.29915e-06, gnorm=2.737, loss_scale=128, train_wall=35, gb_free=15.7, wall=4102
2023-10-27 22:26:00 | INFO | train_inner | epoch 003: 491 / 1217 loss=2.332, nll_loss=0.097, accuracy=19, wps=144.3, ups=0.75, wpb=193.2, bsz=8, num_updates=2925, lr=5.25641e-06, gnorm=2.751, loss_scale=128, train_wall=33, gb_free=15.7, wall=4135
2023-10-27 22:26:33 | INFO | train_inner | epoch 003: 516 / 1217 loss=2.329, nll_loss=0.097, accuracy=20.5, wps=145.2, ups=0.75, wpb=192.8, bsz=8, num_updates=2950, lr=5.21368e-06, gnorm=2.703, loss_scale=128, train_wall=33, gb_free=15.7, wall=4168
2023-10-27 22:27:08 | INFO | train_inner | epoch 003: 541 / 1217 loss=2.315, nll_loss=0.099, accuracy=21.5, wps=133, ups=0.71, wpb=186.8, bsz=8, num_updates=2975, lr=5.17094e-06, gnorm=2.657, loss_scale=128, train_wall=35, gb_free=15.7, wall=4204
2023-10-27 22:27:43 | INFO | train_inner | epoch 003: 566 / 1217 loss=2.336, nll_loss=0.098, accuracy=15.5, wps=140.3, ups=0.73, wpb=191.4, bsz=8, num_updates=3000, lr=5.12821e-06, gnorm=2.685, loss_scale=128, train_wall=34, gb_free=15.1, wall=4238
2023-10-27 22:28:15 | INFO | train_inner | epoch 003: 591 / 1217 loss=2.331, nll_loss=0.099, accuracy=16.5, wps=144, ups=0.77, wpb=188.2, bsz=8, num_updates=3025, lr=5.08547e-06, gnorm=2.717, loss_scale=128, train_wall=32, gb_free=15.7, wall=4270
2023-10-27 22:28:47 | INFO | train_inner | epoch 003: 616 / 1217 loss=2.335, nll_loss=0.098, accuracy=19.5, wps=148.7, ups=0.78, wpb=191.4, bsz=8, num_updates=3050, lr=5.04274e-06, gnorm=2.744, loss_scale=128, train_wall=32, gb_free=15.7, wall=4303
2023-10-27 22:29:23 | INFO | train_inner | epoch 003: 641 / 1217 loss=2.346, nll_loss=0.096, accuracy=18, wps=139.6, ups=0.71, wpb=196.1, bsz=8, num_updates=3075, lr=5e-06, gnorm=2.69, loss_scale=128, train_wall=35, gb_free=15.2, wall=4338
2023-10-27 22:30:00 | INFO | train_inner | epoch 003: 666 / 1217 loss=2.329, nll_loss=0.098, accuracy=16.5, wps=128, ups=0.68, wpb=189.2, bsz=8, num_updates=3100, lr=4.95726e-06, gnorm=2.622, loss_scale=128, train_wall=37, gb_free=15.7, wall=4375
2023-10-27 22:30:34 | INFO | train_inner | epoch 003: 691 / 1217 loss=2.325, nll_loss=0.097, accuracy=18.5, wps=138.1, ups=0.72, wpb=191.9, bsz=8, num_updates=3125, lr=4.91453e-06, gnorm=2.56, loss_scale=128, train_wall=34, gb_free=15.7, wall=4409
2023-10-27 22:31:09 | INFO | train_inner | epoch 003: 716 / 1217 loss=2.319, nll_loss=0.095, accuracy=20.5, wps=141.8, ups=0.73, wpb=195, bsz=8, num_updates=3150, lr=4.87179e-06, gnorm=2.554, loss_scale=128, train_wall=34, gb_free=15.7, wall=4444
2023-10-27 22:31:44 | INFO | train_inner | epoch 003: 741 / 1217 loss=2.318, nll_loss=0.096, accuracy=23.5, wps=137.5, ups=0.72, wpb=192.3, bsz=8, num_updates=3175, lr=4.82906e-06, gnorm=2.571, loss_scale=128, train_wall=35, gb_free=15.1, wall=4479
2023-10-27 22:32:17 | INFO | train_inner | epoch 003: 766 / 1217 loss=2.324, nll_loss=0.097, accuracy=19, wps=141, ups=0.74, wpb=190.8, bsz=8, num_updates=3200, lr=4.78632e-06, gnorm=2.602, loss_scale=128, train_wall=33, gb_free=15.7, wall=4513
2023-10-27 22:32:52 | INFO | train_inner | epoch 003: 791 / 1217 loss=2.327, nll_loss=0.097, accuracy=19, wps=137.8, ups=0.72, wpb=191.4, bsz=8, num_updates=3225, lr=4.74359e-06, gnorm=2.581, loss_scale=128, train_wall=34, gb_free=15.7, wall=4547
2023-10-27 22:33:26 | INFO | train_inner | epoch 003: 816 / 1217 loss=2.324, nll_loss=0.096, accuracy=21, wps=142.5, ups=0.74, wpb=193.4, bsz=8, num_updates=3250, lr=4.70085e-06, gnorm=2.571, loss_scale=128, train_wall=34, gb_free=15.7, wall=4581
2023-10-27 22:34:03 | INFO | train_inner | epoch 003: 841 / 1217 loss=2.323, nll_loss=0.096, accuracy=24, wps=130.7, ups=0.68, wpb=192.6, bsz=8, num_updates=3275, lr=4.65812e-06, gnorm=2.583, loss_scale=128, train_wall=36, gb_free=15.7, wall=4618
2023-10-27 22:34:38 | INFO | train_inner | epoch 003: 866 / 1217 loss=2.332, nll_loss=0.096, accuracy=19, wps=139.2, ups=0.72, wpb=193.9, bsz=8, num_updates=3300, lr=4.61538e-06, gnorm=2.62, loss_scale=128, train_wall=34, gb_free=15.7, wall=4653
2023-10-27 22:35:13 | INFO | train_inner | epoch 003: 891 / 1217 loss=2.318, nll_loss=0.095, accuracy=20, wps=138.5, ups=0.71, wpb=194.4, bsz=8, num_updates=3325, lr=4.57265e-06, gnorm=2.57, loss_scale=128, train_wall=35, gb_free=15.7, wall=4688
2023-10-27 22:35:46 | INFO | train_inner | epoch 003: 916 / 1217 loss=2.317, nll_loss=0.097, accuracy=26.5, wps=142.9, ups=0.75, wpb=191, bsz=8, num_updates=3350, lr=4.52991e-06, gnorm=2.593, loss_scale=128, train_wall=33, gb_free=15.7, wall=4721
2023-10-27 22:36:21 | INFO | train_inner | epoch 003: 941 / 1217 loss=2.325, nll_loss=0.094, accuracy=21.5, wps=142.2, ups=0.72, wpb=197.6, bsz=8, num_updates=3375, lr=4.48718e-06, gnorm=2.569, loss_scale=128, train_wall=34, gb_free=15.7, wall=4756
2023-10-27 22:36:55 | INFO | train_inner | epoch 003: 966 / 1217 loss=2.318, nll_loss=0.096, accuracy=19, wps=142.4, ups=0.74, wpb=192.7, bsz=8, num_updates=3400, lr=4.44444e-06, gnorm=2.571, loss_scale=128, train_wall=33, gb_free=15.7, wall=4790
2023-10-27 22:37:31 | INFO | train_inner | epoch 003: 991 / 1217 loss=2.325, nll_loss=0.096, accuracy=18, wps=136.2, ups=0.7, wpb=194.5, bsz=8, num_updates=3425, lr=4.40171e-06, gnorm=2.548, loss_scale=128, train_wall=35, gb_free=14.6, wall=4826
2023-10-27 22:38:06 | INFO | train_inner | epoch 003: 1016 / 1217 loss=2.338, nll_loss=0.097, accuracy=14.5, wps=134.8, ups=0.7, wpb=193.4, bsz=8, num_updates=3450, lr=4.35897e-06, gnorm=2.494, loss_scale=128, train_wall=36, gb_free=15.7, wall=4862
2023-10-27 22:38:42 | INFO | train_inner | epoch 003: 1041 / 1217 loss=2.323, nll_loss=0.099, accuracy=22.5, wps=133.4, ups=0.71, wpb=188.2, bsz=8, num_updates=3475, lr=4.31624e-06, gnorm=2.454, loss_scale=128, train_wall=35, gb_free=15.7, wall=4897
2023-10-27 22:39:19 | INFO | train_inner | epoch 003: 1066 / 1217 loss=2.332, nll_loss=0.096, accuracy=16, wps=130.6, ups=0.67, wpb=193.8, bsz=8, num_updates=3500, lr=4.2735e-06, gnorm=2.42, loss_scale=128, train_wall=36, gb_free=15.7, wall=4934
2023-10-27 22:39:55 | INFO | train_inner | epoch 003: 1091 / 1217 loss=2.328, nll_loss=0.094, accuracy=17, wps=136.6, ups=0.69, wpb=198.8, bsz=8, num_updates=3525, lr=4.23077e-06, gnorm=2.426, loss_scale=128, train_wall=36, gb_free=15.7, wall=4970
2023-10-27 22:40:32 | INFO | train_inner | epoch 003: 1116 / 1217 loss=2.333, nll_loss=0.099, accuracy=17, wps=126.6, ups=0.67, wpb=188, bsz=8, num_updates=3550, lr=4.18803e-06, gnorm=2.411, loss_scale=128, train_wall=37, gb_free=15.7, wall=5008
2023-10-27 22:41:06 | INFO | train_inner | epoch 003: 1141 / 1217 loss=2.333, nll_loss=0.099, accuracy=20.5, wps=141, ups=0.75, wpb=188.4, bsz=8, num_updates=3575, lr=4.1453e-06, gnorm=2.39, loss_scale=128, train_wall=33, gb_free=15.7, wall=5041
2023-10-27 22:41:42 | INFO | train_inner | epoch 003: 1166 / 1217 loss=2.323, nll_loss=0.095, accuracy=20, wps=136, ups=0.7, wpb=195.3, bsz=8, num_updates=3600, lr=4.10256e-06, gnorm=2.361, loss_scale=128, train_wall=35, gb_free=15.7, wall=5077
2023-10-27 22:42:16 | INFO | train_inner | epoch 003: 1191 / 1217 loss=2.319, nll_loss=0.094, accuracy=20, wps=144.3, ups=0.73, wpb=197.9, bsz=8, num_updates=3625, lr=4.05983e-06, gnorm=2.365, loss_scale=128, train_wall=34, gb_free=15.7, wall=5111
2023-10-27 22:42:50 | INFO | train_inner | epoch 003: 1216 / 1217 loss=2.323, nll_loss=0.096, accuracy=23, wps=140.4, ups=0.73, wpb=192.7, bsz=8, num_updates=3650, lr=4.01709e-06, gnorm=2.347, loss_scale=128, train_wall=34, gb_free=15.7, wall=5145
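For reference, CommonsenseQA has five answer choices per question, so 20% accuracy is exactly chance level. The small script below is only a sketch for summarizing `train_inner` lines like the ones above. The helper name `mean_accuracy` is made up for illustration, and the comparison of the loss against log2(5) assumes the logged loss is reported in base 2.

```python
import math
import re
from statistics import mean

# Matches the "accuracy=<number>" field printed in train_inner log lines.
ACC_RE = re.compile(r"\baccuracy=([0-9.]+)")


def mean_accuracy(log_lines):
    """Return the mean of all accuracy values found in the log lines, or None."""
    values = [float(m.group(1)) for line in log_lines if (m := ACC_RE.search(line))]
    return mean(values) if values else None


# Three lines from the log above, truncated to the relevant fields.
sample = [
    "epoch 003: 16 / 1217 loss=2.32, nll_loss=0.096, accuracy=19.5, wps=41.3",
    "epoch 003: 41 / 1217 loss=2.307, nll_loss=0.094, accuracy=19, wps=143",
    "epoch 003: 66 / 1217 loss=2.333, nll_loss=0.098, accuracy=20.5, wps=175.4",
]

NUM_CHOICES = 5                           # answer choices per CommonsenseQA question
CHANCE_ACCURACY = 100.0 / NUM_CHOICES     # accuracy of random guessing, in percent
UNIFORM_LOSS_BITS = math.log2(NUM_CHOICES)  # cross-entropy of a uniform guess (base 2, assumed)

print("mean accuracy:", mean_accuracy(sample))
print("chance accuracy:", CHANCE_ACCURACY)
print("loss of a uniform guess (bits):", round(UNIFORM_LOSS_BITS, 3))
```

If the base-2 assumption holds, a loss that stays near 2.32 would mean the model is scoring all five candidates about equally, which matches the accuracy staying near chance.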
Environment
- fairseq Version (e.g., 1.0 or main): main
- PyTorch Version (e.g., 1.0): 1.11.0
- OS (e.g., Linux): Linux Ubuntu
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): pip install --editable ./
- Python version: 3.9
- CUDA/cuDNN version: 11.1
- GPU models and configuration: GeForce RTX 3090
- Any other relevant information:
# Name Version Build Channel
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
antlr4-python3-runtime 4.8 pypi_0 pypi
bitarray 2.8.2 pypi_0 pypi
blas 1.0 mkl defaults
brotli-python 1.0.9 py39h6a678d5_7 defaults
bzip2 1.0.8 h7b6447c_0 defaults
ca-certificates 2023.08.22 h06a4308_0 defaults
certifi 2023.7.22 py39h06a4308_0 defaults
cffi 1.15.1 py39h5eee18b_3 defaults
charset-normalizer 2.0.4 pyhd3eb1b0_0 defaults
colorama 0.4.6 pypi_0 pypi
cryptography 41.0.3 py39hdda0065_0 defaults
cudatoolkit 11.3.1 h2bc3f7f_2 defaults
cython 3.0.4 pypi_0 pypi
fairseq 0.12.2 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.12.4 pypi_0 pypi
freetype 2.12.1 h4a9f257_0 defaults
fsspec 2023.10.0 pypi_0 pypi
giflib 5.2.1 h5eee18b_3 defaults
gmp 6.2.1 h295c915_3 defaults
gnutls 3.6.15 he1e5248_0 defaults
hydra-core 1.0.7 pypi_0 pypi
idna 3.4 py39h06a4308_0 defaults
intel-openmp 2023.1.0 hdb19cb5_46305 defaults
jinja2 3.1.2 pypi_0 pypi
joblib 1.3.2 pypi_0 pypi
jpeg 9e h5eee18b_1 defaults
lame 3.100 h7b6447c_0 defaults
lcms2 2.12 h3be6417_0 defaults
ld_impl_linux-64 2.38 h1181459_1 defaults
lerc 3.0 h295c915_0 defaults
libdeflate 1.17 h5eee18b_1 defaults
libffi 3.4.4 h6a678d5_0 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libiconv 1.16 h7f8727e_2 defaults
libidn2 2.3.4 h5eee18b_0 defaults
libpng 1.6.39 h5eee18b_0 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
libtasn1 4.19.0 h5eee18b_0 defaults
libtiff 4.5.1 h6a678d5_0 defaults
libunistring 0.9.10 h27cfd23_0 defaults
libuv 1.44.2 h5eee18b_0 defaults
libwebp 1.3.2 h11a3e52_0 defaults
libwebp-base 1.3.2 h5eee18b_0 defaults
lxml 4.9.3 pypi_0 pypi
lz4-c 1.9.4 h6a678d5_0 defaults
markupsafe 2.1.3 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46343 defaults
mkl-service 2.4.0 py39h5eee18b_1 defaults
mkl_fft 1.3.8 py39h5eee18b_0 defaults
mkl_random 1.2.4 py39hdb19cb5_0 defaults
mpmath 1.3.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
nettle 3.7.3 hbbd107a_1 defaults
networkx 3.2 pypi_0 pypi
numpy 1.26.0 py39h5f9d8c6_0 defaults
numpy-base 1.26.0 py39hb5e798b_0 defaults
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.18.1 pypi_0 pypi
nvidia-nvjitlink-cu12 12.3.52 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
omegaconf 2.0.6 pypi_0 pypi
openh264 2.1.1 h4ff587b_0 defaults
openjpeg 2.4.0 h3ad879b_0 defaults
openssl 3.0.11 h7f8727e_2 defaults
packaging 23.2 pypi_0 pypi
pillow 10.0.1 py39ha6cbd5a_0 defaults
pip 23.3 py39h06a4308_0 defaults
portalocker 2.8.2 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0 defaults
pyopenssl 23.2.0 py39h06a4308_0 defaults
pysocks 1.7.1 py39h06a4308_0 defaults
python 3.9.18 h955ad1f_0 defaults
pytorch 1.12.0 py3.9_cuda11.3_cudnn8.3.2_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
regex 2023.10.3 pypi_0 pypi
requests 2.31.0 py39h06a4308_0 defaults
sacrebleu 2.3.1 pypi_0 pypi
scikit-learn 1.3.2 pypi_0 pypi
scipy 1.11.3 pypi_0 pypi
setuptools 68.0.0 py39h06a4308_0 defaults
sqlite 3.41.2 h5eee18b_0 defaults
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tbb 2021.8.0 hdb19cb5_0 defaults
threadpoolctl 3.2.0 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
torch 2.1.0 pypi_0 pypi
torchaudio 0.12.0 py39_cu113 pytorch
torchvision 0.13.0 py39_cu113 pytorch
tqdm 4.66.1 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typing_extensions 4.7.1 py39h06a4308_0 defaults
tzdata 2023c h04d1e81_0 defaults
urllib3 1.26.18 py39h06a4308_0 defaults
wheel 0.41.2 py39h06a4308_0 defaults
xz 5.4.2 h5eee18b_0 defaults
zlib 1.2.13 h5eee18b_0 defaults
zstd 1.5.5 hc292b87_0 defaults
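Note that the list above contains both a conda `pytorch 1.12.0` and a PyPI `torch 2.1.0`, and that `fairseq 0.12.2` is listed as a PyPI package even though the install is reported as being from source. It may be worth checking which copies Python actually imports. The snippet below is a minimal sketch using only the standard library; `describe_package` is a hypothetical helper name, not part of fairseq or PyTorch.

```python
import importlib.util
from importlib import metadata


def describe_package(dist_name, module_name=None):
    """Report the installed distribution version and the file the module loads from."""
    module_name = module_name or dist_name
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # no distribution with this name is installed
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None
    return {"name": dist_name, "version": version, "location": location}


# Prints None values for packages that are not installed in this environment.
for name in ("torch", "fairseq"):
    print(describe_package(name))
```

With an editable install, the reported location for `fairseq` should point into the source checkout rather than into `site-packages`.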
I have a similar problem. Have you resolved it?
Following #1687, I created a completely new conda environment. However, it did not work until I downloaded https://github.com/VITA-Group/SMC-Bench and put it in the folder /data/username/work/username/test/SMC-Bench-main. When I move the folder to another path, the training accuracy drops back to 20%. I think the problem is probably related to path resolution.
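One thing that may support the path theory: the log in the original report contains the line `No existing checkpoint found /examples/roberta/roberta.large/model.pt`, which suggests the pretrained weights were not loaded from the path that was passed in. A quick way to rule this out is to resolve the checkpoint path and fail early before launching training. This is only a sketch; `check_restore_file` is a hypothetical helper, and the demo uses a temporary file as a stand-in for `model.pt`.

```python
import tempfile
from pathlib import Path


def check_restore_file(path_str):
    """Return the resolved absolute path, raising if the file does not exist."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")
    return path


# Demo: an empty temporary file stands in for the real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    fake_ckpt = Path(tmp) / "model.pt"
    fake_ckpt.write_bytes(b"")
    print("found:", check_restore_file(str(fake_ckpt)))
    try:
        check_restore_file(str(Path(tmp) / "missing" / "model.pt"))
    except FileNotFoundError as err:
        print("error:", err)
```

Running the same check on the value passed as the restore file would show whether it resolves to the downloaded `roberta.large/model.pt` from the current working directory.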