
Training accuracy stays at 20% when fine-tuning RoBERTa on Commonsense QA

Open Gsruhj opened this issue 2 years ago • 2 comments

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Run the fine-tuning command from examples/roberta/commonsense_qa/README.md

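Training accuracy stays at about 20% even after more than 2,700 updates (nearly three epochs). Validation accuracy is no better: 20.4 after epoch 1 and 18.5 after epoch 2.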
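For context, 20% is what random guessing gives on Commonsense QA, since each question has five answer candidates. The sketch below is a sanity check, not part of fairseq: it assumes the logged `loss` is reported in base 2 (bits), and the helper names and tolerances are hypothetical. Under that assumption, a uniform guess over five choices gives a loss of log2(5), roughly 2.32, which matches the values in the log.

```python
import math

# Commonsense QA questions have five answer candidates.
NUM_CHOICES = 5


def chance_accuracy(num_choices: int) -> float:
    """Accuracy (in percent) of a uniform random guess."""
    return 100.0 / num_choices


def chance_loss_bits(num_choices: int) -> float:
    """Cross-entropy of a uniform prediction, in bits.

    Assumption: the logged `loss` is in base 2.
    """
    return math.log2(num_choices)


def is_near_chance(loss: float, accuracy: float,
                   num_choices: int = NUM_CHOICES,
                   loss_tol: float = 0.05, acc_tol: float = 2.0) -> bool:
    """Hypothetical helper: True if both metrics sit at chance level.

    The tolerances are arbitrary illustration values.
    """
    loss_ok = abs(loss - chance_loss_bits(num_choices)) <= loss_tol
    acc_ok = abs(accuracy - chance_accuracy(num_choices)) <= acc_tol
    return loss_ok and acc_ok


if __name__ == "__main__":
    # Validation metrics reported after epoch 1 in this issue.
    print(f"chance loss (bits): {chance_loss_bits(NUM_CHOICES):.3f}")
    print(f"chance accuracy: {chance_accuracy(NUM_CHOICES):.1f}")
    print("near chance:", is_near_chance(loss=2.322, accuracy=20.4))
```

If both numbers sit at chance level, as they do here, the model is effectively predicting uniformly rather than learning slowly.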

2023-10-27 21:16:54 | INFO | fairseq_cli.train | task: CommonsenseQATask
2023-10-27 21:16:54 | INFO | fairseq_cli.train | model: RobertaModel
2023-10-27 21:16:54 | INFO | fairseq_cli.train | criterion: SentenceRankingCriterion
2023-10-27 21:16:54 | INFO | fairseq_cli.train | num. shared model params: 356,461,658 (num. trained: 356,461,658)
2023-10-27 21:16:54 | INFO | fairseq_cli.train | num. expert model params: 0 (num. trained: 0)
| Loaded valid with 1221 samples
2023-10-27 21:17:05 | INFO | fairseq.trainer | detected shared parameter: encoder.sentence_encoder.embed_tokens.weight <- encoder.lm_head.weight
2023-10-27 21:17:05 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2023-10-27 21:17:05 | INFO | fairseq.utils | rank   0: capabilities =  8.6  ; total memory = 23.700 GB ; name = GeForce RTX 3090                        
2023-10-27 21:17:05 | INFO | fairseq.utils | ***********************CUDA enviroments for all 1 workers***********************
2023-10-27 21:17:05 | INFO | fairseq_cli.train | training on 1 devices (GPUs/TPUs)
2023-10-27 21:17:05 | INFO | fairseq_cli.train | max tokens per device = None and max sentences per device = 8
2023-10-27 21:17:05 | INFO | fairseq.trainer | Preparing to load checkpoint /examples/roberta/roberta.large/model.pt
2023-10-27 21:17:05 | INFO | fairseq.trainer | No existing checkpoint found /examples/roberta/roberta.large/model.pt
2023-10-27 21:17:05 | INFO | fairseq.trainer | loading train data for epoch 1
| Loaded train with 9741 samples
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 21:17:28 | INFO | fairseq_cli.train | begin dry-run validation on "valid" subset
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:17:28 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 21:17:30 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1217
2023-10-27 21:17:30 | INFO | fairseq.trainer | begin training epoch 1
2023-10-27 21:17:30 | INFO | fairseq_cli.train | Start iterating over samples
2023-10-27 21:18:17 | INFO | train_inner | epoch 001:     25 / 1217 loss=2.345, nll_loss=0.099, accuracy=14.5, wps=132, ups=0.7, wpb=190.1, bsz=8, num_updates=25, lr=1.66667e-06, gnorm=11.976, loss_scale=128, train_wall=46, gb_free=15.6, wall=72
2023-10-27 21:18:53 | INFO | train_inner | epoch 001:     50 / 1217 loss=2.33, nll_loss=0.096, accuracy=18, wps=131.6, ups=0.68, wpb=193.4, bsz=8, num_updates=50, lr=3.33333e-06, gnorm=11.059, loss_scale=128, train_wall=36, gb_free=15.1, wall=109
2023-10-27 21:19:28 | INFO | train_inner | epoch 001:     75 / 1217 loss=2.348, nll_loss=0.099, accuracy=16.5, wps=139, ups=0.73, wpb=190.6, bsz=8, num_updates=75, lr=5e-06, gnorm=9.757, loss_scale=128, train_wall=34, gb_free=15.7, wall=143
2023-10-27 21:20:01 | INFO | train_inner | epoch 001:    100 / 1217 loss=2.33, nll_loss=0.095, accuracy=16, wps=146.9, ups=0.75, wpb=196.5, bsz=8, num_updates=100, lr=6.66667e-06, gnorm=7.484, loss_scale=128, train_wall=33, gb_free=15.7, wall=176
2023-10-27 21:20:38 | INFO | train_inner | epoch 001:    125 / 1217 loss=2.326, nll_loss=0.097, accuracy=23.5, wps=129.2, ups=0.67, wpb=192.3, bsz=8, num_updates=125, lr=8.33333e-06, gnorm=5.903, loss_scale=128, train_wall=37, gb_free=15.7, wall=214
2023-10-27 21:21:14 | INFO | train_inner | epoch 001:    150 / 1217 loss=2.323, nll_loss=0.096, accuracy=19, wps=136.2, ups=0.7, wpb=194.2, bsz=8, num_updates=150, lr=1e-05, gnorm=5.444, loss_scale=128, train_wall=35, gb_free=15.7, wall=249
2023-10-27 21:21:49 | INFO | train_inner | epoch 001:    175 / 1217 loss=2.324, nll_loss=0.096, accuracy=17.5, wps=138.7, ups=0.72, wpb=192.9, bsz=8, num_updates=175, lr=9.95726e-06, gnorm=4.921, loss_scale=128, train_wall=34, gb_free=15.7, wall=284
2023-10-27 21:22:05 | INFO | train_inner | epoch 001:    200 / 1217 loss=2.342, nll_loss=0.097, accuracy=17.5, wps=307.1, ups=1.59, wpb=192.6, bsz=8, num_updates=200, lr=9.91453e-06, gnorm=4.516, loss_scale=128, train_wall=16, gb_free=15.7, wall=300
2023-10-27 21:22:35 | INFO | train_inner | epoch 001:    225 / 1217 loss=2.329, nll_loss=0.099, accuracy=19.5, wps=157.5, ups=0.83, wpb=188.8, bsz=8, num_updates=225, lr=9.87179e-06, gnorm=4.041, loss_scale=128, train_wall=30, gb_free=15.5, wall=330
2023-10-27 21:23:09 | INFO | train_inner | epoch 001:    250 / 1217 loss=2.313, nll_loss=0.096, accuracy=25, wps=138.9, ups=0.72, wpb=193.4, bsz=8, num_updates=250, lr=9.82906e-06, gnorm=4.022, loss_scale=128, train_wall=34, gb_free=15.7, wall=364
2023-10-27 21:23:44 | INFO | train_inner | epoch 001:    275 / 1217 loss=2.341, nll_loss=0.097, accuracy=18, wps=141.1, ups=0.73, wpb=192.9, bsz=8, num_updates=275, lr=9.78632e-06, gnorm=3.977, loss_scale=128, train_wall=34, gb_free=15.7, wall=399
2023-10-27 21:24:17 | INFO | train_inner | epoch 001:    300 / 1217 loss=2.321, nll_loss=0.097, accuracy=24, wps=142.6, ups=0.75, wpb=190.7, bsz=8, num_updates=300, lr=9.74359e-06, gnorm=3.642, loss_scale=128, train_wall=33, gb_free=15.7, wall=432
2023-10-27 21:24:51 | INFO | train_inner | epoch 001:    325 / 1217 loss=2.322, nll_loss=0.099, accuracy=23.5, wps=138.2, ups=0.74, wpb=186.8, bsz=8, num_updates=325, lr=9.70085e-06, gnorm=3.695, loss_scale=128, train_wall=33, gb_free=15.7, wall=466
2023-10-27 21:25:23 | INFO | train_inner | epoch 001:    350 / 1217 loss=2.328, nll_loss=0.098, accuracy=24.5, wps=149.4, ups=0.79, wpb=190.2, bsz=8, num_updates=350, lr=9.65812e-06, gnorm=3.775, loss_scale=128, train_wall=31, gb_free=15.5, wall=498
2023-10-27 21:25:56 | INFO | train_inner | epoch 001:    375 / 1217 loss=2.34, nll_loss=0.099, accuracy=14.5, wps=140.7, ups=0.74, wpb=189.6, bsz=8, num_updates=375, lr=9.61538e-06, gnorm=3.727, loss_scale=128, train_wall=33, gb_free=15.7, wall=531
2023-10-27 21:26:30 | INFO | train_inner | epoch 001:    400 / 1217 loss=2.32, nll_loss=0.097, accuracy=24, wps=141.5, ups=0.74, wpb=190.5, bsz=8, num_updates=400, lr=9.57265e-06, gnorm=3.788, loss_scale=128, train_wall=33, gb_free=15.7, wall=565
2023-10-27 21:27:03 | INFO | train_inner | epoch 001:    425 / 1217 loss=2.331, nll_loss=0.096, accuracy=21, wps=145.5, ups=0.75, wpb=193.7, bsz=8, num_updates=425, lr=9.52991e-06, gnorm=3.777, loss_scale=128, train_wall=33, gb_free=15.4, wall=598
2023-10-27 21:27:37 | INFO | train_inner | epoch 001:    450 / 1217 loss=2.32, nll_loss=0.093, accuracy=22.5, wps=150.2, ups=0.75, wpb=200.4, bsz=8, num_updates=450, lr=9.48718e-06, gnorm=3.626, loss_scale=128, train_wall=33, gb_free=15.7, wall=632
2023-10-27 21:28:10 | INFO | train_inner | epoch 001:    475 / 1217 loss=2.313, nll_loss=0.096, accuracy=23.5, wps=141.5, ups=0.74, wpb=191.8, bsz=8, num_updates=475, lr=9.44444e-06, gnorm=3.635, loss_scale=128, train_wall=33, gb_free=15.7, wall=666
2023-10-27 21:28:45 | INFO | train_inner | epoch 001:    500 / 1217 loss=2.327, nll_loss=0.095, accuracy=23, wps=141.9, ups=0.73, wpb=195, bsz=8, num_updates=500, lr=9.40171e-06, gnorm=3.646, loss_scale=128, train_wall=34, gb_free=15.5, wall=700
2023-10-27 21:29:18 | INFO | train_inner | epoch 001:    525 / 1217 loss=2.326, nll_loss=0.097, accuracy=23, wps=145.2, ups=0.75, wpb=192.6, bsz=8, num_updates=525, lr=9.35897e-06, gnorm=3.726, loss_scale=128, train_wall=33, gb_free=15.7, wall=733
2023-10-27 21:29:53 | INFO | train_inner | epoch 001:    550 / 1217 loss=2.334, nll_loss=0.098, accuracy=17.5, wps=136.9, ups=0.72, wpb=191.1, bsz=8, num_updates=550, lr=9.31624e-06, gnorm=3.679, loss_scale=128, train_wall=34, gb_free=15.7, wall=768
2023-10-27 21:30:27 | INFO | train_inner | epoch 001:    575 / 1217 loss=2.331, nll_loss=0.093, accuracy=21, wps=147, ups=0.73, wpb=201, bsz=8, num_updates=575, lr=9.2735e-06, gnorm=3.621, loss_scale=128, train_wall=34, gb_free=15.7, wall=802
2023-10-27 21:31:00 | INFO | train_inner | epoch 001:    600 / 1217 loss=2.325, nll_loss=0.099, accuracy=21.5, wps=143.3, ups=0.76, wpb=187.6, bsz=8, num_updates=600, lr=9.23077e-06, gnorm=3.932, loss_scale=128, train_wall=32, gb_free=15.7, wall=835
2023-10-27 21:31:32 | INFO | train_inner | epoch 001:    625 / 1217 loss=2.331, nll_loss=0.098, accuracy=18, wps=150.2, ups=0.79, wpb=190.7, bsz=8, num_updates=625, lr=9.18803e-06, gnorm=3.786, loss_scale=128, train_wall=31, gb_free=15.7, wall=867
2023-10-27 21:32:05 | INFO | train_inner | epoch 001:    650 / 1217 loss=2.335, nll_loss=0.097, accuracy=18.5, wps=142.1, ups=0.74, wpb=191.6, bsz=8, num_updates=650, lr=9.1453e-06, gnorm=3.613, loss_scale=128, train_wall=33, gb_free=15.7, wall=900
2023-10-27 21:32:39 | INFO | train_inner | epoch 001:    675 / 1217 loss=2.324, nll_loss=0.094, accuracy=17.5, wps=146.5, ups=0.74, wpb=197.6, bsz=8, num_updates=675, lr=9.10256e-06, gnorm=3.37, loss_scale=128, train_wall=33, gb_free=15.4, wall=934
2023-10-27 21:33:10 | INFO | train_inner | epoch 001:    700 / 1217 loss=2.333, nll_loss=0.1, accuracy=19.5, wps=149.8, ups=0.8, wpb=186.9, bsz=8, num_updates=700, lr=9.05983e-06, gnorm=3.202, loss_scale=128, train_wall=31, gb_free=15.7, wall=965
2023-10-27 21:33:45 | INFO | train_inner | epoch 001:    725 / 1217 loss=2.33, nll_loss=0.099, accuracy=19, wps=136.6, ups=0.72, wpb=189, bsz=8, num_updates=725, lr=9.01709e-06, gnorm=3.108, loss_scale=128, train_wall=34, gb_free=15.7, wall=1000
2023-10-27 21:34:18 | INFO | train_inner | epoch 001:    750 / 1217 loss=2.33, nll_loss=0.097, accuracy=17.5, wps=144.7, ups=0.75, wpb=192.8, bsz=8, num_updates=750, lr=8.97436e-06, gnorm=2.985, loss_scale=128, train_wall=33, gb_free=15.7, wall=1033
2023-10-27 21:34:51 | INFO | train_inner | epoch 001:    775 / 1217 loss=2.32, nll_loss=0.099, accuracy=19, wps=144.8, ups=0.77, wpb=188.2, bsz=8, num_updates=775, lr=8.93162e-06, gnorm=2.999, loss_scale=128, train_wall=32, gb_free=15.7, wall=1066
2023-10-27 21:35:23 | INFO | train_inner | epoch 001:    800 / 1217 loss=2.327, nll_loss=0.096, accuracy=19, wps=149.8, ups=0.77, wpb=193.6, bsz=8, num_updates=800, lr=8.88889e-06, gnorm=3.009, loss_scale=128, train_wall=32, gb_free=15.7, wall=1098
2023-10-27 21:35:56 | INFO | train_inner | epoch 001:    825 / 1217 loss=2.334, nll_loss=0.097, accuracy=15.5, wps=143.5, ups=0.75, wpb=191.9, bsz=8, num_updates=825, lr=8.84615e-06, gnorm=2.958, loss_scale=128, train_wall=33, gb_free=15.7, wall=1131
2023-10-27 21:36:30 | INFO | train_inner | epoch 001:    850 / 1217 loss=2.316, nll_loss=0.096, accuracy=23, wps=142.4, ups=0.74, wpb=192.7, bsz=8, num_updates=850, lr=8.80342e-06, gnorm=2.937, loss_scale=128, train_wall=33, gb_free=15.7, wall=1165
2023-10-27 21:37:05 | INFO | train_inner | epoch 001:    875 / 1217 loss=2.322, nll_loss=0.094, accuracy=18, wps=144, ups=0.73, wpb=198.2, bsz=8, num_updates=875, lr=8.76068e-06, gnorm=2.957, loss_scale=128, train_wall=34, gb_free=15.7, wall=1200
2023-10-27 21:37:38 | INFO | train_inner | epoch 001:    900 / 1217 loss=2.34, nll_loss=0.096, accuracy=16.5, wps=147.5, ups=0.76, wpb=194.8, bsz=8, num_updates=900, lr=8.71795e-06, gnorm=2.984, loss_scale=128, train_wall=33, gb_free=15.7, wall=1233
2023-10-27 21:38:12 | INFO | train_inner | epoch 001:    925 / 1217 loss=2.338, nll_loss=0.097, accuracy=16.5, wps=137.9, ups=0.72, wpb=192.1, bsz=8, num_updates=925, lr=8.67521e-06, gnorm=2.919, loss_scale=128, train_wall=34, gb_free=15.7, wall=1268
2023-10-27 21:38:48 | INFO | train_inner | epoch 001:    950 / 1217 loss=2.319, nll_loss=0.096, accuracy=20.5, wps=134.5, ups=0.7, wpb=192.8, bsz=8, num_updates=950, lr=8.63248e-06, gnorm=2.824, loss_scale=128, train_wall=35, gb_free=15.7, wall=1303
2023-10-27 21:39:21 | INFO | train_inner | epoch 001:    975 / 1217 loss=2.313, nll_loss=0.092, accuracy=27.5, wps=151.9, ups=0.76, wpb=200.2, bsz=8, num_updates=975, lr=8.58974e-06, gnorm=2.884, loss_scale=128, train_wall=33, gb_free=15.7, wall=1336
2023-10-27 21:39:55 | INFO | train_inner | epoch 001:   1000 / 1217 loss=2.348, nll_loss=0.094, accuracy=12.5, wps=147.3, ups=0.74, wpb=198.9, bsz=8, num_updates=1000, lr=8.54701e-06, gnorm=2.858, loss_scale=128, train_wall=33, gb_free=15.7, wall=1370
2023-10-27 21:40:29 | INFO | train_inner | epoch 001:   1025 / 1217 loss=2.315, nll_loss=0.097, accuracy=20.5, wps=138.2, ups=0.73, wpb=190.2, bsz=8, num_updates=1025, lr=8.50427e-06, gnorm=2.81, loss_scale=128, train_wall=34, gb_free=15.7, wall=1405
2023-10-27 21:41:05 | INFO | train_inner | epoch 001:   1050 / 1217 loss=2.32, nll_loss=0.095, accuracy=18.5, wps=137.4, ups=0.7, wpb=196.3, bsz=8, num_updates=1050, lr=8.46154e-06, gnorm=2.863, loss_scale=128, train_wall=35, gb_free=15.7, wall=1440
2023-10-27 21:41:41 | INFO | train_inner | epoch 001:   1075 / 1217 loss=2.321, nll_loss=0.098, accuracy=20.5, wps=132.2, ups=0.7, wpb=189.5, bsz=8, num_updates=1075, lr=8.4188e-06, gnorm=2.86, loss_scale=128, train_wall=35, gb_free=15.7, wall=1476
2023-10-27 21:42:17 | INFO | train_inner | epoch 001:   1100 / 1217 loss=2.335, nll_loss=0.095, accuracy=19.5, wps=136.5, ups=0.69, wpb=197.6, bsz=8, num_updates=1100, lr=8.37607e-06, gnorm=2.823, loss_scale=128, train_wall=36, gb_free=15.7, wall=1512
2023-10-27 21:42:51 | INFO | train_inner | epoch 001:   1125 / 1217 loss=2.319, nll_loss=0.096, accuracy=21.5, wps=141.4, ups=0.73, wpb=192.6, bsz=8, num_updates=1125, lr=8.33333e-06, gnorm=2.722, loss_scale=128, train_wall=34, gb_free=15.7, wall=1546
2023-10-27 21:43:25 | INFO | train_inner | epoch 001:   1150 / 1217 loss=2.327, nll_loss=0.098, accuracy=15.5, wps=138.5, ups=0.73, wpb=190, bsz=8, num_updates=1150, lr=8.2906e-06, gnorm=2.709, loss_scale=128, train_wall=34, gb_free=15.7, wall=1581
2023-10-27 21:44:03 | INFO | train_inner | epoch 001:   1175 / 1217 loss=2.339, nll_loss=0.092, accuracy=17.5, wps=135.1, ups=0.66, wpb=204.3, bsz=8, num_updates=1175, lr=8.24786e-06, gnorm=2.722, loss_scale=128, train_wall=37, gb_free=15.7, wall=1618
2023-10-27 21:44:38 | INFO | train_inner | epoch 001:   1200 / 1217 loss=2.325, nll_loss=0.096, accuracy=17, wps=139.2, ups=0.72, wpb=193.4, bsz=8, num_updates=1200, lr=8.20513e-06, gnorm=2.656, loss_scale=128, train_wall=34, gb_free=15.7, wall=1653
2023-10-27 21:45:04 | INFO | fairseq_cli.train | begin validation on "valid" subset
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:45:04 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 21:46:07 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 2.322 | nll_loss 0.097 | accuracy 20.4 | wps 464.7 | wpb 191.6 | bsz 8 | num_updates 1217
2023-10-27 21:46:07 | INFO | fairseq.checkpoint_utils | Preparing to save checkpoint for epoch 1 @ 1217 updates
2023-10-27 21:46:07 | INFO | fairseq.trainer | Saving checkpoint to /mnt/data/lizuchao/gongrh/fairseq/checkpoints/checkpoint_best.pt
2023-10-27 21:46:25 | INFO | fairseq.trainer | Finished saving checkpoint to /mnt/data/lizuchao/gongrh/fairseq/checkpoints/checkpoint_best.pt
2023-10-27 21:46:25 | INFO | fairseq.checkpoint_utils | Saved checkpoint checkpoints/checkpoint_best.pt (epoch 1 @ 1217 updates, score 20.4) (writing took 17.92632083798526 seconds)
2023-10-27 21:46:25 | INFO | fairseq_cli.train | end of epoch 1 (average epoch stats below)
2023-10-27 21:46:25 | INFO | train | epoch 001 | loss 2.328 | nll_loss 0.096 | accuracy 19.5 | wps 136.3 | ups 0.71 | wpb 193 | bsz 8 | num_updates 1217 | lr 8.17607e-06 | gnorm 4 | loss_scale 128 | train_wall 1633 | gb_free 15.7 | wall 1760
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 21:46:25 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 2
2023-10-27 21:46:25 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1217
2023-10-27 21:46:25 | INFO | fairseq.trainer | begin training epoch 2
2023-10-27 21:46:25 | INFO | fairseq_cli.train | Start iterating over samples
2023-10-27 21:46:37 | INFO | train_inner | epoch 002:      8 / 1217 loss=2.325, nll_loss=0.095, accuracy=20.5, wps=41, ups=0.21, wpb=195.4, bsz=8, num_updates=1225, lr=8.16239e-06, gnorm=2.667, loss_scale=128, train_wall=37, gb_free=15.7, wall=1772
2023-10-27 21:47:11 | INFO | train_inner | epoch 002:     33 / 1217 loss=2.34, nll_loss=0.097, accuracy=14, wps=144.1, ups=0.75, wpb=193.4, bsz=8, num_updates=1250, lr=8.11966e-06, gnorm=2.662, loss_scale=128, train_wall=33, gb_free=15.7, wall=1806
2023-10-27 21:47:44 | INFO | train_inner | epoch 002:     58 / 1217 loss=2.328, nll_loss=0.098, accuracy=22.5, wps=142.5, ups=0.75, wpb=189.4, bsz=8, num_updates=1275, lr=8.07692e-06, gnorm=2.588, loss_scale=128, train_wall=33, gb_free=15.7, wall=1839
2023-10-27 21:48:18 | INFO | train_inner | epoch 002:     83 / 1217 loss=2.336, nll_loss=0.1, accuracy=20.5, wps=137, ups=0.73, wpb=187.6, bsz=8, num_updates=1300, lr=8.03419e-06, gnorm=2.657, loss_scale=128, train_wall=34, gb_free=15.5, wall=1873
2023-10-27 21:48:52 | INFO | train_inner | epoch 002:    108 / 1217 loss=2.313, nll_loss=0.096, accuracy=26, wps=143, ups=0.74, wpb=193.7, bsz=8, num_updates=1325, lr=7.99145e-06, gnorm=2.66, loss_scale=128, train_wall=33, gb_free=15.7, wall=1907
2023-10-27 21:49:30 | INFO | train_inner | epoch 002:    133 / 1217 loss=2.343, nll_loss=0.097, accuracy=15.5, wps=127.8, ups=0.66, wpb=194, bsz=8, num_updates=1350, lr=7.94872e-06, gnorm=2.635, loss_scale=128, train_wall=37, gb_free=15.6, wall=1945
2023-10-27 21:50:05 | INFO | train_inner | epoch 002:    158 / 1217 loss=2.332, nll_loss=0.097, accuracy=22.5, wps=139.2, ups=0.72, wpb=193, bsz=8, num_updates=1375, lr=7.90598e-06, gnorm=2.576, loss_scale=128, train_wall=34, gb_free=15.4, wall=1980
2023-10-27 21:50:40 | INFO | train_inner | epoch 002:    183 / 1217 loss=2.327, nll_loss=0.095, accuracy=22.5, wps=137, ups=0.7, wpb=196.3, bsz=8, num_updates=1400, lr=7.86325e-06, gnorm=2.538, loss_scale=128, train_wall=35, gb_free=15.2, wall=2016
2023-10-27 21:51:19 | INFO | train_inner | epoch 002:    208 / 1217 loss=2.309, nll_loss=0.097, accuracy=21.5, wps=123.7, ups=0.65, wpb=189.8, bsz=8, num_updates=1425, lr=7.82051e-06, gnorm=2.542, loss_scale=128, train_wall=38, gb_free=15.7, wall=2054
2023-10-27 21:51:53 | INFO | train_inner | epoch 002:    233 / 1217 loss=2.324, nll_loss=0.1, accuracy=20.5, wps=137, ups=0.74, wpb=186.2, bsz=8, num_updates=1450, lr=7.77778e-06, gnorm=2.572, loss_scale=128, train_wall=34, gb_free=15.7, wall=2088
2023-10-27 21:52:24 | INFO | train_inner | epoch 002:    258 / 1217 loss=2.327, nll_loss=0.096, accuracy=20, wps=154.3, ups=0.79, wpb=194.9, bsz=8, num_updates=1475, lr=7.73504e-06, gnorm=2.59, loss_scale=128, train_wall=31, gb_free=15.7, wall=2120
2023-10-27 21:52:58 | INFO | train_inner | epoch 002:    283 / 1217 loss=2.331, nll_loss=0.098, accuracy=15.5, wps=143.3, ups=0.75, wpb=190.8, bsz=8, num_updates=1500, lr=7.69231e-06, gnorm=2.564, loss_scale=128, train_wall=33, gb_free=15.7, wall=2153
2023-10-27 21:53:32 | INFO | train_inner | epoch 002:    308 / 1217 loss=2.328, nll_loss=0.093, accuracy=22, wps=145.3, ups=0.73, wpb=199.7, bsz=8, num_updates=1525, lr=7.64957e-06, gnorm=2.568, loss_scale=128, train_wall=34, gb_free=15.7, wall=2187
2023-10-27 21:54:05 | INFO | train_inner | epoch 002:    333 / 1217 loss=2.321, nll_loss=0.098, accuracy=19, wps=144, ups=0.76, wpb=188.8, bsz=8, num_updates=1550, lr=7.60684e-06, gnorm=2.606, loss_scale=128, train_wall=32, gb_free=15.7, wall=2220
2023-10-27 21:54:38 | INFO | train_inner | epoch 002:    358 / 1217 loss=2.328, nll_loss=0.097, accuracy=25.5, wps=142.5, ups=0.75, wpb=191.1, bsz=8, num_updates=1575, lr=7.5641e-06, gnorm=2.685, loss_scale=128, train_wall=33, gb_free=15.2, wall=2254
2023-10-27 21:55:10 | INFO | train_inner | epoch 002:    383 / 1217 loss=2.336, nll_loss=0.099, accuracy=20.5, wps=151, ups=0.8, wpb=188.7, bsz=8, num_updates=1600, lr=7.52137e-06, gnorm=2.755, loss_scale=128, train_wall=31, gb_free=15.7, wall=2285
2023-10-27 21:55:42 | INFO | train_inner | epoch 002:    408 / 1217 loss=2.325, nll_loss=0.098, accuracy=20.5, wps=145.9, ups=0.77, wpb=190.6, bsz=8, num_updates=1625, lr=7.47863e-06, gnorm=2.694, loss_scale=128, train_wall=32, gb_free=15.7, wall=2317
2023-10-27 21:56:15 | INFO | train_inner | epoch 002:    433 / 1217 loss=2.322, nll_loss=0.095, accuracy=19, wps=148.9, ups=0.76, wpb=194.9, bsz=8, num_updates=1650, lr=7.4359e-06, gnorm=2.713, loss_scale=128, train_wall=32, gb_free=15.4, wall=2350
2023-10-27 21:56:48 | INFO | train_inner | epoch 002:    458 / 1217 loss=2.328, nll_loss=0.094, accuracy=18, wps=150.7, ups=0.76, wpb=197.3, bsz=8, num_updates=1675, lr=7.39316e-06, gnorm=2.65, loss_scale=128, train_wall=32, gb_free=15.3, wall=2383
2023-10-27 21:57:21 | INFO | train_inner | epoch 002:    483 / 1217 loss=2.319, nll_loss=0.096, accuracy=16, wps=143.4, ups=0.74, wpb=193.3, bsz=8, num_updates=1700, lr=7.35043e-06, gnorm=2.613, loss_scale=128, train_wall=33, gb_free=15.7, wall=2417
2023-10-27 21:57:55 | INFO | train_inner | epoch 002:    508 / 1217 loss=2.349, nll_loss=0.093, accuracy=15.5, wps=152.3, ups=0.75, wpb=202.3, bsz=8, num_updates=1725, lr=7.30769e-06, gnorm=2.75, loss_scale=128, train_wall=33, gb_free=15.6, wall=2450
2023-10-27 21:58:27 | INFO | train_inner | epoch 002:    533 / 1217 loss=2.317, nll_loss=0.099, accuracy=23, wps=144.8, ups=0.77, wpb=188.1, bsz=8, num_updates=1750, lr=7.26496e-06, gnorm=3.164, loss_scale=128, train_wall=32, gb_free=15.7, wall=2482
2023-10-27 21:59:00 | INFO | train_inner | epoch 002:    558 / 1217 loss=2.336, nll_loss=0.098, accuracy=21.5, wps=143.6, ups=0.75, wpb=190.5, bsz=8, num_updates=1775, lr=7.22222e-06, gnorm=2.83, loss_scale=128, train_wall=33, gb_free=15.5, wall=2515
2023-10-27 21:59:35 | INFO | train_inner | epoch 002:    583 / 1217 loss=2.335, nll_loss=0.101, accuracy=18.5, wps=133.2, ups=0.72, wpb=185.5, bsz=8, num_updates=1800, lr=7.17949e-06, gnorm=2.794, loss_scale=128, train_wall=34, gb_free=15.7, wall=2550
2023-10-27 22:00:09 | INFO | train_inner | epoch 002:    608 / 1217 loss=2.333, nll_loss=0.097, accuracy=14.5, wps=140.5, ups=0.73, wpb=192.4, bsz=8, num_updates=1825, lr=7.13675e-06, gnorm=2.895, loss_scale=128, train_wall=34, gb_free=15.7, wall=2585
2023-10-27 22:00:43 | INFO | train_inner | epoch 002:    633 / 1217 loss=2.33, nll_loss=0.096, accuracy=17.5, wps=143.5, ups=0.74, wpb=194.2, bsz=8, num_updates=1850, lr=7.09402e-06, gnorm=3.265, loss_scale=128, train_wall=33, gb_free=15.7, wall=2618
2023-10-27 22:01:17 | INFO | train_inner | epoch 002:    658 / 1217 loss=2.331, nll_loss=0.096, accuracy=18.5, wps=145.7, ups=0.75, wpb=195.2, bsz=8, num_updates=1875, lr=7.05128e-06, gnorm=2.786, loss_scale=128, train_wall=33, gb_free=15.7, wall=2652
2023-10-27 22:01:51 | INFO | train_inner | epoch 002:    683 / 1217 loss=2.325, nll_loss=0.097, accuracy=21, wps=138.3, ups=0.72, wpb=192, bsz=8, num_updates=1900, lr=7.00855e-06, gnorm=2.908, loss_scale=128, train_wall=34, gb_free=15.7, wall=2687
2023-10-27 22:02:24 | INFO | train_inner | epoch 002:    708 / 1217 loss=2.322, nll_loss=0.096, accuracy=20.5, wps=148.3, ups=0.77, wpb=193.2, bsz=8, num_updates=1925, lr=6.96581e-06, gnorm=3.026, loss_scale=128, train_wall=32, gb_free=15.7, wall=2719
2023-10-27 22:02:57 | INFO | train_inner | epoch 002:    733 / 1217 loss=2.326, nll_loss=0.098, accuracy=21, wps=143.2, ups=0.75, wpb=190, bsz=8, num_updates=1950, lr=6.92308e-06, gnorm=3.382, loss_scale=128, train_wall=33, gb_free=15.7, wall=2752
2023-10-27 22:03:31 | INFO | train_inner | epoch 002:    758 / 1217 loss=2.335, nll_loss=0.096, accuracy=18.5, wps=143.1, ups=0.74, wpb=193.6, bsz=8, num_updates=1975, lr=6.88034e-06, gnorm=3.273, loss_scale=128, train_wall=33, gb_free=15.7, wall=2786
2023-10-27 22:04:11 | INFO | train_inner | epoch 002:    783 / 1217 loss=2.316, nll_loss=0.098, accuracy=21, wps=116.6, ups=0.62, wpb=188.5, bsz=8, num_updates=2000, lr=6.83761e-06, gnorm=3.013, loss_scale=128, train_wall=40, gb_free=15.7, wall=2827
2023-10-27 22:04:45 | INFO | train_inner | epoch 002:    808 / 1217 loss=2.314, nll_loss=0.096, accuracy=23, wps=143.4, ups=0.75, wpb=192.3, bsz=8, num_updates=2025, lr=6.79487e-06, gnorm=2.988, loss_scale=128, train_wall=33, gb_free=15.7, wall=2860
2023-10-27 22:05:18 | INFO | train_inner | epoch 002:    833 / 1217 loss=2.347, nll_loss=0.097, accuracy=14, wps=148.7, ups=0.77, wpb=193.7, bsz=8, num_updates=2050, lr=6.75214e-06, gnorm=3.376, loss_scale=128, train_wall=32, gb_free=15.7, wall=2893
2023-10-27 22:05:54 | INFO | train_inner | epoch 002:    858 / 1217 loss=2.328, nll_loss=0.096, accuracy=21.5, wps=131.6, ups=0.68, wpb=193.1, bsz=8, num_updates=2075, lr=6.7094e-06, gnorm=3.317, loss_scale=128, train_wall=36, gb_free=15.7, wall=2929
2023-10-27 22:06:26 | INFO | train_inner | epoch 002:    883 / 1217 loss=2.33, nll_loss=0.095, accuracy=21, wps=153.2, ups=0.78, wpb=196.8, bsz=8, num_updates=2100, lr=6.66667e-06, gnorm=3.072, loss_scale=128, train_wall=32, gb_free=15.6, wall=2961
2023-10-27 22:07:01 | INFO | train_inner | epoch 002:    908 / 1217 loss=2.326, nll_loss=0.093, accuracy=20.5, wps=142.9, ups=0.72, wpb=199.6, bsz=8, num_updates=2125, lr=6.62393e-06, gnorm=2.949, loss_scale=128, train_wall=35, gb_free=15.6, wall=2996
2023-10-27 22:07:35 | INFO | train_inner | epoch 002:    933 / 1217 loss=2.322, nll_loss=0.094, accuracy=21, wps=147.6, ups=0.75, wpb=197.7, bsz=8, num_updates=2150, lr=6.5812e-06, gnorm=3.133, loss_scale=128, train_wall=33, gb_free=15.7, wall=3030
2023-10-27 22:08:09 | INFO | train_inner | epoch 002:    958 / 1217 loss=2.328, nll_loss=0.093, accuracy=20, wps=144.4, ups=0.72, wpb=199.9, bsz=8, num_updates=2175, lr=6.53846e-06, gnorm=3.201, loss_scale=128, train_wall=34, gb_free=15.7, wall=3065
2023-10-27 22:08:43 | INFO | train_inner | epoch 002:    983 / 1217 loss=2.321, nll_loss=0.097, accuracy=23, wps=140.5, ups=0.74, wpb=190.9, bsz=8, num_updates=2200, lr=6.49573e-06, gnorm=3.08, loss_scale=128, train_wall=34, gb_free=15.7, wall=3098
2023-10-27 22:09:16 | INFO | train_inner | epoch 002:   1008 / 1217 loss=2.331, nll_loss=0.097, accuracy=17.5, wps=147, ups=0.77, wpb=191.4, bsz=8, num_updates=2225, lr=6.45299e-06, gnorm=2.92, loss_scale=128, train_wall=32, gb_free=15.3, wall=3131
2023-10-27 22:09:49 | INFO | train_inner | epoch 002:   1033 / 1217 loss=2.324, nll_loss=0.096, accuracy=19.5, wps=148.8, ups=0.77, wpb=194.5, bsz=8, num_updates=2250, lr=6.41026e-06, gnorm=2.963, loss_scale=128, train_wall=32, gb_free=15.7, wall=3164
2023-10-27 22:10:21 | INFO | train_inner | epoch 002:   1058 / 1217 loss=2.316, nll_loss=0.096, accuracy=25, wps=147.8, ups=0.77, wpb=193.1, bsz=8, num_updates=2275, lr=6.36752e-06, gnorm=2.86, loss_scale=128, train_wall=32, gb_free=15.7, wall=3196
2023-10-27 22:10:54 | INFO | train_inner | epoch 002:   1083 / 1217 loss=2.312, nll_loss=0.098, accuracy=27, wps=142.5, ups=0.75, wpb=189.4, bsz=8, num_updates=2300, lr=6.32479e-06, gnorm=2.9, loss_scale=128, train_wall=33, gb_free=15.7, wall=3230
2023-10-27 22:11:26 | INFO | train_inner | epoch 002:   1108 / 1217 loss=2.327, nll_loss=0.097, accuracy=22, wps=150.8, ups=0.79, wpb=191.5, bsz=8, num_updates=2325, lr=6.28205e-06, gnorm=2.962, loss_scale=128, train_wall=31, gb_free=15.7, wall=3261
2023-10-27 22:12:03 | INFO | train_inner | epoch 002:   1133 / 1217 loss=2.327, nll_loss=0.099, accuracy=20, wps=128.5, ups=0.68, wpb=188.2, bsz=8, num_updates=2350, lr=6.23932e-06, gnorm=2.935, loss_scale=128, train_wall=36, gb_free=15.7, wall=3298
2023-10-27 22:12:39 | INFO | train_inner | epoch 002:   1158 / 1217 loss=2.313, nll_loss=0.092, accuracy=21.5, wps=138, ups=0.69, wpb=201.3, bsz=8, num_updates=2375, lr=6.19658e-06, gnorm=3.045, loss_scale=128, train_wall=36, gb_free=15.7, wall=3334
2023-10-27 22:13:15 | INFO | train_inner | epoch 002:   1183 / 1217 loss=2.338, nll_loss=0.095, accuracy=15, wps=137.5, ups=0.7, wpb=197, bsz=8, num_updates=2400, lr=6.15385e-06, gnorm=3.044, loss_scale=128, train_wall=35, gb_free=15.7, wall=3370
2023-10-27 22:13:46 | INFO | train_inner | epoch 002:   1208 / 1217 loss=2.333, nll_loss=0.096, accuracy=18, wps=157.3, ups=0.81, wpb=195.1, bsz=8, num_updates=2425, lr=6.11111e-06, gnorm=3.039, loss_scale=128, train_wall=31, gb_free=15.7, wall=3401
2023-10-27 22:13:59 | INFO | fairseq_cli.train | begin validation on "valid" subset
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 22:13:59 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 1
2023-10-27 22:15:17 | INFO | valid | epoch 002 | valid on 'valid' subset | loss 2.322 | nll_loss 0.097 | accuracy 18.5 | wps 377.3 | wpb 191.6 | bsz 8 | num_updates 2434 | best_accuracy 20.4
2023-10-27 22:15:17 | INFO | fairseq.checkpoint_utils | Preparing to save checkpoint for epoch 2 @ 2434 updates
2023-10-27 22:15:17 | INFO | fairseq_cli.train | end of epoch 2 (average epoch stats below)
2023-10-27 22:15:17 | INFO | train | epoch 002 | loss 2.327 | nll_loss 0.096 | accuracy 20.1 | wps 135.6 | ups 0.7 | wpb 193 | bsz 8 | num_updates 2434 | lr 6.09573e-06 | gnorm 2.87 | loss_scale 128 | train_wall 1633 | gb_free 15.7 | wall 3492
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | can_reuse_epoch_itr = False
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | reuse_dataloader = True
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | rebuild_batches = False
2023-10-27 22:15:17 | INFO | fairseq.tasks.fairseq_task | creating new batches for epoch 3
2023-10-27 22:15:17 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1217
2023-10-27 22:15:17 | INFO | fairseq.trainer | begin training epoch 3
2023-10-27 22:15:17 | INFO | fairseq_cli.train | Start iterating over samples
2023-10-27 22:15:43 | INFO | train_inner | epoch 003:     16 / 1217 loss=2.32, nll_loss=0.096, accuracy=19.5, wps=41.3, ups=0.21, wpb=193, bsz=8, num_updates=2450, lr=6.06838e-06, gnorm=3.035, loss_scale=128, train_wall=38, gb_free=15.7, wall=3518
2023-10-27 22:16:17 | INFO | train_inner | epoch 003:     41 / 1217 loss=2.307, nll_loss=0.094, accuracy=19, wps=143, ups=0.73, wpb=196.2, bsz=8, num_updates=2475, lr=6.02564e-06, gnorm=2.932, loss_scale=128, train_wall=34, gb_free=15.7, wall=3552
2023-10-27 22:16:44 | INFO | train_inner | epoch 003:     66 / 1217 loss=2.333, nll_loss=0.098, accuracy=20.5, wps=175.4, ups=0.92, wpb=190.2, bsz=8, num_updates=2500, lr=5.98291e-06, gnorm=2.979, loss_scale=128, train_wall=27, gb_free=15.7, wall=3580
2023-10-27 22:17:01 | INFO | train_inner | epoch 003:     91 / 1217 loss=2.325, nll_loss=0.098, accuracy=18.5, wps=289.8, ups=1.53, wpb=189.8, bsz=8, num_updates=2525, lr=5.94017e-06, gnorm=2.91, loss_scale=128, train_wall=16, gb_free=15.7, wall=3596
2023-10-27 22:17:38 | INFO | train_inner | epoch 003:    116 / 1217 loss=2.32, nll_loss=0.095, accuracy=19.5, wps=131.8, ups=0.67, wpb=195.9, bsz=8, num_updates=2550, lr=5.89744e-06, gnorm=2.873, loss_scale=128, train_wall=37, gb_free=15.4, wall=3633
2023-10-27 22:18:13 | INFO | train_inner | epoch 003:    141 / 1217 loss=2.336, nll_loss=0.098, accuracy=17, wps=137.5, ups=0.72, wpb=191.3, bsz=8, num_updates=2575, lr=5.8547e-06, gnorm=2.88, loss_scale=128, train_wall=34, gb_free=15.6, wall=3668
2023-10-27 22:18:47 | INFO | train_inner | epoch 003:    166 / 1217 loss=2.322, nll_loss=0.095, accuracy=22.5, wps=140.9, ups=0.72, wpb=194.8, bsz=8, num_updates=2600, lr=5.81197e-06, gnorm=2.834, loss_scale=128, train_wall=34, gb_free=15.7, wall=3703
2023-10-27 22:19:09 | INFO | train_inner | epoch 003:    191 / 1217 loss=2.318, nll_loss=0.098, accuracy=20.5, wps=216.1, ups=1.15, wpb=188.6, bsz=8, num_updates=2625, lr=5.76923e-06, gnorm=2.909, loss_scale=128, train_wall=22, gb_free=15.4, wall=3724
2023-10-27 22:19:43 | INFO | train_inner | epoch 003:    216 / 1217 loss=2.329, nll_loss=0.094, accuracy=20, wps=144.6, ups=0.73, wpb=197.3, bsz=8, num_updates=2650, lr=5.7265e-06, gnorm=2.928, loss_scale=128, train_wall=34, gb_free=15.7, wall=3758
2023-10-27 22:20:19 | INFO | train_inner | epoch 003:    241 / 1217 loss=2.334, nll_loss=0.096, accuracy=20, wps=136, ups=0.7, wpb=193.8, bsz=8, num_updates=2675, lr=5.68376e-06, gnorm=2.895, loss_scale=128, train_wall=35, gb_free=15.7, wall=3794
2023-10-27 22:20:53 | INFO | train_inner | epoch 003:    266 / 1217 loss=2.317, nll_loss=0.094, accuracy=23, wps=145.2, ups=0.73, wpb=198.2, bsz=8, num_updates=2700, lr=5.64103e-06, gnorm=2.855, loss_scale=128, train_wall=34, gb_free=15.7, wall=3828
2023-10-27 22:21:27 | INFO | train_inner | epoch 003:    291 / 1217 loss=2.32, nll_loss=0.096, accuracy=25, wps=142.2, ups=0.74, wpb=192.6, bsz=8, num_updates=2725, lr=5.59829e-06, gnorm=2.838, loss_scale=128, train_wall=33, gb_free=15.7, wall=3862
2023-10-27 22:22:01 | INFO | train_inner | epoch 003:    316 / 1217 loss=2.326, nll_loss=0.095, accuracy=20.5, wps=143.7, ups=0.74, wpb=195.5, bsz=8, num_updates=2750, lr=5.55556e-06, gnorm=2.839, loss_scale=128, train_wall=34, gb_free=15.7, wall=3896
2023-10-27 22:22:34 | INFO | train_inner | epoch 003:    341 / 1217 loss=2.339, nll_loss=0.095, accuracy=12.5, wps=149.8, ups=0.76, wpb=197.8, bsz=8, num_updates=2775, lr=5.51282e-06, gnorm=2.837, loss_scale=128, train_wall=33, gb_free=15.7, wall=3929
2023-10-27 22:23:06 | INFO | train_inner | epoch 003:    366 / 1217 loss=2.33, nll_loss=0.095, accuracy=18.5, wps=151.4, ups=0.77, wpb=195.6, bsz=8, num_updates=2800, lr=5.47009e-06, gnorm=2.781, loss_scale=128, train_wall=32, gb_free=15.4, wall=3961
2023-10-27 22:23:40 | INFO | train_inner | epoch 003:    391 / 1217 loss=2.328, nll_loss=0.097, accuracy=22.5, wps=141.2, ups=0.73, wpb=193, bsz=8, num_updates=2825, lr=5.42735e-06, gnorm=2.726, loss_scale=128, train_wall=34, gb_free=15.7, wall=3996
2023-10-27 22:24:17 | INFO | train_inner | epoch 003:    416 / 1217 loss=2.315, nll_loss=0.098, accuracy=22, wps=130.1, ups=0.69, wpb=188.1, bsz=8, num_updates=2850, lr=5.38462e-06, gnorm=2.714, loss_scale=128, train_wall=36, gb_free=15.7, wall=4032
2023-10-27 22:24:52 | INFO | train_inner | epoch 003:    441 / 1217 loss=2.32, nll_loss=0.094, accuracy=22, wps=140.6, ups=0.71, wpb=197.4, bsz=8, num_updates=2875, lr=5.34188e-06, gnorm=2.741, loss_scale=128, train_wall=35, gb_free=15.7, wall=4067
2023-10-27 22:25:27 | INFO | train_inner | epoch 003:    466 / 1217 loss=2.32, nll_loss=0.097, accuracy=25, wps=137.2, ups=0.71, wpb=192, bsz=8, num_updates=2900, lr=5.29915e-06, gnorm=2.737, loss_scale=128, train_wall=35, gb_free=15.7, wall=4102
2023-10-27 22:26:00 | INFO | train_inner | epoch 003:    491 / 1217 loss=2.332, nll_loss=0.097, accuracy=19, wps=144.3, ups=0.75, wpb=193.2, bsz=8, num_updates=2925, lr=5.25641e-06, gnorm=2.751, loss_scale=128, train_wall=33, gb_free=15.7, wall=4135
2023-10-27 22:26:33 | INFO | train_inner | epoch 003:    516 / 1217 loss=2.329, nll_loss=0.097, accuracy=20.5, wps=145.2, ups=0.75, wpb=192.8, bsz=8, num_updates=2950, lr=5.21368e-06, gnorm=2.703, loss_scale=128, train_wall=33, gb_free=15.7, wall=4168
2023-10-27 22:27:08 | INFO | train_inner | epoch 003:    541 / 1217 loss=2.315, nll_loss=0.099, accuracy=21.5, wps=133, ups=0.71, wpb=186.8, bsz=8, num_updates=2975, lr=5.17094e-06, gnorm=2.657, loss_scale=128, train_wall=35, gb_free=15.7, wall=4204
2023-10-27 22:27:43 | INFO | train_inner | epoch 003:    566 / 1217 loss=2.336, nll_loss=0.098, accuracy=15.5, wps=140.3, ups=0.73, wpb=191.4, bsz=8, num_updates=3000, lr=5.12821e-06, gnorm=2.685, loss_scale=128, train_wall=34, gb_free=15.1, wall=4238
2023-10-27 22:28:15 | INFO | train_inner | epoch 003:    591 / 1217 loss=2.331, nll_loss=0.099, accuracy=16.5, wps=144, ups=0.77, wpb=188.2, bsz=8, num_updates=3025, lr=5.08547e-06, gnorm=2.717, loss_scale=128, train_wall=32, gb_free=15.7, wall=4270
2023-10-27 22:28:47 | INFO | train_inner | epoch 003:    616 / 1217 loss=2.335, nll_loss=0.098, accuracy=19.5, wps=148.7, ups=0.78, wpb=191.4, bsz=8, num_updates=3050, lr=5.04274e-06, gnorm=2.744, loss_scale=128, train_wall=32, gb_free=15.7, wall=4303
2023-10-27 22:29:23 | INFO | train_inner | epoch 003:    641 / 1217 loss=2.346, nll_loss=0.096, accuracy=18, wps=139.6, ups=0.71, wpb=196.1, bsz=8, num_updates=3075, lr=5e-06, gnorm=2.69, loss_scale=128, train_wall=35, gb_free=15.2, wall=4338
2023-10-27 22:30:00 | INFO | train_inner | epoch 003:    666 / 1217 loss=2.329, nll_loss=0.098, accuracy=16.5, wps=128, ups=0.68, wpb=189.2, bsz=8, num_updates=3100, lr=4.95726e-06, gnorm=2.622, loss_scale=128, train_wall=37, gb_free=15.7, wall=4375
2023-10-27 22:30:34 | INFO | train_inner | epoch 003:    691 / 1217 loss=2.325, nll_loss=0.097, accuracy=18.5, wps=138.1, ups=0.72, wpb=191.9, bsz=8, num_updates=3125, lr=4.91453e-06, gnorm=2.56, loss_scale=128, train_wall=34, gb_free=15.7, wall=4409
2023-10-27 22:31:09 | INFO | train_inner | epoch 003:    716 / 1217 loss=2.319, nll_loss=0.095, accuracy=20.5, wps=141.8, ups=0.73, wpb=195, bsz=8, num_updates=3150, lr=4.87179e-06, gnorm=2.554, loss_scale=128, train_wall=34, gb_free=15.7, wall=4444
2023-10-27 22:31:44 | INFO | train_inner | epoch 003:    741 / 1217 loss=2.318, nll_loss=0.096, accuracy=23.5, wps=137.5, ups=0.72, wpb=192.3, bsz=8, num_updates=3175, lr=4.82906e-06, gnorm=2.571, loss_scale=128, train_wall=35, gb_free=15.1, wall=4479
2023-10-27 22:32:17 | INFO | train_inner | epoch 003:    766 / 1217 loss=2.324, nll_loss=0.097, accuracy=19, wps=141, ups=0.74, wpb=190.8, bsz=8, num_updates=3200, lr=4.78632e-06, gnorm=2.602, loss_scale=128, train_wall=33, gb_free=15.7, wall=4513
2023-10-27 22:32:52 | INFO | train_inner | epoch 003:    791 / 1217 loss=2.327, nll_loss=0.097, accuracy=19, wps=137.8, ups=0.72, wpb=191.4, bsz=8, num_updates=3225, lr=4.74359e-06, gnorm=2.581, loss_scale=128, train_wall=34, gb_free=15.7, wall=4547
2023-10-27 22:33:26 | INFO | train_inner | epoch 003:    816 / 1217 loss=2.324, nll_loss=0.096, accuracy=21, wps=142.5, ups=0.74, wpb=193.4, bsz=8, num_updates=3250, lr=4.70085e-06, gnorm=2.571, loss_scale=128, train_wall=34, gb_free=15.7, wall=4581
2023-10-27 22:34:03 | INFO | train_inner | epoch 003:    841 / 1217 loss=2.323, nll_loss=0.096, accuracy=24, wps=130.7, ups=0.68, wpb=192.6, bsz=8, num_updates=3275, lr=4.65812e-06, gnorm=2.583, loss_scale=128, train_wall=36, gb_free=15.7, wall=4618
2023-10-27 22:34:38 | INFO | train_inner | epoch 003:    866 / 1217 loss=2.332, nll_loss=0.096, accuracy=19, wps=139.2, ups=0.72, wpb=193.9, bsz=8, num_updates=3300, lr=4.61538e-06, gnorm=2.62, loss_scale=128, train_wall=34, gb_free=15.7, wall=4653
2023-10-27 22:35:13 | INFO | train_inner | epoch 003:    891 / 1217 loss=2.318, nll_loss=0.095, accuracy=20, wps=138.5, ups=0.71, wpb=194.4, bsz=8, num_updates=3325, lr=4.57265e-06, gnorm=2.57, loss_scale=128, train_wall=35, gb_free=15.7, wall=4688
2023-10-27 22:35:46 | INFO | train_inner | epoch 003:    916 / 1217 loss=2.317, nll_loss=0.097, accuracy=26.5, wps=142.9, ups=0.75, wpb=191, bsz=8, num_updates=3350, lr=4.52991e-06, gnorm=2.593, loss_scale=128, train_wall=33, gb_free=15.7, wall=4721
2023-10-27 22:36:21 | INFO | train_inner | epoch 003:    941 / 1217 loss=2.325, nll_loss=0.094, accuracy=21.5, wps=142.2, ups=0.72, wpb=197.6, bsz=8, num_updates=3375, lr=4.48718e-06, gnorm=2.569, loss_scale=128, train_wall=34, gb_free=15.7, wall=4756
2023-10-27 22:36:55 | INFO | train_inner | epoch 003:    966 / 1217 loss=2.318, nll_loss=0.096, accuracy=19, wps=142.4, ups=0.74, wpb=192.7, bsz=8, num_updates=3400, lr=4.44444e-06, gnorm=2.571, loss_scale=128, train_wall=33, gb_free=15.7, wall=4790
2023-10-27 22:37:31 | INFO | train_inner | epoch 003:    991 / 1217 loss=2.325, nll_loss=0.096, accuracy=18, wps=136.2, ups=0.7, wpb=194.5, bsz=8, num_updates=3425, lr=4.40171e-06, gnorm=2.548, loss_scale=128, train_wall=35, gb_free=14.6, wall=4826
2023-10-27 22:38:06 | INFO | train_inner | epoch 003:   1016 / 1217 loss=2.338, nll_loss=0.097, accuracy=14.5, wps=134.8, ups=0.7, wpb=193.4, bsz=8, num_updates=3450, lr=4.35897e-06, gnorm=2.494, loss_scale=128, train_wall=36, gb_free=15.7, wall=4862
2023-10-27 22:38:42 | INFO | train_inner | epoch 003:   1041 / 1217 loss=2.323, nll_loss=0.099, accuracy=22.5, wps=133.4, ups=0.71, wpb=188.2, bsz=8, num_updates=3475, lr=4.31624e-06, gnorm=2.454, loss_scale=128, train_wall=35, gb_free=15.7, wall=4897
2023-10-27 22:39:19 | INFO | train_inner | epoch 003:   1066 / 1217 loss=2.332, nll_loss=0.096, accuracy=16, wps=130.6, ups=0.67, wpb=193.8, bsz=8, num_updates=3500, lr=4.2735e-06, gnorm=2.42, loss_scale=128, train_wall=36, gb_free=15.7, wall=4934
2023-10-27 22:39:55 | INFO | train_inner | epoch 003:   1091 / 1217 loss=2.328, nll_loss=0.094, accuracy=17, wps=136.6, ups=0.69, wpb=198.8, bsz=8, num_updates=3525, lr=4.23077e-06, gnorm=2.426, loss_scale=128, train_wall=36, gb_free=15.7, wall=4970
2023-10-27 22:40:32 | INFO | train_inner | epoch 003:   1116 / 1217 loss=2.333, nll_loss=0.099, accuracy=17, wps=126.6, ups=0.67, wpb=188, bsz=8, num_updates=3550, lr=4.18803e-06, gnorm=2.411, loss_scale=128, train_wall=37, gb_free=15.7, wall=5008
2023-10-27 22:41:06 | INFO | train_inner | epoch 003:   1141 / 1217 loss=2.333, nll_loss=0.099, accuracy=20.5, wps=141, ups=0.75, wpb=188.4, bsz=8, num_updates=3575, lr=4.1453e-06, gnorm=2.39, loss_scale=128, train_wall=33, gb_free=15.7, wall=5041
2023-10-27 22:41:42 | INFO | train_inner | epoch 003:   1166 / 1217 loss=2.323, nll_loss=0.095, accuracy=20, wps=136, ups=0.7, wpb=195.3, bsz=8, num_updates=3600, lr=4.10256e-06, gnorm=2.361, loss_scale=128, train_wall=35, gb_free=15.7, wall=5077
2023-10-27 22:42:16 | INFO | train_inner | epoch 003:   1191 / 1217 loss=2.319, nll_loss=0.094, accuracy=20, wps=144.3, ups=0.73, wpb=197.9, bsz=8, num_updates=3625, lr=4.05983e-06, gnorm=2.365, loss_scale=128, train_wall=34, gb_free=15.7, wall=5111
2023-10-27 22:42:50 | INFO | train_inner | epoch 003:   1216 / 1217 loss=2.323, nll_loss=0.096, accuracy=23, wps=140.4, ups=0.73, wpb=192.7, bsz=8, num_updates=3650, lr=4.01709e-06, gnorm=2.347, loss_scale=128, train_wall=34, gb_free=15.7, wall=5145
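
A note on reading these numbers: CommonsenseQA has five answer choices per question, so uniform guessing gives 20% accuracy. As far as I know, fairseq reports the loss in base 2, so a model that spreads its probability evenly over five choices would sit at log2(5), which is about 2.32. That matches the loss in the log above and suggests the model is not learning anything beyond chance. A minimal sketch of the arithmetic (variable names are illustrative only):

```python
import math

NUM_CHOICES = 5  # CommonsenseQA questions have five candidate answers

# Accuracy of picking one of the choices uniformly at random
chance_accuracy = 100.0 / NUM_CHOICES

# Cross-entropy of a uniform prediction over the choices, measured in bits
chance_loss_bits = math.log2(NUM_CHOICES)

print(f"chance accuracy: {chance_accuracy:.1f}%")
print(f"chance loss (base 2): {chance_loss_bits:.3f}")
```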

Environment

  • fairseq Version (e.g., 1.0 or main): main
  • PyTorch Version (e.g., 1.0): 1.11.0
  • OS (e.g., Linux): Linux Ubuntu
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): pip install --editable ./
  • Python version: 3.9
  • CUDA/cuDNN version: 11.1
  • GPU models and configuration: GeForce RTX 3090
  • Any other relevant information:
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
antlr4-python3-runtime    4.8                      pypi_0    pypi
bitarray                  2.8.2                    pypi_0    pypi
blas                      1.0                         mkl    defaults
brotli-python             1.0.9            py39h6a678d5_7    defaults
bzip2                     1.0.8                h7b6447c_0    defaults
ca-certificates           2023.08.22           h06a4308_0    defaults
certifi                   2023.7.22        py39h06a4308_0    defaults
cffi                      1.15.1           py39h5eee18b_3    defaults
charset-normalizer        2.0.4              pyhd3eb1b0_0    defaults
colorama                  0.4.6                    pypi_0    pypi
cryptography              41.0.3           py39hdda0065_0    defaults
cudatoolkit               11.3.1               h2bc3f7f_2    defaults
cython                    3.0.4                    pypi_0    pypi
fairseq                   0.12.2                   pypi_0    pypi
ffmpeg                    4.3                  hf484d3e_0    pytorch
filelock                  3.12.4                   pypi_0    pypi
freetype                  2.12.1               h4a9f257_0    defaults
fsspec                    2023.10.0                pypi_0    pypi
giflib                    5.2.1                h5eee18b_3    defaults
gmp                       6.2.1                h295c915_3    defaults
gnutls                    3.6.15               he1e5248_0    defaults
hydra-core                1.0.7                    pypi_0    pypi
idna                      3.4              py39h06a4308_0    defaults
intel-openmp              2023.1.0         hdb19cb5_46305    defaults
jinja2                    3.1.2                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
jpeg                      9e                   h5eee18b_1    defaults
lame                      3.100                h7b6447c_0    defaults
lcms2                     2.12                 h3be6417_0    defaults
ld_impl_linux-64          2.38                 h1181459_1    defaults
lerc                      3.0                  h295c915_0    defaults
libdeflate                1.17                 h5eee18b_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libiconv                  1.16                 h7f8727e_2    defaults
libidn2                   2.3.4                h5eee18b_0    defaults
libpng                    1.6.39               h5eee18b_0    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
libtasn1                  4.19.0               h5eee18b_0    defaults
libtiff                   4.5.1                h6a678d5_0    defaults
libunistring              0.9.10               h27cfd23_0    defaults
libuv                     1.44.2               h5eee18b_0    defaults
libwebp                   1.3.2                h11a3e52_0    defaults
libwebp-base              1.3.2                h5eee18b_0    defaults
lxml                      4.9.3                    pypi_0    pypi
lz4-c                     1.9.4                h6a678d5_0    defaults
markupsafe                2.1.3                    pypi_0    pypi
mkl                       2023.1.0         h213fc3f_46343    defaults
mkl-service               2.4.0            py39h5eee18b_1    defaults
mkl_fft                   1.3.8            py39h5eee18b_0    defaults
mkl_random                1.2.4            py39hdb19cb5_0    defaults
mpmath                    1.3.0                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    defaults
nettle                    3.7.3                hbbd107a_1    defaults
networkx                  3.2                      pypi_0    pypi
numpy                     1.26.0           py39h5f9d8c6_0    defaults
numpy-base                1.26.0           py39hb5e798b_0    defaults
nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu12          2.18.1                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.3.52                  pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
omegaconf                 2.0.6                    pypi_0    pypi
openh264                  2.1.1                h4ff587b_0    defaults
openjpeg                  2.4.0                h3ad879b_0    defaults
openssl                   3.0.11               h7f8727e_2    defaults
packaging                 23.2                     pypi_0    pypi
pillow                    10.0.1           py39ha6cbd5a_0    defaults
pip                       23.3             py39h06a4308_0    defaults
portalocker               2.8.2                    pypi_0    pypi
pycparser                 2.21               pyhd3eb1b0_0    defaults
pyopenssl                 23.2.0           py39h06a4308_0    defaults
pysocks                   1.7.1            py39h06a4308_0    defaults
python                    3.9.18               h955ad1f_0    defaults
pytorch                   1.12.0          py3.9_cuda11.3_cudnn8.3.2_0    pytorch
pytorch-mutex             1.0                        cuda    pytorch
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0    defaults
regex                     2023.10.3                pypi_0    pypi
requests                  2.31.0           py39h06a4308_0    defaults
sacrebleu                 2.3.1                    pypi_0    pypi
scikit-learn              1.3.2                    pypi_0    pypi
scipy                     1.11.3                   pypi_0    pypi
setuptools                68.0.0           py39h06a4308_0    defaults
sqlite                    3.41.2               h5eee18b_0    defaults
sympy                     1.12                     pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
tbb                       2021.8.0             hdb19cb5_0    defaults
threadpoolctl             3.2.0                    pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
torch                     2.1.0                    pypi_0    pypi
torchaudio                0.12.0               py39_cu113    pytorch
torchvision               0.13.0               py39_cu113    pytorch
tqdm                      4.66.1                   pypi_0    pypi
triton                    2.1.0                    pypi_0    pypi
typing_extensions         4.7.1            py39h06a4308_0    defaults
tzdata                    2023c                h04d1e81_0    defaults
urllib3                   1.26.18          py39h06a4308_0    defaults
wheel                     0.41.2           py39h06a4308_0    defaults
xz                        5.4.2                h5eee18b_0    defaults
zlib                      1.2.13               h5eee18b_0    defaults
zstd                      1.5.5                hc292b87_0    defaults
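
The package list above contains both a conda `pytorch 1.12.0` and a pip `torch 2.1.0`, while the reported PyTorch version is 1.11.0, so it may be worth confirming which versions the interpreter actually sees. The following is only a sketch using the standard library; `installed_version` is a hypothetical helper name, and the reported value is whatever distribution metadata Python finds first on its path.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the version of an installed distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Print the versions visible to the interpreter running this script
for name in ("torch", "fairseq"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this inside the same conda environment used for training shows whether the pip or the conda build of PyTorch is the one being resolved.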

Gsruhj, Oct 28 '23 07:10

I have a similar problem. Have you resolved it?

Eric-ddd, Nov 11 '23 09:11

Following #1687, I created a completely new conda environment. However, it did not work until I downloaded https://github.com/VITA-Group/SMC-Bench and put it in the folder /data/username/work/username/test/SMC-Bench-main. When I move the folder to another path, the training accuracy drops back to 20%. I think the problem is probably due to path resolution.

Gsruhj avatar Nov 14 '23 06:11 Gsruhj
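
One way to test the path-resolution hypothesis: the startup log earlier in this issue contains "No existing checkpoint found /examples/roberta/roberta.large/model.pt", which would mean the pretrained weights were never loaded and training started from random initialization. The sketch below checks what a checkpoint path resolves to before launching training. `describe_checkpoint` is a hypothetical helper, not part of fairseq, and it assumes you pass it the same path you give to training (the README passes it via `--restore-file`).

```python
from pathlib import Path


def describe_checkpoint(path_str):
    """Return the absolute path a checkpoint argument resolves to and whether it is a file."""
    resolved = Path(path_str).expanduser().resolve()
    return resolved, resolved.is_file()


if __name__ == "__main__":
    import sys

    # Usage: python check_ckpt.py /path/to/roberta.large/model.pt
    for arg in sys.argv[1:]:
        resolved, found = describe_checkpoint(arg)
        status = "found" if found else "MISSING"
        print(f"{status}: {arg} -> {resolved}")
```

If the path is reported as missing from the directory where training is launched, a leading `/` or a relative path that depends on the working directory would be a plausible cause.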