
Wav2Vec 2 pretraining bug

Open · jubick1337 opened this issue 3 years ago • 4 comments

🐛 Bug

The loss drops to very low values and accuracy hits 1 after only a few hundred updates. I'm sure this is a bug and these metrics are wrong.

To Reproduce

  1. Get a considerable amount of wavs (2k hours in my case)
  2. Split the data with the wav2vec manifest script (--valid-percent set to 0.05)
  3. Start pretraining with the default wav2vec2_large_librivox config (rough commands below)
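
For reference, the reproduction boils down to something like the following. This is a sketch: the paths are placeholders, and the script location and flags are the ones from the wav2vec examples directory, so double-check them against your checkout.

    # create train/valid manifests (paths are placeholders)
    python examples/wav2vec/wav2vec_manifest.py /path/to/wavs --dest /path/to/manifest --ext wav --valid-percent 0.05
    # launch pretraining with the stock large config
    fairseq-hydra-train task.data=/path/to/manifest \
        --config-dir examples/wav2vec/config/pretraining \
        --config-name wav2vec2_large_librivox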

Logs:

[2021-07-06 06:58:09,462][train_inner][INFO] - {"epoch": 1, "update": 0.002, "loss": "6.503", "ntokens": "1237.21", "nsentences": "12.44", "prob_perplexity": "107.961", "code_perplexity": "105.203", "temp": "1.999", "loss_0": "6.383", "loss_1": "0.12", "accuracy": "0.07339", "wps": "4161.8", "ups": "3.36", "wpb": "1237.2", "bsz": "12.4", "num_updates": "200", "lr": "3.125e-05", "gnorm": "3.672", "loss_scale": "64", "train_wall": "62", "gb_free": "7.6", "wall": "72"}
[2021-07-06 06:59:09,734][train_inner][INFO] - {"epoch": 1, "update": 0.003, "loss": "5.939", "ntokens": "1199.67", "nsentences": "12.82", "prob_perplexity": "39.035", "code_perplexity": "38.185", "temp": "1.997", "loss_0": "5.804", "loss_1": "0.135", "accuracy": "0.2277", "wps": "3980.9", "ups": "3.32", "wpb": "1199.7", "bsz": "12.8", "num_updates": "400", "lr": "6.25e-05", "gnorm": "4.445", "loss_scale": "64", "train_wall": "59", "gb_free": "12.5", "wall": "132"}
[2021-07-06 06:59:27,229][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
[2021-07-06 07:00:11,138][train_inner][INFO] - {"epoch": 1, "update": 0.005, "loss": "3.243", "ntokens": "1225.73", "nsentences": "12.245", "prob_perplexity": "3.89", "code_perplexity": "3.88", "temp": "1.995", "loss_0": "3.1", "loss_1": "0.143", "accuracy": "0.74319", "wps": "3992.4", "ups": "3.26", "wpb": "1225.7", "bsz": "12.2", "num_updates": "600", "lr": "9.375e-05", "gnorm": "5.04", "loss_scale": "32", "train_wall": "60", "gb_free": "12.9", "wall": "193"}
[2021-07-06 07:01:11,700][train_inner][INFO] - {"epoch": 1, "update": 0.006, "loss": "0.683", "ntokens": "1235.98", "nsentences": "12.32", "prob_perplexity": "2.294", "code_perplexity": "2.295", "temp": "1.993", "loss_0": "0.539", "loss_1": "0.144", "accuracy": "0.95837", "wps": "4081.8", "ups": "3.3", "wpb": "1236", "bsz": "12.3", "num_updates": "800", "lr": "0.000125", "gnorm": "1.244", "loss_scale": "32", "train_wall": "59", "gb_free": "11.1", "wall": "254"}
[2021-07-06 07:02:11,323][train_inner][INFO] - {"epoch": 1, "update": 0.008, "loss": "0.144", "ntokens": "1205.85", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.991", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4045", "ups": "3.35", "wpb": "1205.8", "bsz": "12.4", "num_updates": "1000", "lr": "0.00015625", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "11.8", "wall": "313"}
[2021-07-06 07:03:10,747][train_inner][INFO] - {"epoch": 1, "update": 0.009, "loss": "0.144", "ntokens": "1205.47", "nsentences": "12.555", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.989", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4057.2", "ups": "3.37", "wpb": "1205.5", "bsz": "12.6", "num_updates": "1200", "lr": "0.0001875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "11.3", "wall": "373"}
[2021-07-06 07:04:09,589][train_inner][INFO] - {"epoch": 1, "update": 0.011, "loss": "0.144", "ntokens": "1174.7", "nsentences": "11.985", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.987", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "3992.8", "ups": "3.4", "wpb": "1174.7", "bsz": "12", "num_updates": "1400", "lr": "0.00021875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "13.3", "wall": "432"}
[2021-07-06 07:05:10,021][train_inner][INFO] - {"epoch": 1, "update": 0.012, "loss": "0.144", "ntokens": "1230.74", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.985", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4073.2", "ups": "3.31", "wpb": "1230.7", "bsz": "12.4", "num_updates": "1600", "lr": "0.00025", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "12.3", "wall": "492"}

Expected behavior

A smoother descent of the loss?

Environment

  • fairseq Version (e.g., 1.0 or master): 0794f9a
  • PyTorch Version (e.g., 1.0): 1.9.0a0+df837d0
  • OS (e.g., Linux): Ubuntu 20.04 LTS
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): python setup.py build_ext --inplace
  • Python version: 3.8.8
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: RTX 3090
  • Any other relevant information:

Additional context

Lowering the LR works around the issue, but what if one wants to use the exact parameters from the paper?
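
In case it helps anyone hitting the same thing, the peak LR can be overridden from the command line without editing the config. This is only a sketch; the 0.0001 value is an arbitrary example, not a recommended setting, and the paths are placeholders.

    # hydra override of the optimizer LR (value is illustrative only)
    fairseq-hydra-train task.data=/path/to/manifest \
        optimization.lr='[0.0001]' \
        --config-dir examples/wav2vec/config/pretraining \
        --config-name wav2vec2_large_librivox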

jubick1337 · Jul 06 '21 07:07