Wav2Vec 2 pretraining bug
🐛 Bug
Loss drops to very low values and accuracy reaches 1 after only a few hundred updates. I'm sure this is a bug and these metrics are wrong.
To Reproduce
- Take a considerable amount of wav audio (2k hours in my case)
- Split the data into train/valid manifests (--valid-percent set to 0.05)
- Start pretraining with the default wav2vec2_large_librivox config (see the commands sketched below)
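For reference, a rough sketch of the commands used, assuming the standard wav2vec 2.0 recipe from examples/wav2vec (all paths are placeholders; single-GPU overrides such as distributed_world_size/update_freq are omitted here):

    # 1) Build train/valid manifests from a directory of wavs
    python examples/wav2vec/wav2vec_manifest.py /path/to/wavs \
        --dest /path/to/manifest --ext wav --valid-percent 0.05

    # 2) Launch pretraining with the default large config
    fairseq-hydra-train \
        task.data=/path/to/manifest \
        --config-dir examples/wav2vec/config/pretraining \
        --config-name wav2vec2_large_librivox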
Logs:
[2021-07-06 06:58:09,462][train_inner][INFO] - {"epoch": 1, "update": 0.002, "loss": "6.503", "ntokens": "1237.21", "nsentences": "12.44", "prob_perplexity": "107.961", "code_perplexity": "105.203", "temp": "1.999", "loss_0": "6.383", "loss_1": "0.12", "accuracy": "0.07339", "wps": "4161.8", "ups": "3.36", "wpb": "1237.2", "bsz": "12.4", "num_updates": "200", "lr": "3.125e-05", "gnorm": "3.672", "loss_scale": "64", "train_wall": "62", "gb_free": "7.6", "wall": "72"}
[2021-07-06 06:59:09,734][train_inner][INFO] - {"epoch": 1, "update": 0.003, "loss": "5.939", "ntokens": "1199.67", "nsentences": "12.82", "prob_perplexity": "39.035", "code_perplexity": "38.185", "temp": "1.997", "loss_0": "5.804", "loss_1": "0.135", "accuracy": "0.2277", "wps": "3980.9", "ups": "3.32", "wpb": "1199.7", "bsz": "12.8", "num_updates": "400", "lr": "6.25e-05", "gnorm": "4.445", "loss_scale": "64", "train_wall": "59", "gb_free": "12.5", "wall": "132"}
[2021-07-06 06:59:27,229][fairseq.trainer][INFO] - NOTE: gradient overflow detected, ignoring gradient, setting loss scale to: 32.0
[2021-07-06 07:00:11,138][train_inner][INFO] - {"epoch": 1, "update": 0.005, "loss": "3.243", "ntokens": "1225.73", "nsentences": "12.245", "prob_perplexity": "3.89", "code_perplexity": "3.88", "temp": "1.995", "loss_0": "3.1", "loss_1": "0.143", "accuracy": "0.74319", "wps": "3992.4", "ups": "3.26", "wpb": "1225.7", "bsz": "12.2", "num_updates": "600", "lr": "9.375e-05", "gnorm": "5.04", "loss_scale": "32", "train_wall": "60", "gb_free": "12.9", "wall": "193"}
[2021-07-06 07:01:11,700][train_inner][INFO] - {"epoch": 1, "update": 0.006, "loss": "0.683", "ntokens": "1235.98", "nsentences": "12.32", "prob_perplexity": "2.294", "code_perplexity": "2.295", "temp": "1.993", "loss_0": "0.539", "loss_1": "0.144", "accuracy": "0.95837", "wps": "4081.8", "ups": "3.3", "wpb": "1236", "bsz": "12.3", "num_updates": "800", "lr": "0.000125", "gnorm": "1.244", "loss_scale": "32", "train_wall": "59", "gb_free": "11.1", "wall": "254"}
[2021-07-06 07:02:11,323][train_inner][INFO] - {"epoch": 1, "update": 0.008, "loss": "0.144", "ntokens": "1205.85", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.991", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4045", "ups": "3.35", "wpb": "1205.8", "bsz": "12.4", "num_updates": "1000", "lr": "0.00015625", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "11.8", "wall": "313"}
[2021-07-06 07:03:10,747][train_inner][INFO] - {"epoch": 1, "update": 0.009, "loss": "0.144", "ntokens": "1205.47", "nsentences": "12.555", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.989", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4057.2", "ups": "3.37", "wpb": "1205.5", "bsz": "12.6", "num_updates": "1200", "lr": "0.0001875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "11.3", "wall": "373"}
[2021-07-06 07:04:09,589][train_inner][INFO] - {"epoch": 1, "update": 0.011, "loss": "0.144", "ntokens": "1174.7", "nsentences": "11.985", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.987", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "3992.8", "ups": "3.4", "wpb": "1174.7", "bsz": "12", "num_updates": "1400", "lr": "0.00021875", "gnorm": "0", "loss_scale": "32", "train_wall": "58", "gb_free": "13.3", "wall": "432"}
[2021-07-06 07:05:10,021][train_inner][INFO] - {"epoch": 1, "update": 0.012, "loss": "0.144", "ntokens": "1230.74", "nsentences": "12.43", "prob_perplexity": "2", "code_perplexity": "2", "temp": "1.985", "loss_0": "0", "loss_1": "0.144", "accuracy": "1", "wps": "4073.2", "ups": "3.31", "wpb": "1230.7", "bsz": "12.4", "num_updates": "1600", "lr": "0.00025", "gnorm": "0", "loss_scale": "32", "train_wall": "59", "gb_free": "12.3", "wall": "492"}
Expected behavior
A smoother descent of the loss?
Environment
- fairseq Version (e.g., 1.0 or master): 0794f9a
- PyTorch Version (e.g., 1.0): 1.9.0a0+df837d0
- OS (e.g., Linux): Ubuntu 20.04 LTS
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): python setup.py build_ext --inplace
- Python version: 3.8.8
- CUDA/cuDNN version: 11.2
- GPU models and configuration: RTX 3090
- Any other relevant information:
Additional context
Changing the LR works around this issue, but what if one wants to use the exact hyperparameters from the paper?
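The workaround I'm using is along these lines: lowering the peak LR via a hydra override (a sketch; the value 0.0001 is my own choice rather than the paper's, and I'm assuming the peak LR is exposed under optimization.lr as in the released pretraining configs):

    # Lower the peak learning rate (workaround, not the paper's hyperparameters)
    fairseq-hydra-train \
        task.data=/path/to/manifest \
        optimization.lr='[0.0001]' \
        --config-dir examples/wav2vec/config/pretraining \
        --config-name wav2vec2_large_librivox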