
Enable bias correction in AdamW when fine-tuning BERT

leezu opened this pull request (#1468) • 9 comments

This should improve fine-tuning stability. Both references below identify the missing Adam bias correction in the original BERTAdam optimizer as a major source of instability when fine-tuning BERT, especially on small datasets.

Mosbach, Marius, Maksym Andriushchenko, and Dietrich Klakow. "On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines." arXiv preprint arXiv:2006.04884 (2020).

Zhang, Tianyi, et al. "Revisiting Few-sample BERT Fine-tuning." arXiv preprint arXiv:2006.05987 (2020).
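
For context, here is a minimal NumPy sketch of a single AdamW step with the bias correction toggled; this is an illustration of the mechanism, not GluonNLP's actual optimizer code:

import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-5, beta1=0.9, beta2=0.999,
               eps=1e-6, wd=0.01, bias_correction=True):
    # Exponential moving averages of the gradient and its square.
    m[:] = beta1 * m + (1 - beta1) * g
    v[:] = beta2 * v + (1 - beta2) * g * g
    if bias_correction:
        # Undo the bias toward zero caused by zero-initializing m and v.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
    else:
        # BERTAdam-style update: the correction is skipped, which changes
        # the effective step size over the first iterations.
        m_hat, v_hat = m, v
    # Decoupled weight decay (the "W" in AdamW).
    w[:] = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)

With bias_correction=False this matches the BERTAdam variant that the papers above link to unstable fine-tuning.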

leezu avatar Jan 07 '21 23:01 leezu

Let's try to rerun the training with the batch script here: https://github.com/dmlc/gluon-nlp/tree/master/tools/batch#squad-training

Basically, we just need to run the following two commands, one for SQuAD 2.0 and one for SQuAD 1.1:

# AWS Batch training with horovod on SQuAD 2.0 + FP16
bash question_answering/run_batch_squad.sh 1 2.0 submit_squad_v2_horovod_fp16.log float16

# AWS Batch training with horovod on SQuAD 1.1 + FP16
bash question_answering/run_batch_squad.sh 1 1.1 submit_squad_v1_horovod_fp16.log float16

sxjscience avatar Jan 07 '21 23:01 sxjscience

Codecov Report

Merging #1468 (52ce2a4) into master (def0d70) will decrease coverage by 0.02%. The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #1468      +/-   ##
==========================================
- Coverage   85.86%   85.84%   -0.02%     
==========================================
  Files          52       52              
  Lines        6911     6911              
==========================================
- Hits         5934     5933       -1     
- Misses        977      978       +1     
Impacted Files                          Coverage Δ
src/gluonnlp/data/tokenizers/yttm.py    81.89% <0.00%> (-0.87%) ↓


codecov[bot] avatar Jan 07 '21 23:01 codecov[bot]

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1468/bertbiascorrection/index.html

github-actions[bot] avatar Jan 07 '21 23:01 github-actions[bot]

test_squad2_albert_base 8903644b-13e1-4aa4-b695-e7b5f2c50c7d
test_squad2_albert_large aac428ac-4e25-48e8-8f3e-2643cbb6b95e
test_squad2_albert_xlarge bb565663-8173-45aa-9489-2dd690fd24c4
test_squad2_albert_xxlarge 38d9929c-fea2-4648-bc68-0bd4eb491ee8
test_squad2_electra_base 0eb9090a-d86b-40a6-9f1c-61e1cf034b59
test_squad2_electra_large 43fabf48-b524-499f-9d8a-2113349dcf74
test_squad2_electra_small 5c631945-ad26-4c2f-a7d3-bb8c705023a2
test_squad2_roberta_large 96d1e46f-b292-4915-a867-c724bb082585
test_squad2_uncased_bert_base 8228dd4c-27d3-4118-b682-06332db980f2
test_squad2_uncased_bert_large 22a91f7c-707e-4adf-a3d9-71286a3e165e
test_squad2_gluon_en_cased_bert_base_v1 13d38ddd-4ab6-4e60-8cae-1400d3169d4c
test_squad2_mobilebert 5377ebdc-da03-4e4e-8546-43e83643d1c0
test_squad2_albert_base c71abbd1-9ddb-465a-83a8-a257994a47a4
test_squad2_albert_large 55a10c2f-b51e-4722-b8fe-d0154ccf1124
test_squad2_albert_xlarge d3b1e954-b22e-4b30-bc3a-db3303d8de85
test_squad2_albert_xxlarge 9d8c599c-ecf2-4815-ac3c-cc853c75cddd
test_squad2_electra_base 9c10fca5-0ac6-4ec8-91ce-ebf2e0593513
test_squad2_electra_large d844645c-d56b-4549-805e-a3558d777e75
test_squad2_electra_small 8b17bb3f-ee8e-4212-92d7-59155f0c54ef
test_squad2_roberta_large e9972888-ae53-41e0-9b8f-1db8359e68c9
test_squad2_uncased_bert_base 083c431c-6e02-4a67-ab92-1e84a450df52
test_squad2_uncased_bert_large 24d40d9e-06fd-4158-90a3-1ee5da7183c1
test_squad2_gluon_en_cased_bert_base_v1 6b2c015b-5829-40b6-9435-718d3ecf46de
test_squad2_mobilebert 08e7618c-7e19-4db2-9451-09f65729272e

leezu avatar Jan 08 '21 00:01 leezu

Yes, you can later use the following script to sync up the results.

bash question_answering/sync_batch_result.sh submit_squad_v2_horovod_fp16.log squad_v2_horovod_fp16
bash question_answering/sync_batch_result.sh submit_squad_v1_horovod_fp16.log squad_v1_horovod_fp16

After all the results (or a subset of them) have finished, you can parse the logs via

python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp16
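
Roughly, the parser walks the per-model result directories, extracts the best F1/EM metrics from each run, and writes one CSV row per model. The sketch below is a hypothetical re-implementation for illustration only; the directory layout, file names, and fields here are assumptions, not the actual parse_squad_results.py:

import argparse
import json
import os

import pandas as pd

# Hypothetical re-implementation for illustration only; the real
# parse_squad_results.py in gluon-nlp may use a different layout and fields.
def collect(result_dir):
    rows = []
    for model in sorted(os.listdir(result_dir)):
        metrics_path = os.path.join(result_dir, model, 'best_results.json')
        if not os.path.exists(metrics_path):
            # Failed runs (e.g. a crashed job) end up as NaN rows.
            rows.append({'name': model})
            continue
        with open(metrics_path) as f:
            metrics = json.load(f)
        rows.append({'name': model,
                     'best_f1': metrics.get('best_f1'),
                     'best_em': metrics.get('best_em'),
                     'time_spent_in_hours': metrics.get('time_spent_in_hours')})
    return pd.DataFrame(rows)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--dir', required=True)
    args = parser.parse_args()
    df = collect(args.dir)
    print(df)
    out = os.path.basename(os.path.normpath(args.dir)) + '.csv'
    print('Saving to', out)
    df.to_csv(out, index=False)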

sxjscience avatar Jan 08 '21 00:01 sxjscience

% python3 question_answering/parse_squad_results.py --dir squad_v2_horovod_fp16
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  81.861255  79.112272       -1.671970       -1.742718             1.139900
1                  albert_large  84.904438  81.900109       -1.086745       -1.086745             3.423180
2                 albert_xlarge  88.032327  85.134338       -1.625434       -1.625434             5.967083
3                albert_xxlarge  90.085053  87.155731       -2.226489       -2.226489            11.294118
4                  electra_base  86.282903  83.643561       -1.848169       -2.301743             1.250153
5                 electra_large  90.871907  88.461215       -1.347744       -1.347744             3.140608
6                 electra_small  73.878219  71.481513       -1.548537       -1.548537             0.383728
7   gluon_en_cased_bert_base_v1  77.620289  74.757854       -1.731051       -1.731051             1.595762
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  89.239196  86.431399       -2.168329       -2.168329             4.119268
10            uncased_bert_base  75.539014  72.702771       -1.595349       -1.850638             1.540320
11           uncased_bert_large  81.322878  78.177377       -2.056313       -2.056739             4.103469
Saving to squad_v2_horovod_fp16.csv

% python3 question_answering/parse_squad_results.py --dir squad_v1_horovod_fp16
                           name    best_f1    best_em  best_f1_thresh  best_em_thresh  time_spent_in_hours
0                   albert_base  90.605130  83.964049             NaN             NaN             0.745851
1                  albert_large  92.574139  86.385998             NaN             NaN             2.319241
2                 albert_xlarge  93.836504  87.984863             NaN             NaN             4.367765
3                albert_xxlarge  94.569074  88.448439             NaN             NaN             7.321531
4                  electra_base  92.483534  86.821192             NaN             NaN             0.882092
5                 electra_large  94.824761  89.631031             NaN             NaN             2.216832
6                 electra_small  85.263124  78.893094             NaN             NaN             0.267190
7   gluon_en_cased_bert_base_v1  88.685434  81.986755             NaN             NaN             1.077892
8                    mobilebert        NaN        NaN             NaN             NaN                  NaN
9                 roberta_large  94.665818  89.101230             NaN             NaN             2.790591
10            uncased_bert_base  88.103126  81.201514             NaN             NaN             0.979201
11           uncased_bert_large  90.691656  83.945128             NaN             NaN             2.756076
Saving to squad_v1_horovod_fp16.csv

Is there any known issue with MobileBERT?

leezu avatar Jan 08 '21 14:01 leezu

Looks like an AMP issue, or an operator issue that causes AMP to keep decreasing the loss scale: finetune_squad2.0.log
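
For background, dynamic loss scaling halves the scale whenever non-finite gradients are detected and skips that update, so a scale that keeps shrinking usually means some operator produces Inf/NaN on every iteration. A generic sketch of the mechanism, not MXNet AMP's actual implementation:

import numpy as np

class DynamicLossScaler:
    # Generic dynamic loss scaling as used by mixed-precision trainers;
    # illustrative only.
    def __init__(self, init_scale=2.0 ** 15, factor=2.0, growth_interval=2000):
        self.scale = init_scale
        self.factor = factor
        self.growth_interval = growth_interval
        self._steps_since_overflow = 0

    def update(self, grads):
        # Any Inf/NaN gradient means the scaled loss overflowed in float16.
        if any(not np.isfinite(g).all() for g in grads):
            self.scale /= self.factor           # shrink the scale...
            self._steps_since_overflow = 0
            return False                        # ...and skip this update
        self._steps_since_overflow += 1
        if self._steps_since_overflow % self.growth_interval == 0:
            self.scale *= self.factor           # periodically probe a larger scale
        return True

If an operator emits non-finite values no matter how small the scale gets, update() fails on every step and the scale decays toward zero, which matches the attached log.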

leezu avatar Jan 08 '21 14:01 leezu

Yes.


sxjscience avatar Jan 08 '21 15:01 sxjscience

From the figure, I think the performance looks similar. If we choose to update the flags, we can upload the pretrained weights to S3 and also change the numbers in https://github.com/dmlc/gluon-nlp/tree/master/scripts/question_answering.

sxjscience avatar Jan 08 '21 16:01 sxjscience