gluon-nlp icon indicating copy to clipboard operation
gluon-nlp copied to clipboard

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer

Open sxjscience opened this issue 4 years ago • 4 comments

Description

Fix the additional of the residual connection. The previous implementation was not correct. I'm rerunning the Transformer-Big-pre-ln experiment.

@yongyi-wu

Checklist

Essentials

  • [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • [ ] Changes are complete (i.e. I finished coding on this PR)
  • [ ] All changes have test coverage
  • [ ] Code is well-documented

Changes

  • [ ] Feature1, tests, (and when applicable, API doc)
  • [ ] Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

sxjscience avatar Jan 18 '21 01:01 sxjscience

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%     
==========================================
  Files          52       52              
  Lines        6909     6822      -87     
==========================================
+ Hits         5664     5812     +148     
+ Misses       1245     1010     -235     
Impacted Files Coverage Δ
src/gluonnlp/data/batchify.py 88.72% <ø> (ø)
src/gluonnlp/layers.py 87.15% <100.00%> (+0.03%) :arrow_up:
src/gluonnlp/models/transformer.py 98.93% <100.00%> (-0.01%) :arrow_down:
conftest.py 76.31% <0.00%> (-8.69%) :arrow_down:
src/gluonnlp/data/loading.py 75.75% <0.00%> (-7.64%) :arrow_down:
src/gluonnlp/utils/lazy_imports.py 58.42% <0.00%> (-2.25%) :arrow_down:
src/gluonnlp/utils/misc.py 52.51% <0.00%> (-1.06%) :arrow_down:
src/gluonnlp/data/tokenizers/yttm.py 81.73% <0.00%> (-1.02%) :arrow_down:
src/gluonnlp/data/tokenizers/spacy.py 65.33% <0.00%> (-0.91%) :arrow_down:
src/gluonnlp/data/tokenizers/huggingface.py 71.06% <0.00%> (-0.78%) :arrow_down:
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c582b64...3c7c4c1. Read the comment docs.

codecov[bot] avatar Jan 18 '21 01:01 codecov[bot]

Looks good— it seems all issues related to pre-norm and skip connection have been fixed.

yongyi-wu avatar Jan 19 '21 15:01 yongyi-wu

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1488/fix_pre_ln/index.html

github-actions[bot] avatar Jan 19 '21 17:01 github-actions[bot]

I noticed that the performance becomes worse after I changed the implementation. Still investigating the issue.

sxjscience avatar Jan 20 '21 10:01 sxjscience