gluon-nlp
gluon-nlp copied to clipboard
[Bug][Fix][WIP] Fix pre-layernormalization in Transformer
Description
Fix the additional of the residual connection. The previous implementation was not correct. I'm rerunning the Transformer-Big-pre-ln experiment.
@yongyi-wu
Checklist
Essentials
- [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] Code is well-documented
Changes
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
Comments
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
cc @dmlc/gluon-nlp-team
Codecov Report
Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by
3.21%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #1488 +/- ##
==========================================
+ Coverage 81.98% 85.19% +3.21%
==========================================
Files 52 52
Lines 6909 6822 -87
==========================================
+ Hits 5664 5812 +148
+ Misses 1245 1010 -235
Impacted Files | Coverage Δ | |
---|---|---|
src/gluonnlp/data/batchify.py | 88.72% <ø> (ø) |
|
src/gluonnlp/layers.py | 87.15% <100.00%> (+0.03%) |
:arrow_up: |
src/gluonnlp/models/transformer.py | 98.93% <100.00%> (-0.01%) |
:arrow_down: |
conftest.py | 76.31% <0.00%> (-8.69%) |
:arrow_down: |
src/gluonnlp/data/loading.py | 75.75% <0.00%> (-7.64%) |
:arrow_down: |
src/gluonnlp/utils/lazy_imports.py | 58.42% <0.00%> (-2.25%) |
:arrow_down: |
src/gluonnlp/utils/misc.py | 52.51% <0.00%> (-1.06%) |
:arrow_down: |
src/gluonnlp/data/tokenizers/yttm.py | 81.73% <0.00%> (-1.02%) |
:arrow_down: |
src/gluonnlp/data/tokenizers/spacy.py | 65.33% <0.00%> (-0.91%) |
:arrow_down: |
src/gluonnlp/data/tokenizers/huggingface.py | 71.06% <0.00%> (-0.78%) |
:arrow_down: |
... and 22 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update c582b64...3c7c4c1. Read the comment docs.
Looks good— it seems all issues related to pre-norm and skip connection have been fixed.
The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1488/fix_pre_ln/index.html
I noticed that the performance becomes worse after I changed the implementation. Still investigating the issue.