Description

Fix the additional of the residual connection. The previous implementation was not correct. I'm rerunning the Transformer-Big-pre-ln experiment.

@yongyi-wu

Checklist

Essentials

[ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
[ ] Changes are complete (i.e. I finished coding on this PR)
[ ] All changes have test coverage
[ ] Code is well-documented

Changes

[ ] Feature1, tests, (and when applicable, API doc)
[ ] Feature2, tests, (and when applicable, API doc)

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

Jan 18 '21 01:01 sxjscience

Codecov Report

Merging #1488 (3c7c4c1) into master (c582b64) will increase coverage by 3.21%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   81.98%   85.19%   +3.21%     
==========================================
  Files          52       52              
  Lines        6909     6822      -87     
==========================================
+ Hits         5664     5812     +148     
+ Misses       1245     1010     -235

Impacted Files	Coverage Δ
src/gluonnlp/data/batchify.py	`88.72% <ø> (ø)`
src/gluonnlp/layers.py	`87.15% <100.00%> (+0.03%)`	:arrow_up:
src/gluonnlp/models/transformer.py	`98.93% <100.00%> (-0.01%)`	:arrow_down:
conftest.py	`76.31% <0.00%> (-8.69%)`	:arrow_down:
src/gluonnlp/data/loading.py	`75.75% <0.00%> (-7.64%)`	:arrow_down:
src/gluonnlp/utils/lazy_imports.py	`58.42% <0.00%> (-2.25%)`	:arrow_down:
src/gluonnlp/utils/misc.py	`52.51% <0.00%> (-1.06%)`	:arrow_down:
src/gluonnlp/data/tokenizers/yttm.py	`81.73% <0.00%> (-1.02%)`	:arrow_down:
src/gluonnlp/data/tokenizers/spacy.py	`65.33% <0.00%> (-0.91%)`	:arrow_down:
src/gluonnlp/data/tokenizers/huggingface.py	`71.06% <0.00%> (-0.78%)`	:arrow_down:
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update c582b64...3c7c4c1. Read the comment docs.

Jan 18 '21 01:01 codecov[bot]

Looks good— it seems all issues related to pre-norm and skip connection have been fixed.

Jan 19 '21 15:01 yongyi-wu

The documentation website for preview: http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR1488/fix_pre_ln/index.html

Jan 19 '21 17:01 github-actions[bot]

I noticed that the performance becomes worse after I changed the implementation. Still investigating the issue.

Jan 20 '21 10:01 sxjscience

gluon-nlp
gluon-nlp copied to clipboard

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer

Description

Checklist

Essentials

Changes

Comments

Codecov Report

gluon-nlp gluon-nlp copied to clipboard

[Bug][Fix][WIP] Fix pre-layernormalization in Transformer

Description

Checklist

Essentials

Changes

Comments

Codecov Report

gluon-nlp
gluon-nlp copied to clipboard