Description

Add whole word mask for Chinese (for BertTokenizer only).

Checklist

Essentials

[x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
[x] Changes are complete (i.e. I finished coding on this PR)
[ ] All changes have test coverage
[x] Code is well-documented

Changes

Comments

Jun 27 '19 03:06 paperplanet

Codecov Report

:exclamation: No coverage uploaded for pull request head (feature/chs-whole-word-mask@3e7468d). The diff coverage is n/a.

Jun 27 '19 03:06 codecov[bot]

Codecov Report

Merging #798 into master will increase coverage by 0.28%. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #798      +/-   ##
==========================================
+ Coverage   88.23%   88.52%   +0.28%     
==========================================
  Files          73       73              
  Lines        6980     6980              
==========================================
+ Hits         6159     6179      +20     
+ Misses        821      801      -20

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/gluonnlp/data/word_embedding_evaluation.py | `89.31% <0.00%> (-7.64%)` | :arrow_down: |
| src/gluonnlp/data/glue.py | `96.81% <0.00%> (-1.82%)` | :arrow_down: |
| src/gluonnlp/model/attention_cell.py | `91.06% <0.00%> (+0.55%)` | :arrow_up: |
| src/gluonnlp/model/bert.py | `94.62% <0.00%> (+2.98%)` | :arrow_up: |
| src/gluonnlp/model/transformer.py | `91.66% <0.00%> (+4.80%)` | :arrow_up: |
| src/gluonnlp/model/utils.py | `80.00% <0.00%> (+6.92%)` | :arrow_up: |
| src/gluonnlp/model/seq2seq_encoder_decoder.py | `80.00% <0.00%> (+30.00%)` | :arrow_up: |

Jun 27 '19 03:06 codecov[bot]

Job PR-798/1 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/1/index.html

Jun 27 '19 04:06 mli

Job PR-798/3 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/3/index.html

Jun 28 '19 02:06 mli

Job PR-798/4 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/4/index.html

Jul 01 '19 16:07 mli

Job PR-798/5 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/5/index.html

Jul 01 '19 17:07 mli

Job PR-798/6 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/6/index.html

Jul 02 '19 20:07 mli

Job PR-798/8 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/8/index.html

Jul 07 '19 19:07 mli

Job PR-798/11 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/11/index.html

Aug 15 '19 08:08 mli

Job PR-798/12 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/12/index.html

Aug 19 '19 11:08 mli

@paperplanet could you resolve the conflicts? @eric-haibin-lin any further comments?

Jan 15 '20 15:01 leezu

Job PR-798/13 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/13/index.html

Feb 20 '20 14:02 mli

Job PR-798/14 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/14/index.html

Feb 20 '20 21:02 mli

Job PR-798/15 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/15/index.html

Feb 20 '20 21:02 mli

Job PR-798/16 is complete. Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-798/16/index.html

Feb 21 '20 04:02 mli

Sorry for the late reply. I think I have resolved the conflicts. One procedural change needs to be reviewed: Chinese tokenization has to be done before the str tokens are converted to token ids. Also, cn_whole_word_mask is designed not to be turned on at the same time as whole_word_mask.
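
For reviewers, here is a minimal sketch of the idea, not the code in this PR. It assumes the sentence has already been segmented into words by some external Chinese segmenter, and that a BERT-style tokenizer emits one token per Chinese character. The function names `group_chars_by_word` and `mask_whole_words`, and the `mask_prob` and `seed` parameters, are hypothetical; only the flag names `whole_word_mask` and `cn_whole_word_mask` come from the comment above. Masking is applied to string tokens, so conversion to token ids would happen afterwards.

```python
import random


def group_chars_by_word(words):
    """Map each segmented word to the positions of its characters.

    Assumption: one token per Chinese character, so character
    positions double as token positions.
    """
    groups, pos = [], 0
    for word in words:
        groups.append(list(range(pos, pos + len(word))))
        pos += len(word)
    return groups


def mask_whole_words(words, mask_prob=0.15, mask_token='[MASK]', seed=None,
                     whole_word_mask=False, cn_whole_word_mask=True):
    """Mask every character of each selected word (hypothetical helper).

    Works on string tokens; mapping tokens to ids is left to a later step.
    """
    # The two modes are treated as mutually exclusive, as described above.
    if whole_word_mask and cn_whole_word_mask:
        raise ValueError(
            'whole_word_mask and cn_whole_word_mask cannot both be enabled')

    rng = random.Random(seed)
    tokens = [ch for word in words for ch in word]
    if not cn_whole_word_mask:
        return tokens, []

    masked_positions = []
    for group in group_chars_by_word(words):
        # Decide once per word, then mask all of its characters together.
        if rng.random() < mask_prob:
            for i in group:
                tokens[i] = mask_token
            masked_positions.extend(group)
    return tokens, masked_positions


if __name__ == '__main__':
    # Pre-segmented input; the segmenter itself is outside this sketch.
    words = ['使用', '语言', '模型']
    print(mask_whole_words(words, mask_prob=0.5, seed=0))
```

The per-word random draw is the point of the sketch: a word is either fully masked or left intact, so the model never sees half of a multi-character word.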

Feb 21 '20 07:02 paperplanet


[Enhancement] add whole word mask for chinese
