PyTorch-NLP

Add BPE encoder

Open Columbine21 opened this issue 5 years ago • 2 comments

Add the byte-pair encoding (#7).
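For context, byte-pair encoding builds a subword vocabulary by repeatedly merging the most frequent adjacent symbol pair in a corpus. The following is only a minimal sketch of that merge-learning step, not the code in this PR; the names `learn_bpe` and `merge_pair` and the `</w>` end-of-word marker are assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge to every word in the vocabulary.
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            merged_vocab[merge_pair(symbols, best)] += freq
        vocab = merged_vocab
    return merges


if __name__ == "__main__":
    words = ["low", "low", "lower", "newest", "newest", "newest"]
    print(learn_bpe(words, 3))
```

Ties between equally frequent pairs are broken by first occurrence here; real implementations may choose differently.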

Columbine21, May 14 '20 02:05

Codecov Report

Merging #100 into master will decrease coverage by 0.10%. The diff coverage is 92.55%.

@@            Coverage Diff             @@
##           master     #100      +/-   ##
==========================================
- Coverage   94.41%   94.31%   -0.11%     
==========================================
  Files          64       66       +2     
  Lines        1611     1705      +94     
==========================================
+ Hits         1521     1608      +87     
- Misses         90       97       +7     
Impacted Files                                  Coverage Δ
torchnlp/encoders/text/bpe_text_tokenizer.py    90.16% <90.16%> (ø)
torchnlp/encoders/text/bytepair_encoder.py      96.87% <96.87%> (ø)
torchnlp/encoders/text/__init__.py              100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update cde86ba...63460d0.

codecov-commenter, Jul 01 '20 05:07

Hey! Thank you for your contribution.

Do you have an opinion on subword_nmt vs. tokenizers by HuggingFace?

PetrochukM, Jul 04 '20 01:07