
[Tokenizer]Convert fast_tokenizer to hf-tokenizers

Open Southpika opened this issue 1 year ago • 3 comments

Migrate to hf-tokenizers as the fast tokenizer.

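In addition to the tokenizer-related additions, this PR fixes a few minor issues:

  1. The name_or_path attribute was not assigned correctly.
  2. The from_slow parameter was not being used.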
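
For readers unfamiliar with the from_slow flag, here is a minimal sketch of the kind of selection logic such a parameter usually controls. The names TokenizerSource and choose_backend are hypothetical and are not PaddleNLP APIs; the actual fix in this PR may look different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenizerSource:
    """Hypothetical description of the tokenizer files available for a model."""

    fast_file: Optional[str]   # serialized fast tokenizer, if present
    slow_vocab: Optional[str]  # vocabulary of the slow tokenizer, if present


def choose_backend(source: TokenizerSource, from_slow: bool = False) -> str:
    """Return which loading path would be taken: 'fast_file' or 'convert_slow'."""
    # Use the serialized fast tokenizer only when the caller has not
    # asked for a rebuild from the slow tokenizer.
    if source.fast_file is not None and not from_slow:
        return "fast_file"
    # Otherwise fall back to converting the slow tokenizer.
    if source.slow_vocab is not None:
        return "convert_slow"
    raise ValueError("no fast tokenizer file and no slow tokenizer to convert")
```

If the flag is ignored, the first branch is always taken whenever a fast file exists, which is the kind of behavior item 2 describes.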

Note: the slow and fast versions of the ERNIE-M tokenizer currently produce different results; which version to use is still to be decided.
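
One way to pin down such a diff is to run both tokenizers over the same inputs and collect the cases where they disagree. The sketch below uses toy stand-in tokenizers (toy_slow and toy_fast are hypothetical, not the ERNIE-M implementations); in practice the two callables would wrap the real slow and fast tokenizers.

```python
from typing import Callable, Iterable, List, Tuple

Tokenize = Callable[[str], List[str]]


def find_mismatches(
    slow: Tokenize, fast: Tokenize, texts: Iterable[str]
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (text, slow_tokens, fast_tokens) for every input where outputs differ."""
    mismatches = []
    for text in texts:
        slow_tokens, fast_tokens = slow(text), fast(text)
        if slow_tokens != fast_tokens:
            mismatches.append((text, slow_tokens, fast_tokens))
    return mismatches


# Toy stand-ins (assumptions for illustration only): the "fast" version
# lowercases its input, the "slow" version does not.
def toy_slow(text: str) -> List[str]:
    return text.split()


def toy_fast(text: str) -> List[str]:
    return text.lower().split()


if __name__ == "__main__":
    for text, s, f in find_mismatches(toy_slow, toy_fast, ["hello world", "Hello World"]):
        print(f"{text!r}: slow={s} fast={f}")
```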

Southpika avatar Feb 06 '24 06:02 Southpika

Thanks for your contribution!

paddle-bot[bot] avatar Feb 06 '24 06:02 paddle-bot[bot]

Codecov Report

Attention: Patch coverage is 82.29167% with 119 lines in your changes missing coverage. Please review.

Project coverage is 55.40%. Comparing base (bc91dc6) to head (eeb1c5c). Report is 170 commits behind head on develop.

:exclamation: Current head eeb1c5c differs from pull request most recent head 66575bb

Please upload reports for the commit 66575bb to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| paddlenlp/transformers/convert_slow_tokenizer.py | 82.33% | 50 Missing :warning: |
| paddlenlp/transformers/gemma/fast_tokenizer.py | 76.71% | 17 Missing :warning: |
| paddlenlp/transformers/roberta/fast_tokenizer.py | 73.21% | 15 Missing :warning: |
| paddlenlp/transformers/tokenizer_utils_base.py | 59.37% | 13 Missing :warning: |
| paddlenlp/transformers/llama/fast_tokenizer.py | 76.00% | 12 Missing :warning: |
| paddlenlp/transformers/albert/fast_tokenizer.py | 73.52% | 9 Missing :warning: |
| paddlenlp/transformers/auto/tokenizer.py | 75.00% | 1 Missing :warning: |
| ...addlenlp/transformers/chatglm_v2/fast_tokenizer.py | 96.00% | 1 Missing :warning: |
| paddlenlp/transformers/tokenizer_utils_fast.py | 90.90% | 1 Missing :warning: |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7974      +/-   ##
===========================================
+ Coverage    55.25%   55.40%   +0.14%     
===========================================
  Files          613      621       +8     
  Lines        95625    96168     +543     
===========================================
+ Hits         52837    53280     +443     
- Misses       42788    42888     +100     


codecov[bot] avatar Feb 26 '24 06:02 codecov[bot]

This Pull Request has been marked as stale because it has been open for 60 days with no activity.

github-actions[bot] avatar Jun 29 '24 00:06 github-actions[bot]