tokenizers

Fix decode

Open ArthurZucker opened this issue 8 months ago • 1 comment

This reverts the previous breaking change.

Also adds a new ByteLevel normalizer, which replaces the ByteLevel pre_tokenizer. Checked that we can add Chinese / Cyrillic tokens, which are properly encoded and decoded.
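
For readers unfamiliar with byte-level mapping, here is a minimal standalone sketch of the GPT-2 style byte-to-character table that byte-level components are generally built on. It is an illustration only, not the code in this PR: the helper names `bytes_to_unicode`, `byte_level_encode`, and `byte_level_decode` are hypothetical and do not call the `tokenizers` library. It shows why Chinese or Cyrillic text should round-trip once it is mapped byte by byte.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build a reversible map from each of the 256 byte values to a printable character."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Remaining bytes (controls, space, ...) are shifted above U+00FF.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def byte_level_encode(text: str) -> str:
    """Map every UTF-8 byte of `text` to its byte-level character."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def byte_level_decode(mapped: str) -> str:
    """Invert the mapping and decode the recovered bytes as UTF-8."""
    return bytes(CHAR_TO_BYTE[c] for c in mapped).decode("utf-8")


if __name__ == "__main__":
    for sample in ["你好", "привет"]:
        mapped = byte_level_encode(sample)
        # The mapped form has one character per UTF-8 byte of the input.
        print(sample, "->", mapped, "->", byte_level_decode(mapped))
```

Because the table is a bijection over all 256 byte values, any UTF-8 string maps to a sequence of byte-level characters and back without loss. An added token can only decode correctly if it passes through the same mapping on the encode side.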

Fixes #1392

ArthurZucker, Jun 18 '24 06:06