Nicolas Patry comments

Results: 977 comments of Nicolas Patry

Tokenizers for Node 16?

Hi, this will require an update of `neon` which is the library we use for node bindings. Unfortunately, `neon` introduces a lot of breaking changes (for the better it seems)...

Tokenizers for Node 16?

When we redo the bindings, we can also think about manylinux support: https://github.com/huggingface/tokenizers/issues/972

AttributeError: 'BertTokenizer' object has no attribute 'tokens_trie'

Could you share some code that reproduces the issue, starting from an existing tokenizer? Currently it's hard to understand what's going on.

Attempt to make unigram faster 2.

Nice! Still a few lints :)

Introducing special tokens via `tokenizers.normalizers.Replace`

> as well as splitting into words". I think it would be better if all it did was replace bytes and left splitting to another pre_tokenizer step. We're in luck,...

Extend tokenizer vocabulary with new words

> I don't think .add_tokens() is implemented. https://github.com/huggingface/transformers/blob/cad61b68396a1a387287a8e2e2fef78a25b79383/src/transformers/tokenization_utils_base.py#L952

You are pointing to a base class, so yes, it's not implemented there. The real class is here: https://github.com/huggingface/transformers/blob/cad61b68396a1a387287a8e2e2fef78a25b79383/src/transformers/tokenization_utils_fast.py#L264
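
To illustrate why a method can look unimplemented when you read only the base class, here is a minimal, self-contained sketch of the pattern. The class names `BaseTokenizer` and `VocabTokenizer` are hypothetical and are not the actual `transformers` classes; the sketch only assumes the general design in which a public `add_tokens` delegates to a hook that concrete subclasses override.

```python
class BaseTokenizer:
    """Hypothetical base class: defines the public API, leaves the work to subclasses."""

    def add_tokens(self, new_tokens):
        # Public entry point shared by every subclass.
        if isinstance(new_tokens, str):
            new_tokens = [new_tokens]
        return self._add_tokens(list(new_tokens))

    def _add_tokens(self, new_tokens):
        # Reading only this method makes the feature look missing.
        raise NotImplementedError


class VocabTokenizer(BaseTokenizer):
    """Hypothetical concrete class: this is where the hook is really implemented."""

    def __init__(self, vocab):
        self.vocab = dict(vocab)

    def _add_tokens(self, new_tokens):
        # Append unseen tokens at the end of the vocabulary and count them.
        added = 0
        for token in new_tokens:
            if token not in self.vocab:
                self.vocab[token] = len(self.vocab)
                added += 1
        return added


tokenizer = VocabTokenizer({"hello": 0, "world": 1})
num_added = tokenizer.add_tokens(["world", "tokenizers"])
```

Calling `add_tokens` on the concrete class works because the hook is overridden there; calling it on the base class alone raises `NotImplementedError`, which is the behavior the linked base-class line suggests.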

feat(ci): add macos arm64 runner
