Nicolas Patry

Results 51 issues of Nicolas Patry

# What does this PR do? When doing model loading with the various tools within `transformers` there's actually a lot of duplicate HEAD calls that cost network time and duplicate...

Wasm support would be a cool issue.
- [ ] Add a feature flag `wasm`.
- [ ] Use `esaxx_rs::suffix_rs` instead of `esaxx_rs::suffix` (Maybe with a change in the crate...

At this point we're waiting for GH Actions to support M1 runners. Any help doing it differently is appreciated, but for now we will probably resort to manual builds...

The hack was added here: https://github.com/huggingface/tokenizers/pull/896/files#diff-43edb84c28a212d7240c7e278964efe107d223b755acb42e3ab45ac1db2bd26aR377 to conform with the behavior of `SPM`. Ideally we should find a way to move it out of `BPE` since this is very...

This fixes this while keeping backward compatibility. We don't want to merge that blindly.

Very hacky. Do not merge.

Instead of deleting and re-inserting everything, it replaces each char in place first, then appends whatever is left. Linked to #892
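The replace-then-append idea can be sketched in a few lines. This is only an illustration, not the code from the PR: the function name `replace_then_append` and the plain Python list used as the buffer are hypothetical, and the handling of a shorter replacement (dropping the unused tail) is an assumption the snippet does not spell out.

```python
def replace_then_append(buf: list, new: str) -> list:
    """Overwrite `buf` in place so that it holds the characters of `new`.

    Hypothetical helper for illustration only.
    """
    shared = min(len(buf), len(new))

    # Step 1: replace the existing slots one by one (no delete, no insert).
    for i in range(shared):
        buf[i] = new[i]

    if len(new) > shared:
        # Step 2: append whatever is left of the new content.
        buf.extend(new[shared:])
    else:
        # Assumption: when the new content is shorter, drop the unused tail.
        del buf[len(new):]

    return buf
```

Compared with clearing the buffer and inserting every character again, this touches each existing slot once and only grows the buffer when the new content is longer.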

Seems a little unsatisfying, need to dig a little into this to check. Probably linked to more allocs (since the need for a prefix to differentiate continuing subword_prefix vs not...

Currently, there are still quite a few issues where information is set on the `Tokenizer` but the `trainer` does not pick it up, leading to errors and confusion: https://github.com/huggingface/tokenizers/issues/876


Inspiration here: https://github.com/huggingface/tokenizers/pull/921. The goal is simply to re-parallelize the loops.