Nicolas Patry

Results 978 comments of Nicolas Patry

Feel free to just not use those features. With 0.9.0 ```python import tokenizers tokenizer = tokenizers.Tokenizer(tokenizers.models.BPE()) trainer = tokenizers.trainers.BpeTrainer(vocab_size=8000) tokenizer.train(trainer, ['wikitext-103-raw/wiki.test.raw']) [00:00:00] Reading files (1 Mo) █████████████████████████████████████████████████████████████████████ 100 [00:00:00] Tokenize...

> I think I'd like to keep this issue open until there is a simplified version of the library available. The base classes include all sorts of functions that are...

Do you mind sharing which model you're talking about ? Currently we cannot reproduce anything so it's really hard to understand what's the problem.

It's not part of 1.0.

@chandrasekharpatra Please feel free to take a stab at it. We didn't actually do it for 1.0 because it actually didn't seem that important for finding bugs/regressions. Writing the test...

Don't hesitate to open a small PR first in order to get feedback. Smaller PRs are better and more easily reviewed therefore more easily approved.

We're already adding multi modal models, without requiring users to send `hidden_states` directly: https://github.com/huggingface/text-generation-inference/pull/842 The API may change at any time, this is work in progress.

> experimenting with different encoders and projections. This is not the purpose of TGI. We try to maintain production workloads (we actively maintain our own with it). We might add...

Nice !!! Tell me when ready for review.

Chat template is used through an openai compatibility layer, meaning the payloads do not look so simple. Does openai provide any means to do tokenization, if yes we can try...