Nicolas Patry
You're running a Python version that's too old (or too new). That's the only reason you'd need to build from source; everything else should be prebuilt.
Post-processors AND decoders are now sequential, so it would definitely be doable right now! I'll try to tackle it in the not-too-distant future.
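To illustrate what "sequential" means here, below is a minimal Python sketch of chaining decoder steps in order. This is not the actual `tokenizers` API, just the idea: each step receives the token list produced by the previous step.

```python
# Hedged sketch (not the tokenizers API): decoder steps applied in sequence.
def replace_step(tokens):
    # Replace the SentencePiece "▁" word-boundary marker with a space.
    return [t.replace("▁", " ") for t in tokens]

def fuse_step(tokens):
    # Fuse all tokens into a single string.
    return ["".join(tokens)]

def decode(tokens, steps):
    # Run each decoding step in order over the current token list.
    for step in steps:
        tokens = step(tokens)
    return tokens[0]

print(decode(["▁Hello", "▁world"], [replace_step, fuse_step]))
# → " Hello world"
```

Because each step is independent, composing a new pipeline is just a matter of reordering or appending steps to the list.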
https://github.com/huggingface/tokenizers/pull/1183
@sarrahbbh Please re-open an issue with the appropriate details to reproduce it
Should be good after rebase.
> > ... The logs are rather poor compared to the regular endpoints.
> >
> > ```
> > 2024-04-16T10:42:49.931556Z INFO text_generation_router::server: router/src/server.rs:500: Success
> > ```
>
> ...
Hey, you're trying to convert the model; there are other scripts for the tokenizer. I haven't finished it yet (it just requires more testing). For dependencies you can use no-default -...
The tokenizer is ready here: https://huggingface.co/hf-internal-testing/tiny-random-llama/tree/main. But it does require `tokenizers@main`, which is not released yet. Will try to do a release next week (there are still a few needed updates...
`tokenizers==0.13.3` is released and can be used. The tokenizer is here: https://huggingface.co/hf-internal-testing/llama-tokenizer (tokenizer.json).

```rust
let tokenizer = Tokenizer::from_file("tokenizer.json").unwrap();
let encoded = tokenizer.encode("This is a test");
// None is the optional...
```
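For context, `tokenizer.json` is a single self-contained JSON file describing the whole pipeline. Here's a hedged Python sketch that inspects a minimal in-memory example of that structure (field names match the serialized format as I understand it; the real file is much larger):

```python
import json

# Minimal stand-in for a tokenizer.json payload: a BPE model with a
# tiny vocab. This is an illustration, not the real llama-tokenizer file.
minimal = {
    "version": "1.0",
    "model": {"type": "BPE", "vocab": {"<s>": 0, "a": 1}, "merges": []},
}

# Round-trip through JSON, as you would when reading the file from disk.
data = json.loads(json.dumps(minimal))
print(data["model"]["type"])         # → BPE
print(len(data["model"]["vocab"]))   # → 2
```

Because everything (model, vocab, merges) lives in one file, `Tokenizer::from_file` needs no other assets.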
> We are considering a potential integration of BLOOM and RWKV in the future. Would it be possible to use this library to tokenize input for those models?

Bloom is...