Arthur
Pretty sure this was fixed
You need Rust to compile this from source!
Sorry for the late reply here, I think pyo3 is required for this TBH. We have an example of this: we use wrappers and match types to switch between `model` or...
Hey, sorry for the delay, I'll try to review it next week!
Hey! This is most probably unrelated to `tokenizers`. Here is a good explanation: https://nlp.stanford.edu/~johnhew/vocab-expansion.html
This is probably just a Python version that was not supported!
It's high on my priority list to run benchmarks and improve our code if needed!
You are using `GPT2Tokenizer`, which is the slow one. Use `GPT2TokenizerFast` 😅
We actually dug in a bit: 1. Rayon parallelism is kinda broken; 2. we have contention on the cache for GPT2; 3. we have memory allocations that are also slowing things down...
One thing though: tiktoken forces the split of very long sequences. If you split them into batches you are already going to get quite a bit better performance.
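A minimal sketch of the idea (the `chunk_text` helper and the chunk size are illustrative, not part of `tokenizers`): pre-split one very long string into pieces, then encode them as a batch so the Rust side can parallelize over chunks:

```python
# Hypothetical helper: split one very long string into fixed-size chunks
# so the tokenizer sees many short sequences instead of one huge one.
def chunk_text(text: str, max_chars: int = 1_000) -> list[str]:
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

long_text = "some token " * 100_000  # ~1.1M characters
chunks = chunk_text(long_text)
print(len(chunks))  # → 1100

# With the `tokenizers` library you would then encode the whole batch at
# once instead of one giant sequence:
#   encodings = tokenizer.encode_batch(chunks)
```

Note that a fixed character cut can split a word at a chunk boundary, so token counts near boundaries may differ slightly from encoding the full text in one go.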