Nicolas Patry
Cannot reproduce on our end. Can you reproduce with the docker image? Environment and dependencies can impact what's happening. Also, are you all running on main?
I still cannot reproduce. Can you try upgrading to the latest version, 1.4.5? Also, the error occurs in the causal LM, which is not supposed to happen; this model should be...
I'm not sure we want to be 100% `spm` compliant on the training side. @n1t0? One goal of this library is to be as modular as possible, so taking...
Many things here: overall, I think that in order to make any changes, we would need some kind of benchmark to judge the overall quality of the final tokenization on various datasets....
Try disabling flash attention; I don't think the A800 is supported by it. Setting `USE_FLASH_ATTENTION=false` in your env should do it.
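For example, something along these lines should work (a rough sketch; the image tag, ports, and model id below are placeholders to adapt to your setup):

```bash
# Sketch: launch the TGI container with flash attention disabled via the env var.
# Image tag, ports, and --model-id are placeholders.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -e USE_FLASH_ATTENTION=false \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id <your-model-id>
```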
Ah, can you maybe try any other model, to see if it's GPTQ + triton that doesn't work on the A800? (Don't have access right now to reproduce.)
> tiktoken is supposed to be much faster than tokenizers for BPE tokenizers.

Proof, please. Also, proof that the difference in speed is actually relevant in real-world use cases....
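For context, the kind of proof I'd want starts with something like the rough sketch below, run on realistic data (the corpus and tokenizer choices here are toy placeholders, not a real benchmark):

```python
# Rough sketch: compare batch-encoding wall time for the same corpus.
# The toy corpus and "gpt2" encodings are placeholders; a real benchmark needs
# realistic documents, warm-up runs, and multiple repetitions.
import time
import tiktoken
from tokenizers import Tokenizer

texts = ["The quick brown fox jumps over the lazy dog."] * 10_000

hf_tok = Tokenizer.from_pretrained("gpt2")  # tokenizers BPE
tk_enc = tiktoken.get_encoding("gpt2")      # tiktoken BPE

start = time.perf_counter()
hf_tok.encode_batch(texts)
print(f"tokenizers: {time.perf_counter() - start:.3f}s")

start = time.perf_counter()
tk_enc.encode_batch(texts)
print(f"tiktoken:   {time.perf_counter() - start:.3f}s")
```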
> performance

What kind? PPL? Yes, but usually it's acceptable. Latency? No, it doesn't in our prod. It actually helps quite a lot because there's a lot more...
Will get superseded by: https://github.com/huggingface/text-generation-inference/pull/438
This is very cool! Definitely a good target for audio-to-audio as a starter (no widget needed). `audio-segmentation` seems like a good fit for what you're trying to do (does...