Nicolas Patry

Results 978 comments of Nicolas Patry

Shoot, I just merged my PR which is the same :) Edit: accepted yours so you'll end up in the contributors! Thanks.

Hey, GPTQ should mostly work out of the box for MPT. You just need to run the script (this should work, up to the potential naming of the layers inside the model)....

> Are you planning to roll out a GPTQ implementation for MPT-30B?

No, but if you figure out the sharding logic, we are accepting PRs. I tried to provide initial guidance...

> I am trying to use https://github.com/huggingface/chat-ui with the https://github.com/vllm-project/vllm/tree/main OpenAI endpoint. Even though the request goes to the vllm server, I constantly get the error in the UI: Server does...

Did you try not sharing the tokenizer among multiple threads? (The easiest way is to load the tokenizer on each thread instead.) There are some protections implemented, but...

Instead of loading the tokenizer before the thread fork, load it afterwards. If you use `torch.utils.data.Dataset` for instance, that means loading the tokenizer in `Dataset.__init__` instead of passing it in.
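A minimal sketch of the pattern (names are illustrative; in practice the class would subclass `torch.utils.data.Dataset` and the factory would be something like `transformers.AutoTokenizer.from_pretrained`):

```python
# Construct the tokenizer inside the Dataset's __init__ rather than passing
# an already-loaded instance in: each DataLoader worker then builds its own
# copy after the fork, and nothing is shared across process boundaries.
class TextDataset:  # would subclass torch.utils.data.Dataset in practice
    def __init__(self, texts, tokenizer_factory):
        self.texts = texts
        # Called after the fork, inside each worker.
        self.tokenizer = tokenizer_factory()

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.tokenizer(self.texts[idx])
```

Usage: `TextDataset(texts, lambda: AutoTokenizer.from_pretrained("..."))` — the lambda defers construction until the worker runs `__init__`.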

You can also disable threading in tokenizers altogether by setting the env variable `TOKENIZERS_PARALLELISM=0` before launching your program; that might help.
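Concretely, either export the variable in the shell (`TOKENIZERS_PARALLELISM=0 python train.py`) or set it at the very top of the script, before any tokenizer is created:

```python
import os

# Disable tokenizers' internal parallelism; this must run before any
# tokenizer is instantiated, so put it at the top of the entry script.
os.environ["TOKENIZERS_PARALLELISM"] = "0"
```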

Any simple script to reproduce maybe ?

You're sharing the tokenizer across thread boundaries. Move the tokenizer declaration inside `create_tokenize` and everything will work fine. I'm not familiar enough with TensorFlow, but there's probably another way...
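A sketch of that fix, with a hypothetical `tokenizer_factory` standing in for however the tokenizer is actually constructed in the user's code:

```python
# Build the tokenizer inside create_tokenize instead of at module level,
# so each call (and therefore each thread that calls it) gets its own
# instance rather than capturing one shared object.
def create_tokenize(tokenizer_factory):
    tokenizer = tokenizer_factory()  # local to this call, not shared

    def tokenize(text):
        return tokenizer(text)

    return tokenize
```

Each thread calls `create_tokenize(...)` itself, so no tokenizer object ever crosses a thread boundary.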