Nicolas Patry

Results 978 comments of Nicolas Patry

Shoot, I just merged my PR which is the same :) Edit: accepted yours so you'll end up in the contributors! Thanks.

Hey, GPTQ should mostly work out of the box for MPT. You just need to run the script (this should work, up to the potential naming of the layers inside the model)....

> Are you planning to roll out a GPTQ implementation for MPT-30B?

No, but if you figure out the sharding logic, we are accepting PRs. I tried to provide initial guidance...

> I am trying to use https://github.com/huggingface/chat-ui with the https://github.com/vllm-project/vllm/tree/main OpenAI endpoint. Even though the request goes to the vllm server, I constantly get the error in the UI: Server does...

Did you try not sharing the tokenizer among multiple threads? (The easiest way is to load the tokenizer on each thread instead.) There are some protections implemented, but...

Instead of loading the tokenizer before the thread fork, load it afterwards. If you use `torch.utils.data.Dataset` for instance, that means loading the tokenizer in `Dataset.__init__` instead of passing it in.
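A minimal sketch of the pattern (names are illustrative; in practice the class would subclass `torch.utils.data.Dataset` and the factory would be something like `transformers.AutoTokenizer.from_pretrained`):

```python
# Construct the tokenizer inside the Dataset's __init__ rather than passing
# an already-loaded instance in: each DataLoader worker then builds its own
# copy after the fork, and nothing is shared across process boundaries.
class TextDataset:  # would subclass torch.utils.data.Dataset in practice
    def __init__(self, texts, tokenizer_factory):
        self.texts = texts
        # Called after the fork, inside each worker.
        self.tokenizer = tokenizer_factory()

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.tokenizer(self.texts[idx])
```

Usage: `TextDataset(texts, lambda: AutoTokenizer.from_pretrained("..."))` — the lambda defers construction until the worker runs `__init__`.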

You can also disable threading in tokenizers altogether by setting the env variable `TOKENIZERS_PARALLELISM=0` before launching your program; that might help.
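Concretely, either export the variable in the shell (`TOKENIZERS_PARALLELISM=0 python train.py`) or set it at the very top of the script, before any tokenizer is created:

```python
import os

# Disable tokenizers' internal parallelism; this must run before any
# tokenizer is instantiated, so put it at the top of the entry script.
os.environ["TOKENIZERS_PARALLELISM"] = "0"
```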

Any simple script to reproduce maybe ?

You're sharing the tokenizer across thread boundaries. Move the tokenizer declaration inside `create_tokenize` and everything will work fine. I'm not familiar enough with TensorFlow, but there's probably another way...
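A sketch of that fix, with a hypothetical `tokenizer_factory` standing in for however the tokenizer is actually constructed in the user's code:

```python
# Build the tokenizer inside create_tokenize instead of at module level,
# so each call (and therefore each thread that calls it) gets its own
# instance rather than capturing one shared object.
def create_tokenize(tokenizer_factory):
    tokenizer = tokenizer_factory()  # local to this call, not shared

    def tokenize(text):
        return tokenizer(text)

    return tokenize
```

Each thread calls `create_tokenize(...)` itself, so no tokenizer object ever crosses a thread boundary.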