Nicolas Patry

978 comments of Nicolas Patry

Hmm, Docker is tighter on RAM and is leading to OOM; we probably need to fix the scheduling before merging, then.

Can you try downloading the weights directly?

```
text-generation-server download-weights $model
```

Maybe provide the model id if it's public? It could be a model-specific issue.

Is it possible that the ones that are failing are private? If yes, you need to set `HUGGING_FACE_HUB_TOKEN`:

```
docker run -e HUGGING_FACE_HUB_TOKEN=.. ...rest
```

OK, I will close this since the issue seems solved; we can keep adding comments if other potential solutions are found.

> The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address)

This seems like the issue. Could it be that something is already running...
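To check that hypothesis, a quick way to see whether something is already listening on the rendezvous port from the error message is a plain socket probe. This is a minimal sketch (a hypothetical helper, not part of the server):

```python
import socket

def port_in_use(host: str, port: int) -> bool:
    # Try to connect to (host, port); a successful connect means some
    # process is already listening there and will conflict with a new run.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

print(port_in_use("127.0.0.1", 29500))
```

If this prints `True`, kill the stale process (or pick another port) before relaunching.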

> Please note that it is advised to avoid using the Hugging Face fast tokenizer for now, as we’ve observed that the auto-converted fast tokenizer sometimes gives incorrect tokenizations.

Very...

> Is there any downside in giving the end-user the choice of choosing a slow tokenizer?

Yes, the router cannot detect the number of tokens within queries, which disables a...
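To illustrate why token counts matter to the router: it budgets batches by the number of tokens per request, which a fast tokenizer can report cheaply. A minimal sketch, where a naive whitespace split stands in for the real tokenizer and all names are invented for illustration:

```python
def count_tokens(text: str) -> int:
    # Stand-in for a real fast tokenizer (assumption: the actual router
    # gets exact token counts from the HF fast tokenizer, not a split).
    return len(text.split())

def fits_in_batch(requests: list[str], max_batch_tokens: int) -> bool:
    # Sum per-request token counts against the batch budget; without
    # per-query counts (slow tokenizer), this scheduling check is impossible.
    return sum(count_tokens(r) for r in requests) <= max_batch_tokens

print(fits_in_batch(["hello world", "one two three"], 10))  # True: 2 + 3 <= 10
```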

> @Narsil what exactly has been tested?

[@Narsil what exactly has been tested?](https://github.com/huggingface/transformers/blob/main/tests/models/llama/test_tokenization_llama.py#L461-L492) I ran with other datasets too; XNLI is the hardest for tokenization because of UTF-8 issues. There...

> is there any chance you could add support for the quantize_config.json file?

This is actually much cleaner than the ENV variables I added. I'm more than happy to switch...
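For context, the switch described above could look like a loader that prefers `quantize_config.json` and falls back to env variables. A hedged sketch only: the field and variable names below are illustrative assumptions, not the exact ones from the PR:

```python
import json
import os

def load_quantize_config(model_dir: str) -> dict:
    # Prefer the on-disk quantize_config.json if the model repo ships one.
    path = os.path.join(model_dir, "quantize_config.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    # Fallback: env variables (names here are assumptions for illustration).
    return {
        "bits": int(os.environ.get("GPTQ_BITS", "4")),
        "group_size": int(os.environ.get("GPTQ_GROUPSIZE", "128")),
    }
```

The file-based path is cleaner because the quantization parameters travel with the model weights instead of the deployment environment.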

> I would really love to stop using that old GPTQ-for-LLaMa code and will do as soon as I've confirmed there's no need to do so any more.

You mean...