Nicolas Patry

978 comments of Nicolas Patry

Hmm, Docker is tighter on RAM and is leading to OOM; we probably need to fix the scheduling before merging, then.

Can you try downloading the weights directly?

```
text-generation-server download-weights $model
```

Maybe provide the model id if it's public? It could be a model-specific issue.

Is it possible that the ones that are failing are private? If yes, you need to set `HUGGING_FACE_HUB_TOKEN`:

```
docker run -e HUGGING_FACE_HUB_TOKEN=.. ...rest
```

OK, I will close this since the issue seems solved; we can keep adding comments if other potential solutions are found.

> The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address)

This seems like the issue. Could it be that something is already running...
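To check that hypothesis, a quick way to see whether something is already listening on the rendezvous port from the error message is a plain socket probe. This is a minimal sketch (a hypothetical helper, not part of the server):

```python
import socket

def port_in_use(host: str, port: int) -> bool:
    # Try to connect to (host, port); a successful connect means some
    # process is already listening there and will conflict with a new run.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0

print(port_in_use("127.0.0.1", 29500))
```

If this prints `True`, kill the stale process (or pick another port) before relaunching.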

> Please note that it is advised to avoid using the Hugging Face fast tokenizer for now, as we’ve observed that the auto-converted fast tokenizer sometimes gives incorrect tokenizations.

Very...

> Is there any downside in giving the end-user the choice of choosing a slow tokenizer?

Yes, the router cannot detect the number of tokens within queries, which disables a...
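To illustrate why token counts matter to the router: it budgets batches by the number of tokens per request, which a fast tokenizer can report cheaply. A minimal sketch, where a naive whitespace split stands in for the real tokenizer and all names are invented for illustration:

```python
def count_tokens(text: str) -> int:
    # Stand-in for a real fast tokenizer (assumption: the actual router
    # gets exact token counts from the HF fast tokenizer, not a split).
    return len(text.split())

def fits_in_batch(requests: list[str], max_batch_tokens: int) -> bool:
    # Sum per-request token counts against the batch budget; without
    # per-query counts (slow tokenizer), this scheduling check is impossible.
    return sum(count_tokens(r) for r in requests) <= max_batch_tokens

print(fits_in_batch(["hello world", "one two three"], 10))  # True: 2 + 3 <= 10
```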

> @Narsil what exactly has been tested?

[@Narsil what exactly has been tested?](https://github.com/huggingface/transformers/blob/main/tests/models/llama/test_tokenization_llama.py#L461-L492) I ran with other datasets too; XNLI is the hardest for tokenization because of UTF-8 issues. There...

> is there any chance you could add support for the quantize_config.json file?

This is actually much cleaner than the ENV variables I added. I'm more than happy to switch...
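For context, the switch described above could look like a loader that prefers `quantize_config.json` and falls back to env variables. A hedged sketch only: the field and variable names below are illustrative assumptions, not the exact ones from the PR:

```python
import json
import os

def load_quantize_config(model_dir: str) -> dict:
    # Prefer the on-disk quantize_config.json if the model repo ships one.
    path = os.path.join(model_dir, "quantize_config.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    # Fallback: env variables (names here are assumptions for illustration).
    return {
        "bits": int(os.environ.get("GPTQ_BITS", "4")),
        "group_size": int(os.environ.get("GPTQ_GROUPSIZE", "128")),
    }
```

The file-based path is cleaner because the quantization parameters travel with the model weights instead of the deployment environment.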

> I would really love to stop using that old GPTQ-for-LLaMa code and will do as soon as I've confirmed there's no need to do so any more.

You mean...