Nicolas Patry

Results 978 comments of Nicolas Patry

Hi @amtam0 it runs on the API, explaining the first load time. The code to actually run the inference is available and you can try doing:

```
cd huggingface_hub/api-inference-community
./manage.py...
```

Yes, llama3 has 2 eos tokens: `eot_id` as the turn token, and a "real" eos_token (not sure when it's used). Currently the config defines `` as the eos token, which is what you're...
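A minimal sketch of what "2 eos tokens" means for a generation loop: decoding stops when the model emits *either* the turn token or the end-of-text token. The token ids and the toy model below are made up for illustration, not taken from the actual Llama 3 config.

```python
# Hypothetical sketch: a greedy generation loop that treats two different
# token ids as stop tokens (e.g. Llama 3's turn token and its "real" eos).
# The ids below are assumptions for illustration only.
EOT_ID = 9001   # assumed id for the turn token
EOS_ID = 9002   # assumed id for the end-of-text token
STOP_IDS = {EOT_ID, EOS_ID}

def generate(next_token_fn, max_new_tokens=32):
    """Collect tokens until either stop token (or the budget) is hit."""
    out = []
    for _ in range(max_new_tokens):
        tok = next_token_fn()
        if tok in STOP_IDS:
            break
        out.append(tok)
    return out

# Toy "model" that emits a few tokens and then the turn token.
stream = iter([1, 2, 3, EOT_ID, 4])
tokens = generate(lambda: next(stream))
print(tokens)  # [1, 2, 3]
```

In practice libraries expose this as a list of eos ids rather than a single one; the point is just that the stop check is a set membership, not an equality test.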

Yes it is. And hf-chat sends that stop token currently.

Slow? What do you mean? What hardware, TP? What is slow in this case?

The frequency penalty is being solved soon: https://github.com/huggingface/text-generation-inference/pull/1765. For the stop token, yes, it's an unfortunate setup; we're working on changing the default in many places (basically there are 2 stop...
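For context, a frequency penalty of the usual kind lowers each token's logit in proportion to how many times that token has already been generated. This is a hedged, standalone sketch of that idea, not the actual TGI implementation from the PR above:

```python
# Sketch of an OpenAI-style frequency penalty: each token's logit is
# reduced by penalty * (number of times the token already appeared).
from collections import Counter

def apply_frequency_penalty(logits, generated_ids, penalty):
    counts = Counter(generated_ids)
    return [
        logit - penalty * counts.get(token_id, 0)
        for token_id, logit in enumerate(logits)
    ]

logits = [2.0, 1.0, 0.5]            # toy 3-token vocabulary
penalized = apply_frequency_penalty(logits, generated_ids=[0, 0, 2], penalty=0.5)
print(penalized)  # [1.0, 1.0, 0.0]
```

Token 0 appeared twice, so it loses 2 × 0.5 = 1.0; token 1 never appeared and is untouched.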

> We are expecting to get the updated docker image by Monday next week. Do you think a TGI release on next week Tuesday/Wednesday with this PR in is feasible?...

Hi @anttttti It's really hard to create a `handler.py` within TGI itself. The reason is that the current code is a tight loop highly tuned for performance. And as a...

Can you show the command you're using? Also show all the logs here, please; we can't help without actual information and reproducibility.

Hi @LeoDog896, Sorry for the long wait on this. I do not understand the need for it. Why do you need a JSON representation of `TensorView`? I'm hesitant...
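To make the question concrete, here is an illustrative sketch of what a JSON view of tensor metadata could look like. The field names (`dtype`, `shape`, `data_offsets`) follow the safetensors header convention, but the `TensorMeta` class and the whole serialization path are hypothetical, not the actual safetensors `TensorView` API:

```python
# Illustration only: serializing tensor *metadata* (not the data itself)
# to JSON. TensorMeta is an invented stand-in, not a real safetensors type.
import json
from dataclasses import dataclass, asdict

@dataclass
class TensorMeta:
    dtype: str
    shape: list
    data_offsets: tuple  # (begin, end) byte offsets into the data buffer

meta = {"weight": TensorMeta(dtype="F32", shape=[2, 3], data_offsets=(0, 24))}
serialized = json.dumps({name: asdict(t) for name, t in meta.items()})
print(serialized)
# {"weight": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]}}
```

The hesitation in the comment makes sense: once the raw bytes are involved, JSON is a poor fit, so any such representation is really only useful for the header-level metadata.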