Nicolas Patry
Hi @amtam0, it runs on the API, which explains the first load time. The code to actually run the inference is available, and you can try doing: ``` cd huggingface_hub/api-inference-community ./manage.py...
Sorry, I don't know; I can't really help.
Yes, llama3 has 2 eos tokens: `<|eot_id|>` for the turn token, and a "real" eos_token (not sure when it's used). Currently the config defines `<|end_of_text|>` as the eos token, which is what you're...
Yes it is. And hf-chat sends that stop token currently.
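To make the two-token setup concrete, here is a minimal sketch (assuming the standard `transformers` generate API and that the checkpoint's tokenizer exposes both tokens) of stopping on either of Llama 3's end tokens:

```python
# Minimal sketch, assuming the usual transformers API: pass both Llama 3
# end tokens as eos_token_id so generation stops on whichever comes first.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

terminators = [
    tokenizer.eos_token_id,                         # <|end_of_text|>, the config's eos
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),  # the end-of-turn token
]

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32, eos_token_id=terminators)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```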
Slow? What do you mean? What hardware, what TP? What is slow in this case?
The frequency penalty is being solved soon: https://github.com/huggingface/text-generation-inference/pull/1765. For the stop token, yes, it's an unfortunate setup; we're fixing it by changing the default in many places (basically there are 2 stop...
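For illustration, a hedged sketch of what a request could look like once that PR lands (assuming a TGI server on localhost:8080; `frequency_penalty` is the field added by the linked PR, and `stop` is TGI's existing stop-sequence parameter):

```python
# Hedged sketch, assuming a local TGI server: frequency_penalty alongside
# an explicit stop sequence in the /generate parameters.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Tell me a short story.",
        "parameters": {
            "max_new_tokens": 64,
            "frequency_penalty": 0.5,   # added by the PR linked above
            "stop": ["<|eot_id|>"],     # explicit stop sequence
        },
    },
)
print(response.json())
```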
> We are expecting to get the updated docker image by Monday next week. Do you think a TGI release on next week Tuesday/Wednesday with this PR in is feasible?...
Hi @anttttti, it's really hard to create a `handler.py` within TGI itself. The reason is that the current code is a tight loop highly tailored for performance. And as a...
Can you show the command you're using? Also, please share all the logs here; we can't help without concrete information and a way to reproduce.
Hi @LeoDog896, sorry for the long wait on this. I don't understand the need for it. Why do you need a JSON representation of `TensorView`? I'm hesitant...