Nicolas Patry
I don't think the failing test is linked to this PR, is it?
Shall I merge?
Oops! Thanks for the heads-up. I created a fix here: https://github.com/huggingface/diffusers/pull/2551
@Ir1d can you provide a reproducible workflow (ideally fast to execute)?
Do you have links to the `model_path` you're referring to? Here is a modified version of your scripts that creates the proper LoRA safetensors file:

```python
from diffusers import ...
```
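For reference, here is a minimal sketch of producing a diffusers-format LoRA safetensors file, following the pattern from the official `train_dreambooth_lora.py` example; the model id and output directory are placeholders, and the LoRA weights here are freshly initialized rather than trained:

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import LoRAAttnProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
unet = pipe.unet

# Build one LoRA processor per attention layer, with the same shapes
# that `unet.load_attn_procs` expects when loading the file back.
lora_attn_procs = {}
for name in unet.attn_processors.keys():
    # attn1 is self-attention (no cross-attention dim), attn2 is cross-attention.
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]
    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size, cross_attention_dim=cross_attention_dim
    )

unet.set_attn_processor(lora_attn_procs)
# `safe_serialization=True` writes `pytorch_lora_weights.safetensors`.
unet.save_attn_procs("./lora_out", safe_serialization=True)
```

The resulting file can then be loaded back with `pipe.unet.load_attn_procs("./lora_out")`.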
> So our current workflow is to use `convert_lora_safetensor_to_diffusers.py` to merge a LoRA into its base model, then if we want to separate it and use it like a native LoRA...
> but that raised a `KeyError: 'to_k_lora.down.weight'`.

This means the LoRA is still in SD format, and you need to convert it to the `diffusers` format, I guess. @pcuenca might know...
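For what it's worth, here is a quick sketch for checking which naming convention a LoRA file uses before trying to load it; the file path is a placeholder, and the key patterns are the ones I'd expect from SD/kohya checkpoints versus diffusers attention-processor checkpoints:

```python
from safetensors.torch import load_file

# Hypothetical path to the LoRA file you are trying to load.
state_dict = load_file("pytorch_lora_weights.safetensors")
keys = list(state_dict)

if any(k.startswith(("lora_unet_", "lora_te_")) for k in keys):
    # SD/kohya layout: needs conversion before `unet.load_attn_procs` works.
    print("SD-format keys, e.g.:", keys[0])
elif any(k.endswith("_lora.down.weight") for k in keys):
    # diffusers layout: keys like `...attn1.processor.to_k_lora.down.weight`.
    print("diffusers-format keys, e.g.:", keys[0])
else:
    print("Unrecognized layout, e.g.:", keys[0])
```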
Unfortunately not at the moment. https://github.com/huggingface/text-generation-inference/issues/478 might help with memory. Other than that, `--max-batch-total-tokens` is really the variable you need to set to control the amount of memory you're going to...
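To illustrate, a launcher invocation along these lines; the model id and token budget are placeholders, not recommendations:

```shell
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id bigscience/bloom-560m \
    --max-batch-total-tokens 16384
```

Lowering `--max-batch-total-tokens` caps how many tokens the server will hold in flight at once, which shrinks the KV-cache footprint at the cost of throughput.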
> has come a long way from other inference servers

What do you mean? Is it faster or slower? I'm guessing slower, but the phrasing isn't clear to...