Nicolas Patry
I don't think the failing test is linked to this PR, is it?
Shall I merge?
Oops! Thanks for the heads-up. I created a fix here: https://github.com/huggingface/diffusers/pull/2551
@Ir1d can you provide a reproducible workflow (ideally fast to execute)?
Do you have links to the `model_path` you're referring to? Here is a modified version of your scripts that creates the proper LoRA safetensors file:

```python
from diffusers import ...
```
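For reference, here is a minimal sketch of producing a diffusers-format LoRA safetensors file, following the pattern from the official `train_dreambooth_lora.py` example; the model id and output directory are placeholders, and the LoRA weights here are freshly initialized rather than trained:

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import LoRAAttnProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
unet = pipe.unet

# Build one LoRA processor per attention layer, with the same shapes
# that `unet.load_attn_procs` expects when loading the file back.
lora_attn_procs = {}
for name in unet.attn_processors.keys():
    # attn1 is self-attention (no cross-attention dim), attn2 is cross-attention.
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    elif name.startswith("down_blocks"):
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]
    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size, cross_attention_dim=cross_attention_dim
    )

unet.set_attn_processor(lora_attn_procs)
# `safe_serialization=True` writes `pytorch_lora_weights.safetensors`.
unet.save_attn_procs("./lora_out", safe_serialization=True)
```

The resulting file can then be loaded back with `pipe.unet.load_attn_procs("./lora_out")`.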
> So our current workflow is to use `convert_lora_safetensor_to_diffusers.py` to merge a LoRA into its base model, then if we want to separate it and use it like a native LoRA...
> but that raised a `KeyError: 'to_k_lora.down.weight'`.

This means the LoRA is still in SD format, and you need to convert it to the `diffusers` format, I guess. @pcuenca might know...
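For what it's worth, here is a quick sketch for checking which naming convention a LoRA file uses before trying to load it; the file path is a placeholder, and the key patterns are the ones I'd expect from SD/kohya checkpoints versus diffusers attention-processor checkpoints:

```python
from safetensors.torch import load_file

# Hypothetical path to the LoRA file you are trying to load.
state_dict = load_file("pytorch_lora_weights.safetensors")
keys = list(state_dict)

if any(k.startswith(("lora_unet_", "lora_te_")) for k in keys):
    # SD/kohya layout: needs conversion before `unet.load_attn_procs` works.
    print("SD-format keys, e.g.:", keys[0])
elif any(k.endswith("_lora.down.weight") for k in keys):
    # diffusers layout: keys like `...attn1.processor.to_k_lora.down.weight`.
    print("diffusers-format keys, e.g.:", keys[0])
else:
    print("Unrecognized layout, e.g.:", keys[0])
```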
Unfortunately not at the moment. https://github.com/huggingface/text-generation-inference/issues/478 might help with memory. Other than that, `--max-batch-total-tokens` is really the variable you need to set to control the amount of memory you're going to...
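To illustrate, a launcher invocation along these lines; the model id and token budget are placeholders, not recommendations:

```shell
docker run --gpus all --shm-size 1g -p 8080:80 \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id bigscience/bloom-560m \
    --max-batch-total-tokens 16384
```

Lowering `--max-batch-total-tokens` caps how many tokens the server will hold in flight at once, which shrinks the KV-cache footprint at the cost of throughput.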
> has come a long way from other inference servers

What do you mean? Is it faster or slower? I'm guessing slower, but the phrasing isn't clear to...