
Curious about the plans for supporting PEFT and LoRA.

Open kissngg opened this issue 2 years ago • 9 comments

Feature request

I need to be able to apply a LoRA adapter to a local LLM.

Motivation

LoRA is a lightweight way to iterate on and sanity-check your current LLM tuning, so I think the local model API needs it.

Your contribution

...

kissngg avatar Jun 21 '23 02:06 kissngg

plz~

YooSungHyun avatar Jun 21 '23 12:06 YooSungHyun

Hey, do you know about https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.LoraModel.merge_and_unload

Basically, you could

# fold the LoRA weights into the base model, then save a plain checkpoint
model = model.merge_and_unload()
model.save_pretrained("mynewmergedmodel")

which will "write" the peft weights directly into the model, making it a regular transformer model, which text-generation-inference could support.

@younesbelkada in case I write something wrong.

It would be nice to add full-blown support, but that means reimplementing a lot of PEFT logic inside TGI; this seems like the easier route for the time being.

Narsil avatar Jun 22 '23 07:06 Narsil

I second what @Narsil said: you can do that to merge the LoRA weights. We should indeed add more documentation on PEFT.

younesbelkada avatar Jun 22 '23 07:06 younesbelkada

(quoting @Narsil's merge_and_unload suggestion above)

@Narsil

If I want to use it like this:

# model_name is request body data....
if model_name == "chat":
    # base model only: adapters are disabled inside this context manager
    with self.model.disable_adapter():
        model_output = self.model.generate(
            input_ids=input_ids.cuda(), generation_config=generation_config
        )[0]
else:
    # route generation through the adapter registered under model_name
    self.model.set_adapter(model_name)
    model_output = self.model.generate(
        input_ids=input_ids.cuda(), generation_config=generation_config
    )[0]

it's impossible to use it that way... isn't it?

YooSungHyun avatar Jun 23 '23 04:06 YooSungHyun

I've wondered about this as well. Is there a way to have plug-and-play fine-tuned adapters for specific tasks?


if task == "chat":
    self.model.set_adapter("chat_model")
elif task == "data_extraction":
    self.model.set_adapter("extraction_model")
elif task == "classification":
    self.model.set_adapter("classification_model")
elif task == "FAQ":
    self.model.set_adapter("faq_model")
else:
    # disable_adapter() is a context manager, so the base model is only
    # active for generate() calls made inside the with-block
    with self.model.disable_adapter():
        ...
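
For completeness, registering several named adapters on one base model looks roughly like this with PEFT (a sketch; the model id, adapter paths, and adapter names are placeholders):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("base-model-id")

# the first adapter creates the PeftModel; further ones are registered by name
model = PeftModel.from_pretrained(base, "./chat-adapter", adapter_name="chat_model")
model.load_adapter("./extraction-adapter", adapter_name="extraction_model")
model.load_adapter("./classification-adapter", adapter_name="classification_model")
model.load_adapter("./faq-adapter", adapter_name="faq_model")

# set_adapter(name) then switches between them in place
model.set_adapter("faq_model")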

ravilkashyap avatar Jun 29 '23 06:06 ravilkashyap

@ravilkashyap You mean like that? Yes. I'm using PEFT on Triton's python_backend both your way and my way, but you have to train each LoRA adapter first and name it like that.

But I think you can't use it like that in text-generation-inference, because I can't receive the task with the request.

If you use Triton's python_backend, you can switch adapters by name on both the gRPC streaming service and the HTTP service. I've already tested this.
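
Roughly, the per-request switch looks like this in a python_backend model.py (an untested sketch; the model id, adapter paths, and the TASK/PROMPT/OUTPUT tensor names are placeholders that have to match your config.pbtxt):

import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

class TritonPythonModel:
    def initialize(self, args):
        base = AutoModelForCausalLM.from_pretrained("base-model-id").cuda()
        self.tokenizer = AutoTokenizer.from_pretrained("base-model-id")
        # register one named LoRA adapter per task
        self.model = PeftModel.from_pretrained(base, "./chat-adapter", adapter_name="chat")
        self.model.load_adapter("./faq-adapter", adapter_name="faq")

    def execute(self, requests):
        responses = []
        for request in requests:
            task = pb_utils.get_input_tensor_by_name(request, "TASK").as_numpy()[0].decode()
            prompt = pb_utils.get_input_tensor_by_name(request, "PROMPT").as_numpy()[0].decode()
            input_ids = self.tokenizer(prompt, return_tensors="pt").input_ids.cuda()
            if task in ("chat", "faq"):
                self.model.set_adapter(task)  # switch adapter by request field
                output = self.model.generate(input_ids=input_ids)[0]
            else:
                with self.model.disable_adapter():  # fall back to the base model
                    output = self.model.generate(input_ids=input_ids)[0]
            text = self.tokenizer.decode(output, skip_special_tokens=True)
            out = pb_utils.Tensor("OUTPUT", np.array([text.encode()], dtype=np.object_))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses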

YooSungHyun avatar Jul 05 '23 13:07 YooSungHyun

I have fine-tuned falcon-7B and merged my LoRA with the pretrained model, but when I try to load it I get the following errors:

Torch: RuntimeError: weight transformer.word_embeddings.weight does not exist
Safetensors: RuntimeError: weight lm_head.weight does not exist

and indeed there is no lm_head field in the config.

Any clues on what I should do?

PitchboyDev avatar Jul 05 '23 14:07 PitchboyDev

@PitchboyDev I think that is not on this issue's agenda... I think that is simply a Hugging Face model problem, not PEFT, because your error message is not raised in PEFT.

How about moving your question to the Hugging Face forum or something?

YooSungHyun avatar Jul 06 '23 02:07 YooSungHyun

It's neither peft nor transformers. It's an error in this repo's code, because I can load the model and use it with a transformers Gradio app. I found an issue about it: #541

PitchboyDev avatar Jul 06 '23 13:07 PitchboyDev