
How do I continue training with PEFT?

therealadityashankar opened this issue Mar 17 '23

I'd be grateful for an example of how to continue fine-tuning an already trained model with PEFT. The examples I've come across (specifically for int8 training) only seem to show how to train from scratch.

Can I just continue fine-tuning the trained model with PEFT? Do I have to call prepare_model_for_training again on the new PEFT model before I continue training it?

therealadityashankar avatar Mar 17 '23 10:03 therealadityashankar

Hello, if it is for the LoRA method using INT8, call prepare_model_for_int8_training on the base model, then do PeftModel.from_pretrained(base_model, peft_model_id). Now, before training, call model.train() and you are good to continue with the training. Please let us know in case you encounter issues.
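Putting that together, a minimal sketch (the model ID and adapter path are placeholders, and is_trainable=True is included because, as later comments in this thread show, the adapter loads frozen without it):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel, prepare_model_for_int8_training

base_model = AutoModelForCausalLM.from_pretrained(
    "your-base-model-id",  # placeholder base model
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
base_model = prepare_model_for_int8_training(base_model)

# is_trainable=True keeps the loaded adapter weights trainable
# (see the discussion further down in this thread)
model = PeftModel.from_pretrained(base_model, "your-peft-adapter-id", is_trainable=True)
model.train()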

pacman100 avatar Mar 17 '23 12:03 pacman100

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

github-actions[bot] avatar Apr 16 '23 15:04 github-actions[bot]

Hi! I have a similar issue when trying to further train a fine-tuned LoRA model. What I did is:

import torch
from transformers import LlamaForCausalLM
from peft import PeftModel, prepare_model_for_int8_training

# base_model, device_map and lora_weights are defined earlier in the script
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = prepare_model_for_int8_training(model)

model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
    # is_trainable=True
)
model.train()

But when running model.print_trainable_parameters(), it says the number of trainable params is 0. If I enable is_trainable=True, the trainable params become more than expected.

williamLyh avatar Apr 26 '23 23:04 williamLyh


Hi, I wonder: if I continue training with PEFT, will the learning rate be inherited from the previous run? If the old training has already finished, the learning rate would be close to 0 by then. How do I solve this?
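One way to handle this (a sketch under the assumption of a plain PyTorch training loop; nothing here is PEFT-specific): build a fresh optimizer and scheduler for the resumed run instead of restoring the old scheduler state, so the learning rate restarts at a value you choose.

import torch

# Build a fresh optimizer over only the trainable (LoRA) parameters,
# with a new starting learning rate instead of the near-zero value a
# fully decayed scheduler would resume at. num_resumed_steps is a
# placeholder for the length of the new run.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_resumed_steps)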

edwardelric1202 avatar May 31 '23 04:05 edwardelric1202

Hi @williamLyh,

# re-enable gradients on the LoRA adapter weights
for name, param in model.named_parameters():
    if "lora" in name or "Lora" in name:
        param.requires_grad = True

this will help you :)

sayhellotoAI2 avatar Jun 14 '23 01:06 sayhellotoAI2

As far as I know, the graceful solution is something like this:

from peft import PeftModel
from peft.tuners.lora import mark_only_lora_as_trainable

# load the adapter in trainable mode, then explicitly mark only the
# LoRA weights as trainable
lora_model = PeftModel.from_pretrained(in_model, path, is_trainable=True)
mark_only_lora_as_trainable(lora_model)
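As a quick sanity check afterwards (print_trainable_parameters is a standard PeftModel method):

# should now report a non-zero count of trainable parameters
lora_model.print_trainable_parameters()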

shaunheilee avatar Aug 02 '23 03:08 shaunheilee

With both @shaunheilee's and @sayhellotoAI2's proposed solutions, I get a NaN loss when resuming training... Am I the only one?

SimonBenhamou avatar Aug 14 '23 23:08 SimonBenhamou

@SimonBenhamou, I'm having the same problem as you. I've tried all of the above solutions, but the model is still not training. Did you find a solution?

wjddyd66 avatar Aug 29 '23 05:08 wjddyd66

When trying to resume training on a Llama 2 LoRA, I was running into: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

This was solved with the following:

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
# needed so gradients can flow through the inputs of the frozen base model
base_model.enable_input_require_grads()
model = PeftModel.from_pretrained(base_model, peft_adapter_path, is_trainable=True)
model._mark_only_adapters_as_trainable()

The _mark_only_adapters_as_trainable() call might not be necessary.

timohear avatar Sep 19 '23 18:09 timohear

The mark is not necessary. If you resume, you presumably want to train the LoRA weights only, and it will indeed train only the LoRA weights.

MonolithFoundation avatar Jun 21 '24 04:06 MonolithFoundation

I tried both of the above solutions and I am running into an error at the following line: self.dtype = self.optimizer.param_groups[0]['params'][0].dtype

I am using DeepSpeed with two GPUs. I merged the states for rank 0 and rank 1, but that did not solve the issue when resuming training. However, I can evaluate the PEFT model.

mohbattharani avatar Jun 25 '24 22:06 mohbattharani