alpaca-lora
Error when resuming from checkpoint, unable to load model.
I was trying to fine-tune the model using two distinct prompting methods. For this, I first trained a model on one corpus. Then, using the automatically saved checkpoints, I tried to fine-tune the model on a second corpus with the following command:
python finetune.py --base_model 'decapoda-research/llama-7b-hf' --resume_from_checkpoint './my-model/checkpoint-8400/'
However, when I try to resume from the checkpoint, I get the following error:
Restarting from ./my-model/checkpoint-8400/pytorch_model.bin
Traceback (most recent call last):
File "finetune.py", line 277, in <module>
fire.Fire(train)
File "/data/tmp/astotxo/env/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/data/tmp/astotxo/env/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/data/tmp/astotxo/env/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "finetune.py", line 205, in train
model.print_trainable_parameters() # Be more transparent about the % of trainable params.
AttributeError: 'NoneType' object has no attribute 'print_trainable_parameters'
Am I doing something wrong? Is there a way to resume from a checkpoint and fine-tune the model?
Make sure you are using the latest code. This could be because you have older code that is not compatible with the current peft.
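For context, the traceback is consistent with an older checkpoint-loading block that reassigned model from the return value of set_peft_model_state_dict; newer peft versions no longer return the model from that call (it can return None), so the reassignment leaves model as None and model.print_trainable_parameters() fails. Below is a minimal sketch of the corrected loading step; the helper name load_adapter_checkpoint and the adapter_model.bin fallback are illustrative assumptions, not taken from this thread.

import os
import torch
from peft import PeftModel, set_peft_model_state_dict

def load_adapter_checkpoint(model: PeftModel, resume_from_checkpoint: str) -> None:
    """Load saved LoRA adapter weights into an existing PeftModel, in place."""
    checkpoint_name = os.path.join(resume_from_checkpoint, "pytorch_model.bin")
    if not os.path.exists(checkpoint_name):
        # Hypothetical fallback: some checkpoints store only the LoRA weights.
        checkpoint_name = os.path.join(resume_from_checkpoint, "adapter_model.bin")
    if os.path.exists(checkpoint_name):
        print(f"Restarting from {checkpoint_name}")
        adapters_weights = torch.load(checkpoint_name, map_location="cpu")
        # Call this for its side effect only. Do NOT reassign the result to
        # `model`; with current peft that reassignment is what leaves `model`
        # as None and triggers the AttributeError above.
        set_peft_model_state_dict(model, adapters_weights)
    else:
        print(f"Checkpoint {checkpoint_name} not found")

After this, model is still the original PeftModel with the adapter weights loaded, and a subsequent model.print_trainable_parameters() works as expected.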
I've hit the same issue... it hasn't been solved yet.
Make sure you are using the latest code. This could be because you have older code that is not compatible with the current peft.
I just pulled the latest code and it seems to have solved the problem! Cheers!
I'll keep this issue open since @nkjulia seems to be facing a different problem, but otherwise feel free to close it.