
Useful script/idea request: convert training checkpoint to LoRA

Open baleksey opened this issue 1 year ago • 1 comment

@tloen It would be great to have another small but useful script from you:

  • converting any trained checkpoint from fine-tuning (given its path) to LoRA weights (adapter_model and config); a sketch of such a conversion is included after this list
  • and/or making it possible to save not only checkpoints every X steps but the LoRA model as well, so we can test it at any time without extra conversions
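Not from the repo, but here is a minimal sketch of what such a conversion script could look like. It assumes the checkpoint directory is a standard Trainer checkpoint whose pytorch_model.bin was written with the patched state_dict (i.e. it already contains only the LoRA weights); the base model name, paths, and LoRA hyperparameters below are placeholders and have to match whatever finetune.py was run with.

import os

import torch
from peft import LoraConfig, get_peft_model, set_peft_model_state_dict
from transformers import LlamaForCausalLM

BASE_MODEL = "decapoda-research/llama-7b-hf"  # placeholder: base model used for fine-tuning
CHECKPOINT_DIR = "output/checkpoint-1000"     # placeholder: Trainer checkpoint to convert
ADAPTER_OUT = "lora-from-checkpoint"          # where adapter_model/adapter_config will go

# Re-create the PEFT wrapper with the same LoRA hyperparameters used for training
# (the values here are only examples).
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(LlamaForCausalLM.from_pretrained(BASE_MODEL), config)

# With the patched state_dict, the checkpoint's pytorch_model.bin already holds
# only the LoRA weights, so it can be pushed straight into the PEFT model.
weights = torch.load(
    os.path.join(CHECKPOINT_DIR, "pytorch_model.bin"), map_location="cpu"
)
set_peft_model_state_dict(model, weights)

# save_pretrained on a PeftModel writes adapter_model.bin and adapter_config.json.
model.save_pretrained(ADAPTER_OUT)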

And a question about fine-tuning: I tried to resume training from the last checkpoint with trainer.train(resume_from_checkpoint=True), which finds the last saved checkpoint and restarts from it. While it recovers the step count correctly, the loss is very different and then decreases VERY slowly. It looks as if training is starting from scratch, or I'm just missing something. As an example, the loss at the last checkpoint was ~0.7917; after resuming it jumps to 8.7, and after 60 more steps it is still at 5.72. See the log: trainer_state.txt

So is it possible and efficient to continue training at all? Or is there a problem in the code that makes it impossible?
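For reference, one workaround sketch (mine, untested): load the adapter weights from the checkpoint back into the PEFT-wrapped model explicitly before resuming. It assumes the checkpoint's pytorch_model.bin holds the LoRA-only weights written by the patched state_dict; model and trainer are the objects built as in finetune.py, and the path is hypothetical.

import os

import torch
from peft import set_peft_model_state_dict

last_checkpoint = "output/checkpoint-200"  # hypothetical path to the last saved checkpoint

# With the patched state_dict, pytorch_model.bin inside a Trainer checkpoint holds
# only the LoRA adapter weights, so push them back into the PEFT model explicitly.
adapter_weights = torch.load(
    os.path.join(last_checkpoint, "pytorch_model.bin"), map_location="cpu"
)
set_peft_model_state_dict(model, adapter_weights)

# Resuming from the checkpoint directory still restores the optimizer, scheduler,
# and step count.
trainer.train(resume_from_checkpoint=last_checkpoint)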

baleksey avatar Mar 19 '23 14:03 baleksey

I am seeing a similar issue, though not exactly the same: after resuming, the loss is slightly higher than before the resume, which is really strange. If I remove this part

# Patch model.state_dict so that saved checkpoints contain only the LoRA
# adapter weights (via peft's get_peft_model_state_dict) rather than the
# full model state dict.
old_state_dict = model.state_dict
model.state_dict = (
    lambda self, *_, **__: get_peft_model_state_dict(
        self, old_state_dict()
    )
).__get__(model, type(model))

then the loss is exactly the same, though the checkpoints become very large...
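For what it's worth, one way to keep both behaviours (a sketch under assumptions, not tested against this repo): drop the state_dict patch so the Trainer writes full checkpoints that resume exactly, and add a callback that also exports the small LoRA adapter at every save. The class name and the "adapter" subfolder below are made up.

import os

from transformers import TrainerCallback


class SaveLoraAdapterCallback(TrainerCallback):
    # On every checkpoint save, also write adapter_model.bin + adapter_config.json
    # next to the full checkpoint so the LoRA weights can be tested immediately.
    # Assumes the Trainer's model is the PEFT-wrapped model, as in finetune.py.
    def on_save(self, args, state, control, **kwargs):
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        kwargs["model"].save_pretrained(os.path.join(checkpoint_dir, "adapter"))
        return control

# Usage: pass callbacks=[SaveLoraAdapterCallback()] when constructing the Trainer.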

REIGN12 avatar Apr 14 '23 02:04 REIGN12