alpaca-lora

Maximum recursion depth exceeded

Open ouwei2013 opened this issue 1 year ago • 4 comments

I tried to fine-tune the 13B model on a 3090 (24 GB VRAM). Training started and a progress bar was shown; however, after about 100 steps I got an error saying 'maximum recursion depth exceeded'. Has anyone had a similar error? Thanks!

ouwei2013 avatar Mar 17 '23 04:03 ouwei2013

Here is the stack trace:

~/~/llm/transformers/src/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1631             self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1632         )
-> 1633         return inner_training_loop(
   1634             args=args,
   1635             resume_from_checkpoint=resume_from_checkpoint,

~/~/llm/transformers/src/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1977                 self.control = self.callback_handler.on_step_end(args, self.state, self.control)
   1978
-> 1979                 self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
   1980             else:
   1981                 self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

~/~/llm/transformers/src/transformers/trainer.py in _maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval)
   2238 ...
---> 29     lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
    30 ).__get__(model, type(model))
    31

RecursionError: maximum recursion depth exceeded
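[Editor's note] This failure mode is characteristic of a `state_dict` monkey-patch that ends up calling itself. A minimal self-contained sketch of the hazard, using a hypothetical `Model` class rather than the real Trainer/PEFT stack: if the wrapper resolves `self.state_dict` at call time instead of using a captured reference to the original, every call re-enters the wrapper.

```python
# Minimal sketch (hypothetical Model class, not the real PEFT/Trainer code):
# a state_dict wrapper that looks itself up at call time recurses forever.

class Model:
    def state_dict(self):
        return {"w": 1}

m = Model()

# BUG: the lambda calls self.state_dict(), which resolves to the lambda
# itself (the instance attribute shadows the class method), so it recurses.
m.state_dict = (
    lambda self, *_, **__: self.state_dict()
).__get__(m, type(m))

try:
    m.state_dict()
except RecursionError:
    print("maximum recursion depth exceeded")
```

The alpaca-lora patch avoids this by capturing `old_state_dict = model.state_dict` before reassigning; a recursion here suggests the override interacted badly with the installed peft version, which handles `state_dict` on its own.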

ouwei2013 avatar Mar 17 '23 05:03 ouwei2013

You could fix this by commenting out these lines:

old_state_dict = model.state_dict
model.state_dict = (
    lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
).__get__(model, type(model))

and setting load_best_model_at_end to False.
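[Editor's note] The second half of the fix can be sketched as follows. The `output_dir` and batch size below are placeholder values; the only change that matters here is the `load_best_model_at_end` flag, which stops the Trainer from reloading a checkpoint (and re-touching `state_dict`) at the end of training.

```python
from transformers import TrainingArguments

# Sketch of the relevant trainer config (placeholder values);
# the fix is load_best_model_at_end=False.
training_args = TrainingArguments(
    output_dir="./lora-alpaca",
    per_device_train_batch_size=4,
    load_best_model_at_end=False,
)
```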

By the way, what version of peft are you using?

tloen avatar Mar 17 '23 05:03 tloen

Thank you very much! I will try it later. I installed peft from source on GitHub (main branch).
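[Editor's note] When reporting a source install, it helps to print the exact installed version. A generic check (not specific to this repo):

```python
# Report the installed peft version, if any.
try:
    import peft
    print("peft", peft.__version__)
except ImportError:
    print("peft is not installed")
```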

ouwei2013 avatar Mar 17 '23 05:03 ouwei2013

> You could fix this by commenting out these lines:
>
>     old_state_dict = model.state_dict
>     model.state_dict = (
>         lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
>     ).__get__(model, type(model))
>
> and setting load_best_model_at_end to False.
>
> By the way, what version of peft are you using?

It works!!! Thank you very much again!

ouwei2013 avatar Mar 17 '23 12:03 ouwei2013