
Is saving checkpoints not yet supported for the PPO Ray trainer?

mickel-liu opened this issue on Mar 27 '24 • 5 comments

When I set save_steps to a value other than -1, the program raises an exception:

self.actor.model, os.path.join(args.ckpt_path, "_actor"), tag, args.max_ckpt_num, args.max_ckpt_mem
AttributeError: 'Namespace' object has no attribute 'ckpt_path'

https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_trainer.py#L378-L385
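
For context, the linked region passes the checkpoint settings straight off args, which is why the attribute lookup blows up when the launcher script never defines them. A paraphrased sketch, reconstructed from the traceback above rather than copied from the repo:

```python
# Paraphrased sketch of the call at the linked lines, reconstructed from the
# traceback above (the exact code lives at the permalink). The trainer reads
# checkpoint settings directly off `args`, so a parser that never registers
# ckpt_path raises AttributeError the moment a checkpoint save triggers.
self.strategy.save_ckpt(
    self.actor.model,
    os.path.join(args.ckpt_path, "_actor"),  # ckpt_path: absent from train_ppo_ray.py
    tag,
    args.max_ckpt_num,  # absent as well
    args.max_ckpt_mem,  # absent as well
)
```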

These three args (ckpt_path, max_ckpt_num, max_ckpt_mem) are indeed not defined in train_ppo_ray.py, and I don't see args.save_path being used either.
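
A minimal workaround sketch would be to register the missing arguments in train_ppo_ray.py's parser. The flag names below just mirror the attributes the trainer reads; the defaults are placeholders, not the project's official values:

```python
# Hypothetical additions to train_ppo_ray.py's argument parser. The names
# mirror the attributes ppo_trainer.py reads (ckpt_path, max_ckpt_num,
# max_ckpt_mem); the defaults here are illustrative placeholders.
parser.add_argument("--ckpt_path", type=str, default="./ckpt/checkpoints_ppo_ray",
                    help="directory for intermediate training-state checkpoints")
parser.add_argument("--max_ckpt_num", type=int, default=3,
                    help="keep at most this many checkpoints, dropping the oldest first")
parser.add_argument("--max_ckpt_mem", type=int, default=1000,
                    help="approximate disk budget for retained checkpoints")
```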

I saw this issue mentioned in #133; wondering if there's any update.

mickel-liu (Mar 27 '24)

Yes, we haven't fully developed and tested this feature yet. Contributions are welcome.

hijkzzz (Mar 27 '24)

I'm happy to look into it, but how have you been saving models so far?

mickel-liu (Mar 27 '24)

Hi @mickel-liu, have you figured this out? I have no choice but to use train_ppo_ray.py for PPO instead of train_ppo.py, because it doesn't OOM during model loading in my configuration. I am looking into ways to save checkpoints during/after training, and was hoping you had delved into this feature as well.

suehyunpark (May 11 '24)

Hi @suehyunpark, I did look into the code and found that the checkpoint-saving feature is not yet implemented. But saving training checkpoints wasn't actually what I was looking for: I wanted the actual model checkpoints, not the intermediate states this repo refers to. So I changed the code on my fork, and it now saves a model checkpoint after a preset number of iterations. Here's the code in my fork: https://github.com/mickelliu/OpenRLHF/blob/a7f21aa26ac027fcf30ca1c588e01cf07c67cb6f/openrlhf/trainer/ppo_trainer.py#L428-L442
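
For anyone who doesn't want to chase the permalink, the change is roughly this shape (a minimal sketch, not the exact fork code; the save_model signature and the save_steps/save_path names are assumptions on my part):

```python
# Sketch of a periodic full-model export inside the PPO training loop, in the
# spirit of the fork linked above (not copied from it). Assumes the strategy
# exposes a save_model helper that gathers the (possibly ZeRO-sharded) weights
# and writes a HuggingFace-format checkpoint, and that args carries
# save_steps / save_path.
if args.save_steps > 0 and global_steps % args.save_steps == 0:
    save_dir = os.path.join(args.save_path, f"iter_{global_steps}")
    self.strategy.save_model(self.actor, self.tokenizer, save_dir)
```

This exports usable model weights on a schedule, as opposed to the DeepSpeed training states you would resume from.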

Regardless of whether the ckpt feature gets officially implemented, train_ppo_ray.py will save a model checkpoint at the end of training.

mickelliu (May 12 '24)