OpenRLHF
Is save checkpoint not yet supported for ppo ray trainer?
When I set save_step to a value other than -1, the program raises an exception:
self.actor.model, os.path.join(args.ckpt_path, "_actor"), tag, args.max_ckpt_num, args.max_ckpt_mem
AttributeError: 'Namespace' object has no attribute 'ckpt_path'
https://github.com/OpenLLMAI/OpenRLHF/blob/3c918755faa31ee810f3624a82ba5f7879e4f8d3/openrlhf/trainer/ppo_trainer.py#L378-L385
These three args (ckpt_path, max_ckpt_num, max_ckpt_mem) are indeed not defined in train_ppo_ray.py, and I don't see args.save_path being used.
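For illustration, here is a minimal sketch of why the lookup fails and how defining the three options would avoid it. It uses only the standard argparse module; the default values are placeholders I made up, not values taken from the repository, and this is not a patch for train_ppo_ray.py itself.

```python
import argparse

# A Namespace that lacks the checkpoint options, as in the traceback above.
args = argparse.Namespace(save_path="./ckpt")
try:
    args.ckpt_path
except AttributeError as err:
    print(err)

# Defining the three options the checkpoint call reads.
# NOTE: these defaults are hypothetical placeholders.
parser = argparse.ArgumentParser()
parser.add_argument("--ckpt_path", type=str, default="./ckpt/checkpoints_ppo")
parser.add_argument("--max_ckpt_num", type=int, default=3)
parser.add_argument("--max_ckpt_mem", type=int, default=1000)

# An empty list makes argparse use the defaults and ignore the real command line.
fixed_args = parser.parse_args([])
print(fixed_args.ckpt_path, fixed_args.max_ckpt_num, fixed_args.max_ckpt_mem)
```

With the options defined, the attribute access in the traceback would no longer fail, although whether the Ray trainer then saves correctly is a separate question.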
I did see this issue was mentioned in #133, wondering if there's any update.
Yes, we haven't fully developed and tested this feature yet. Contributions are welcome.
I'm happy to look into it, but how have you been saving models in the meantime?
Hi @mickel-liu, have you figured this out? I have no choice but to use train_ppo_ray.py for PPO instead of train_ppo.py, because it doesn't OOM during model loading in my configuration. I am looking into ways to save checkpoints during or after training, and was hoping you had looked into this feature as well.
Hi, I did look into the code and found that the checkpoint-saving feature is not yet implemented. But saving checkpoints in that sense wasn't what I was looking for: I wanted the actual model checkpoints, not the intermediate training states that this repo refers to as checkpoints. So I ended up changing the code on my fork, and it now saves model checkpoints after a preset number of iterations. Here's the code in my fork: https://github.com/mickelliu/OpenRLHF/blob/a7f21aa26ac027fcf30ca1c588e01cf07c67cb6f/openrlhf/trainer/ppo_trainer.py#L428-L442
Regardless of whether the ckpt feature is officially implemented, train_ppo_ray.py will save a model checkpoint at the end of training.
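To make the idea of saving every preset number of iterations concrete, here is a rough, framework-free sketch. It is not the code from the fork linked above: the names should_save, train and save_model are hypothetical, and save_model stands in for whatever call actually writes the actor's weights.

```python
import os
from typing import Callable, List


def should_save(global_step: int, save_steps: int) -> bool:
    """Return True when a save is due; save_steps <= 0 (e.g. -1) disables saving."""
    return save_steps > 0 and global_step % save_steps == 0


def train(
    num_steps: int,
    save_steps: int,
    save_path: str,
    save_model: Callable[[str], None],
) -> List[str]:
    """Run a dummy loop and call save_model every save_steps iterations."""
    saved = []
    for global_step in range(1, num_steps + 1):
        # ... one PPO update would run here ...
        if should_save(global_step, save_steps):
            out_dir = os.path.join(save_path, f"step_{global_step}")
            save_model(out_dir)  # hypothetical hook that writes the model weights
            saved.append(out_dir)
    return saved


# Dummy hook that only reports where it would write.
paths = train(10, 4, "ckpt", lambda d: print("would save to", d))
```

Writing each save into its own step-numbered directory keeps earlier models around; if disk space is a concern, older directories would need to be pruned separately.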