OpenRLHF
Support checkpointing to recover training from crashes
@hijkzzz
add basic ckpt function: https://github.com/OpenLLMAI/OpenRLHF/commit/f53571de43a4524644e75c9c472bbc69ac7b72c2
Hi team, what are the next steps here? We can support this effort, as we need it critically.
From my understanding, we need to support checkpointing and loading the actor and critic models.
@karthik-nexusflow We need to support the following features:
- save and load the actor and critic model weights, optimizers, and schedulers, which can be done with the DeepSpeed API
- save and load the progress of the data loader; we may need to write a new distributed sampler
- save and load the random seed (RNG state)
The second point can be tricky.
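For the first and third points, a minimal sketch using DeepSpeed's `save_checkpoint`/`load_checkpoint` (which persist model weights, optimizer, and scheduler state together). The directory layout, the `client_state` keys, and the function names here are illustrative, not OpenRLHF's actual API; in practice the `torch`/`numpy` RNG states should be captured alongside Python's:

```python
import random

def save_ckpt(actor_engine, critic_engine, step, ckpt_dir="./ckpt"):
    # DeepSpeed persists weights/optimizer/scheduler; client_state carries
    # anything extra needed to resume (step counter, RNG state, ...).
    client_state = {"step": step, "py_rng": random.getstate()}
    actor_engine.save_checkpoint(f"{ckpt_dir}/actor", client_state=client_state)
    critic_engine.save_checkpoint(f"{ckpt_dir}/critic")

def load_ckpt(actor_engine, critic_engine, ckpt_dir="./ckpt"):
    # load_checkpoint returns (load_path, client_state).
    _, client_state = actor_engine.load_checkpoint(f"{ckpt_dir}/actor")
    critic_engine.load_checkpoint(f"{ckpt_dir}/critic")
    random.setstate(client_state["py_rng"])  # restore the RNG stream
    return client_state["step"]
```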
Great. For point 2: if we save all the dataset indices that were seen and skip them after we resume, that could be a naive initial approach.
Also, we could initially support only a single prompt dataset rather than multiple datasets.
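That naive approach could look like the following sketch (the class and its interface are hypothetical, not an existing OpenRLHF sampler): shuffle with a fixed seed, keep the set of consumed indices, persist it with the checkpoint, and skip those indices on resume.

```python
import random

class ResumableSampler:
    """Shuffles dataset indices with a fixed seed; skips already-seen indices."""

    def __init__(self, dataset_size, seed=42, seen=None):
        self.indices = list(range(dataset_size))
        random.Random(seed).shuffle(self.indices)  # same seed -> same order
        self.seen = set(seen or [])

    def __iter__(self):
        for idx in self.indices:
            if idx in self.seen:
                continue  # already trained on before the crash
            self.seen.add(idx)
            yield idx

    def state_dict(self):
        # Saved alongside the model checkpoint.
        return {"seen": sorted(self.seen)}
```

The downside is that the `seen` set grows with the dataset, which is why saving just a consumed-sample count (given a reproducible shuffle) is usually preferred.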
Because the dataset is shuffled before training, we can skip the already-trained samples inside the `for xxx in dataloader` loop,
and also save/load the epoch.
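Since the shuffle order is reproducible from a seed, the checkpoint only needs the epoch and a consumed-sample count; on resume we rebuild the same order and fast-forward past the consumed prefix. A sketch (function names and the seed-mixing scheme are illustrative):

```python
import itertools
import random

def make_epoch_order(dataset_size, seed, epoch):
    # Same (seed, epoch) -> same shuffle, so only a count must be stored.
    order = list(range(dataset_size))
    random.Random(seed + epoch).shuffle(order)
    return order

def resume_iter(dataset_size, seed, epoch, consumed):
    order = make_epoch_order(dataset_size, seed, epoch)
    # Skip the first `consumed` samples of this epoch, yield the rest.
    return itertools.islice(iter(order), consumed, None)
```

Mixing the epoch into the seed mirrors what `DistributedSampler.set_epoch` does in PyTorch, so each epoch still gets a fresh shuffle while remaining deterministic on resume.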