
Support checkpoint to prevent training from collapse

Open hijkzzz opened this issue 10 months ago • 7 comments

hijkzzz avatar Aug 18 '23 08:08 hijkzzz

@hijkzzz

hijkzzz avatar Aug 21 '23 00:08 hijkzzz

Added a basic checkpoint function: https://github.com/OpenLLMAI/OpenRLHF/commit/f53571de43a4524644e75c9c472bbc69ac7b72c2

catqaq avatar Oct 28 '23 16:10 catqaq

Hi team, what are the next steps here? We can support this effort, as we critically need this feature.

karthik-nexusflow avatar Feb 19 '24 23:02 karthik-nexusflow

From my understanding, we need to support checkpointing and loading the actor and critic models.

karthik-nexusflow avatar Feb 19 '24 23:02 karthik-nexusflow

@karthik-nexusflow We need to support the following features:

  1. Save and load the actor and critic model weights, optimizers, and schedulers, which can be done with the DeepSpeed API.
  2. Save and load the progress of the data loader; we may need to write a new distributed sampler.
  3. Save and load the seed.

The second point can be tricky.
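
For point 3, a minimal sketch of saving and restoring the random state, using only Python's standard library so it runs anywhere. The helper names `capture_rng_state` and `restore_rng_state` are hypothetical and not part of OpenRLHF. A real training run would presumably also need the PyTorch and NumPy generator states (for example via `torch.get_rng_state()` and `numpy.random.get_state()`), which are omitted here.

```python
import pickle
import random


def capture_rng_state():
    """Snapshot Python's RNG so a resumed run draws the same numbers."""
    # Hypothetical helper: only the stdlib generator is captured here.
    return {"python": random.getstate()}


def restore_rng_state(state):
    """Restore the snapshot produced by capture_rng_state()."""
    random.setstate(state["python"])


random.seed(42)
random.random()  # training progress consumes some random numbers

# Serialize the state as it would be stored inside a checkpoint.
blob = pickle.dumps(capture_rng_state())

# What the uninterrupted run would draw next.
expected = [random.random() for _ in range(3)]

# Simulate a restart: load the checkpoint and continue drawing.
restore_rng_state(pickle.loads(blob))
resumed = [random.random() for _ in range(3)]
```

Saving the full generator state rather than only the initial seed means the resumed run continues the same random sequence instead of replaying it from the start.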

hijkzzz avatar Feb 20 '24 00:02 hijkzzz

Great. For 2, saving all the dataset indices that have been seen and skipping them after we resume could be a naive initial approach.

Also, we could initially support only a single prompt dataset rather than multiple datasets.

karthik19967829 avatar Feb 20 '24 00:02 karthik19967829

> Great. For 2, saving all the dataset indices that have been seen and skipping them after we resume could be a naive initial approach.
>
> Also, we could initially support only a single prompt dataset rather than multiple datasets.

Because the dataset is shuffled before training, we can skip the already-trained samples inside the `for xxx in dataloader` loop, and also save/load the epoch.
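
A rough sketch of that idea in plain Python, assuming the shuffle is deterministic for a given seed and epoch, so that the same order can be rebuilt after a restart. The functions `epoch_order` and `iterate`, and the `state` dictionary layout, are made up for illustration and are not OpenRLHF code. A distributed version would additionally have to shard the indices across ranks.

```python
import random


def epoch_order(num_samples, seed, epoch):
    """Deterministic shuffle: the same (seed, epoch) always gives the same order."""
    order = list(range(num_samples))
    random.Random(seed + epoch).shuffle(order)
    return order


def iterate(num_samples, seed, start_epoch, consumed, max_epochs):
    """Yield (epoch, position, index), skipping samples consumed before a restart."""
    for epoch in range(start_epoch, max_epochs):
        order = epoch_order(num_samples, seed, epoch)
        # Only the epoch we resume into has samples to skip.
        skip = consumed if epoch == start_epoch else 0
        for position in range(skip, num_samples):
            yield epoch, position, order[position]


# Reference: an uninterrupted run over 2 epochs of 8 samples.
full = [idx for _, _, idx in iterate(8, seed=0, start_epoch=0, consumed=0, max_epochs=2)]

# Interrupted run: stop after 11 samples and record the progress to checkpoint.
first_part = []
state = {"epoch": 0, "consumed": 0}
for epoch, position, idx in iterate(8, seed=0, start_epoch=0, consumed=0, max_epochs=2):
    first_part.append(idx)
    state = {"epoch": epoch, "consumed": position + 1}
    if len(first_part) == 11:
        break

# Resume from the saved epoch and consumed count.
second_part = [idx for _, _, idx in iterate(8, 0, state["epoch"], state["consumed"], 2)]
```

Here the checkpoint only needs two integers, the epoch and the number of consumed samples, instead of the full list of seen indices.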

hijkzzz avatar Feb 20 '24 02:02 hijkzzz