lzy37ld

21 comments by lzy37ld

@muellerzr can you kindly elaborate on how to do that?

Hi @Jingru, any updates here? I am also looking for this approach. Thanks!

Actually, I am a bit confused: why does the reward model have to be placed on the last GPU by itself? Couldn't the reward model be parallelized the same way as the policy model?

Thanks! Actually I am new to distributed training, so my question is essentially: why don't they combine the reward model with the others, e.g. all_model = All_Model(actor_model, critic_model, reward_model)...
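For what it's worth, the kind of wrapper I had in mind (the name `All_Model` is just my own illustration, not anything from trlx) would look roughly like this, sketched with plain-Python stand-ins instead of real `nn.Module` subclasses:

```python
# Hypothetical sketch: bundle actor, critic, and reward into one object so a
# single prepare()/partition call could handle all three together. These are
# plain-Python stand-ins; in practice each would be an nn.Module, and
# All_Model would be an nn.Module too.
class ActorModel:
    def __call__(self, prompt):
        return prompt + " <response>"  # placeholder generation

class CriticModel:
    def __call__(self, trajectory):
        return 0.5  # placeholder value estimate

class RewardModel:
    def __call__(self, trajectory):
        return 1.0  # placeholder scalar reward

class All_Model:
    """Wraps actor, critic, and reward so one prepare() call could shard all three."""
    def __init__(self, actor, critic, reward):
        self.actor = actor
        self.critic = critic
        self.reward = reward

    def __call__(self, prompt):
        trajectory = self.actor(prompt)
        return trajectory, self.critic(trajectory), self.reward(trajectory)

model = All_Model(ActorModel(), CriticModel(), RewardModel())
trajectory, value, reward = model("hello")
```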

For example, I stopped my debugger at this point: https://github.com/CarperAI/trlx/blob/91a0f434d1e9a1536c5fd43ff5527d597d235f4f/trlx/trainer/accelerate_base_trainer.py#L280 but found that self.model (the whole model rather than a partitioned one) sits on a different device in each process. That's...

Looks like running the reward model in parallel would be more efficient? Then we wouldn't need to gather or broadcast, since the reward model would no longer live only on the last device..

But how much VRAM does stage 3 need? Say I just have two relatively small GPUs, 5GB each. If our model is 6GB, then after partitioning, each GPU...
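As a rough back-of-the-envelope (my own estimate, following the per-parameter accounting from the ZeRO paper, which ignores activations, buffers, and framework overhead): under ZeRO-3 with mixed-precision Adam, each parameter costs about 2 bytes (fp16 weights) + 2 bytes (fp16 grads) + 12 bytes (fp32 master copy, momentum, variance), all partitioned across GPUs:

```python
def zero3_static_bytes_per_gpu(num_params, num_gpus,
                               weight_bytes=2,    # fp16 weights
                               grad_bytes=2,      # fp16 gradients
                               optim_bytes=12):   # fp32 master + Adam momentum/variance
    """Rough per-GPU static memory under ZeRO-3 (activations/overhead ignored)."""
    total = num_params * (weight_bytes + grad_bytes + optim_bytes)
    return total / num_gpus

# Example: ~3B params (about 6 GB of fp16 weights) sharded across 2 GPUs
per_gpu_gb = zero3_static_bytes_per_gpu(3e9, 2) / 1e9
```

By this estimate even the partitioned shards need far more than 5 GB each once gradients and optimizer states are counted, so for training the weight size alone understates the requirement.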

Thanks for this, @Jingru! I carefully checked it again, and it seems this is already being shared the same way --zero3_init_flag would do it, if you see the comments in...

Oh, if I understand correctly, accelerate cannot call **prepare** twice in a script, so that's why you use deepspeed.initialize.