lzy37ld comments

Results 21 comments of


                                            lzy37ld

A BUG? when performing gradient accumulation

Hi all, any updates here?

accelerate with wandb tracker not logging

@muellerzr can you kindly elaborate on how to do that?

support parallel reward function

@Jingru ,Hi Jingru, any updates here? I am also looking for this approach. Thanks!

support parallel reward function

Actually, I am bit confused about, why they have to separate the reward model to the last GPU? The reward model could not do the parallel like the policy model?

support parallel reward function

Thanks! Actually I am new to distributed training.. So I think my question is sort of like why they don't concat the reward model into all_model = All_Model(actor_model, critic_model, reward_model)...

support parallel reward function

For example, I stop my debugger at this point: https://github.com/CarperAI/trlx/blob/91a0f434d1e9a1536c5fd43ff5527d597d235f4f/trlx/trainer/accelerate_base_trainer.py#L280 But find that self.model(the whole model instead of the partioned one) would have a different devices for different processes. That's...