Fyeward

Results: 4 comments by Fyeward

> Try setting a smaller gradient_accumulation_steps in default_config.yaml. In GRPOTrainer cases, GPU memory usage grows significantly with a larger gradient_accumulation_steps. The low GPU utilization is also...
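A minimal sketch of where that setting might live in an Accelerate-style default_config.yaml (the surrounding keys and values are illustrative, not taken from the thread):

```yaml
# Illustrative fragment of default_config.yaml; only
# gradient_accumulation_steps is the setting discussed above.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_processes: 2            # illustrative GPU count
mixed_precision: bf16       # illustrative
gradient_accumulation_steps: 2   # try lowering this to reduce GPU memory
```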

> For the training, yes, but not for the generation. The generation is done once over the full effective batch. Thank you. I noticed that, according to the improvement in...
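A rough sketch of the arithmetic behind the "full effective batch" mentioned above. The parameter names follow common Hugging Face trainer conventions, and the values are illustrative, not from the thread:

```python
# With gradient accumulation, optimizer steps see the data in micro-batches,
# but (per the comment above) generation is done once for the whole
# effective batch, so its memory cost scales with this product.
per_device_train_batch_size = 4   # illustrative value
gradient_accumulation_steps = 8   # illustrative value
num_processes = 2                 # illustrative value (number of GPUs)

effective_batch = (per_device_train_batch_size
                   * gradient_accumulation_steps
                   * num_processes)
print(effective_batch)  # 64
```

This is why lowering gradient_accumulation_steps shrinks the generation-time memory footprint even though each training micro-batch stays the same size.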