RagingPandas

> The spike is caused by cross entropy and entropy computation in backward.

Thank you for the reply. Do you have any suggestions on what parameters I have set that...

I see, thanks for your reply. I halved `ppo_max_token_len_per_gpu` (to 17k), and the spikes dropped to 91% of GPU memory. Given that vLLM rollout memory isn't a constraint, I'm...
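Why halving the token budget shrinks the spike can be seen with rough arithmetic. Cross entropy and entropy both operate on tensors of shape `[tokens, vocab]`, so their memory grows linearly with the number of tokens in a micro-batch, which `ppo_max_token_len_per_gpu` caps. The sketch below is a back-of-envelope estimate only. It assumes fp32 logits, a hypothetical vocabulary size of 152,064, a pre-halving budget of about 34k tokens, and a hypothetical `live_copies` count of vocab-sized tensors that coexist during backward. The real count depends on the framework and kernels.

```python
# Back-of-envelope estimate of vocab-sized activations from cross entropy and
# entropy per micro-batch. All constants here are illustrative assumptions.

def vocab_tensor_gib(num_tokens: int, vocab_size: int, bytes_per_elem: int = 4) -> float:
    """Memory of one [num_tokens, vocab_size] tensor in GiB (fp32 by default)."""
    return num_tokens * vocab_size * bytes_per_elem / 2**30


def peak_vocab_memory_gib(num_tokens: int, vocab_size: int,
                          live_copies: int = 3, bytes_per_elem: int = 4) -> float:
    """Rough peak if `live_copies` vocab-sized tensors coexist in backward.

    `live_copies` is a hypothetical knob (e.g. logits, softmax probs and
    their gradient); the true number depends on the implementation.
    """
    return live_copies * vocab_tensor_gib(num_tokens, vocab_size, bytes_per_elem)


if __name__ == "__main__":
    VOCAB = 152_064  # hypothetical vocabulary size
    for tokens in (34_000, 17_000):  # assumed token budget before/after halving
        print(f"{tokens:>6} tokens: ~{peak_vocab_memory_gib(tokens, VOCAB):.1f} GiB")
```

Because the estimate is linear in `num_tokens`, halving the budget halves this term, while the rest of the training memory (weights, optimizer state) stays fixed. That is consistent with the spike shrinking without disappearing.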