Yuge Zhang

279 comments by Yuge Zhang

What's your environment, and at which step does it go wrong (training, validation, the first step, or after a few steps)? Do you have multiple GPUs and multiple nodes?

> torch.AcceleratorError: CUDA error: an illegal memory access was encountered

This looks like a GPU OOM error to me.

> In the near future I plan to try to implement...

> verl canceled chat_completion design in latest version which is very inconvenient

Surprised to know. Seems that we need to figure out a plan. Either using verl in a different...

Please merge from main as there are CI updates.

Close as no follow-ups.

@xiaochulaoban please review the changes to make sure I didn't modify anything by mistake.

> Can it be merged into the main repository now

I'll take that as a "no problem". Please open another issue/PR if you have further questions. Thanks.

I think some effort is needed to support the latest verl, as they moved the vllm inference server into the agent loop. Will look into it.