Yuge Zhang
What's your environment, and in what step does it go wrong? (training/validation/first step/after a few steps) Do you have multiple GPUs and multiple nodes?
> torch.AcceleratorError: CUDA error: an illegal memory access was encountered

This looks like a GPU OOM error to me.

> In the near future I plan to try to implement...
> verl canceled chat_completion design in latest version which is very inconvenient

Surprised to know. Seems that we need to figure out a plan. Either using verl in a different...
Please merge from main as there are CI updates.
Closing, as there have been no follow-ups.
@xiaochulaoban please review the changes to make sure I didn't modify anything by mistake.
> Can it be merged into the main repository now

I'll take that as a "no problem". Please open another issue/PR if you have further questions. Thanks.
I think some effort is needed to support the latest verl, as they moved the vLLM inference server into the agent loop. Will look into it.