runrunliuliu


After more than 1,000 training steps with GRPO, the following error occurs. I found a similar issue at https://github.com/modelscope/ms-swift/issues/3864. Could this be a bug in vLLM's cumem.py?

(LLMRayActor pid=3823828, ip=[2605:340:cd51:4900:c06a:4260:c8a8:4be7]) File "/usr/local/lib/python3.11/dist-packages/vllm/device_allocator/cumem.py",...