Mert Toslali
@fingertap, I only get a warning when I run your mini repro with `torchrun --nproc-per-node=2 inf.py`, in both Python 3.10 and Python 3.12. Warning:

```
[rank0]:[W428 20:32:01.845828927 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was...
```
I can confirm that I'm able to reproduce the bug using @fabianlim's script. I also tested it across multiple vLLM versions: 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, and 0.8.5...
I have tried to manually clean up at exit via `atexit`:

```python
import gc

import torch
import torch.distributed as dist


# Ensure proper teardown
def cleanup():
    print("Goodbye! Cleaning up...")
    try:
        if dist.is_initialized():
            dist.destroy_process_group()
        gc.collect()
        torch.cuda.empty_cache()
        print("---cleaned")
    except Exception as e:
        ...
```
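A minimal sketch of the same `atexit` teardown, assuming each cleanup step can be passed as a zero-argument callable. `register_teardown` is a hypothetical helper, not a PyTorch or vLLM API. Unlike the single `try` block above, it runs every step even when an earlier one raises (so a failing `destroy_process_group()` does not skip `gc.collect()` and `torch.cuda.empty_cache()`), and it does nothing if called a second time.

```python
import atexit
from typing import Callable, List


def register_teardown(*steps: Callable[[], object]) -> Callable[[], List[str]]:
    """Hypothetical helper: run `steps` in order at interpreter exit.

    Each step gets its own try/except, so one failure does not skip the rest.
    The returned function can also be called manually; later calls are no-ops.
    """
    done = False

    def cleanup() -> List[str]:
        nonlocal done
        errors: List[str] = []
        if done:  # already ran (e.g. called manually before exit)
            return errors
        done = True
        for step in steps:
            try:
                step()
            except Exception as e:  # record the failure and keep going
                name = getattr(step, "__name__", repr(step))
                errors.append(f"{name}: {e!r}")
        for msg in errors:
            print(f"cleanup step failed: {msg}")
        return errors

    atexit.register(cleanup)
    return cleanup


if __name__ == "__main__":
    # With PyTorch installed, the steps from the snippet above could be wired as
    # (assumes `import gc, torch` and `import torch.distributed as dist`):
    #   register_teardown(
    #       lambda: dist.is_initialized() and dist.destroy_process_group(),
    #       gc.collect,
    #       torch.cuda.empty_cache,
    #   )
    # Stand-in steps so the sketch runs without torch:
    def failing_step() -> None:
        raise RuntimeError("simulated teardown failure")

    register_teardown(lambda: print("step 1"), failing_step, lambda: print("step 3"))
    print("main done")  # the registered steps run after this, at exit
```

Two caveats from the `atexit` documentation: handlers run in reverse registration order, and they are not called when the process is killed by a signal Python does not handle, on a fatal internal error, or via `os._exit()`, so a worker terminated that way still skips teardown.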
Hey @youkaichao, just wanted to follow up on this issue to see if you have any recommendations or pointers?
> > one possible workaround is to reimplement the sleep mode with dispatch mode
>
> Sounds like a lot of work. Can we skip the garbage collection for the...