Mert Toslali

5 comments by Mert Toslali

@fingertap, I only get a warning when I run your mini repro with `torchrun --nproc-per-node=2 inf.py` in both Python 3.10 and Python 3.12. Warning:

```
[rank0]:[W428 20:32:01.845828927 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was...
```

I can confirm that I can reproduce the bug using @fabianlim's script. I also tested it across multiple vLLM versions: 0.8.0, 0.8.1, 0.8.2, 0.8.3, 0.8.4, and 0.8.5...

I have tried cleaning up manually with `atexit`:

```python
import gc

import torch
import torch.distributed as dist

# Ensure proper teardown
def cleanup():
    print("Goodbye 👋 Cleaning up...")
    try:
        if dist.is_initialized():
            dist.destroy_process_group()
        gc.collect()
        torch.cuda.empty_cache()
        print("---cleaned")
    except Exception as e:
        ...  # rest of the comment is truncated
```
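A cleanup function like the one in the comment above only runs at interpreter exit once it is registered with Python's `atexit` module. The following is a minimal, framework-agnostic sketch of that pattern, not the code from the original comment: `register_teardown` and its `steps` list are hypothetical names. In a real PyTorch script the steps would be callables such as `dist.destroy_process_group`, `gc.collect`, and `torch.cuda.empty_cache`. The `done` guard makes the teardown idempotent, so calling it explicitly at the end of `main()` and again from `atexit` does not tear things down twice, and each step is wrapped separately so one failing step does not skip the rest.

```python
import atexit
from typing import Callable, List


def register_teardown(steps: List[Callable[[], None]]) -> Callable[[], List[str]]:
    """Register an idempotent teardown with atexit (hypothetical helper).

    `steps` are zero-argument callables run in order; in a PyTorch script
    they might be dist.destroy_process_group, gc.collect and
    torch.cuda.empty_cache. The teardown is returned so it can also be
    called explicitly before the interpreter exits.
    """
    done = False

    def teardown() -> List[str]:
        nonlocal done
        errors: List[str] = []
        if done:
            return errors  # already torn down: the atexit call becomes a no-op
        done = True
        for step in steps:
            try:
                step()
            except Exception as exc:  # record the error and keep going
                errors.append(f"{getattr(step, '__name__', step)}: {exc}")
        return errors

    # atexit ignores the return value; it only needs the callable
    atexit.register(teardown)
    return teardown


if __name__ == "__main__":
    calls = []

    def fail():
        raise RuntimeError("boom")

    teardown = register_teardown(
        [lambda: calls.append("destroy"), fail, lambda: calls.append("empty_cache")]
    )
    print(teardown())  # ['fail: boom']
    print(teardown())  # [] (second call is a no-op)
    print(calls)       # ['destroy', 'empty_cache']
```

Calling the teardown explicitly before returning from `main()` keeps the shutdown order under your control; the `atexit` registration is only a fallback for early exits.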

Hey @youkaichao, just wanted to follow up on this issue to see if you have any recommendations or pointers?

> > one possible workaround is to reimplement the sleep mode with dispatch mode
>
> Sounds like a lot of work. Can we skip the garbage collection for the...