[BUG]: Program gets stuck when using rpc_run and ColoInitContext together

Open Wesley-Jzy opened this issue 2 years ago • 1 comment

🐛 Describe the bug

I tried to use rpc_run and Gemini TP together, and the program deadlocks.

Minimal code to reproduce:

from colossalai.tensor import ColoParameter
from colossalai.initialize import launch
import torch
import torch.multiprocessing as mp

def main(rank):
    launch(dict(), rank, 2, 'localhost', 29999, 'nccl', verbose=False)
    if rank == 0:
        p = ColoParameter(torch.empty(10))

if __name__ == '__main__':
    mp.spawn(main, nprocs=2)

Environment

No response

Wesley-Jzy avatar Feb 01 '23 03:02 Wesley-Jzy

I'm working on a fix. If anyone has insights into this bug, please contact me.

Wesley-Jzy avatar Feb 01 '23 03:02 Wesley-Jzy

This issue was closed because it describes incorrect usage. Thanks.

binmakeswell avatar Apr 18 '23 08:04 binmakeswell
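
The thread does not say what the incorrect usage is. One plausible reading, stated here purely as an assumption, is that constructing a ColoParameter involves a collective step that every rank has to enter, so calling it only under `if rank == 0` leaves rank 0 waiting forever. The sketch below does not use ColossalAI or torch.distributed at all. It simulates that general pattern with a standard-library `threading.Barrier`, and the helper name `run_ranks` is hypothetical.

```python
import threading


def run_ranks(participating, world_size=2, timeout=0.5):
    """Simulate a collective that every rank must enter.

    `participating` is the set of ranks that call the collective.
    Returns a dict mapping each rank to its outcome.
    """
    # The barrier stands in for a collective: it releases only once
    # `world_size` callers have arrived.
    barrier = threading.Barrier(world_size)
    results = {}

    def worker(rank):
        if rank not in participating:
            # This rank never enters the collective (like rank 1 in the repro).
            results[rank] = "skipped"
            return
        try:
            barrier.wait(timeout=timeout)
            results[rank] = "completed"
        except threading.BrokenBarrierError:
            # A real collective would hang here; the timeout keeps the demo finite.
            results[rank] = "timed out"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_ranks({0}))     # only rank 0 enters the collective
    print(run_ranks({0, 1}))  # every rank enters the collective
```

If that assumption holds for the reproduction above, the analogous change would be to construct the parameter on every rank rather than only on rank 0. The thread itself does not confirm this.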