ColossalAI
[BUG]: program gets stuck when using rpc_run and ColoInitContext together.
🐛 Describe the bug
When I use rpc_run and Gemini TP together, the program deadlocks.
Minimal code to reproduce:
from colossalai.tensor import ColoParameter
from colossalai.initialize import launch
import torch
import torch.multiprocessing as mp

def main(rank):
    launch(dict(), rank, 2, 'localhost', 29999, 'nccl', verbose=False)
    if rank == 0:
        p = ColoParameter(torch.empty(10))

if __name__ == '__main__':
    mp.spawn(main, nprocs=2)
Environment
No response
I'm working on fixing the bug. If anyone has insights about it, please contact me.
This issue was closed because the behavior comes from incorrect usage. Thanks.
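For later readers: the thread does not say what the incorrect usage was. One plausible reading, which is an assumption and not confirmed anywhere above, is that constructing a ColoParameter involves a step that every rank has to enter, so calling it only under `if rank == 0:` leaves rank 0 waiting for peers that never arrive. The sketch below does not use ColossalAI or torch at all. It simulates that pattern with a standard-library `threading.Barrier`, and the names `run` and `WORLD_SIZE` are made up for the illustration.

```python
import threading

WORLD_SIZE = 2  # hypothetical: mirrors nprocs=2 in the repro script


def run(ranks_that_call, timeout=0.5):
    """Simulate a collective step that needs all WORLD_SIZE ranks.

    ranks_that_call: set of ranks that enter the collective.
    Returns a dict mapping rank -> "done", "stuck", or "skipped".
    """
    barrier = threading.Barrier(WORLD_SIZE)
    results = {}

    def worker(rank):
        if rank in ranks_that_call:
            try:
                # Blocks until WORLD_SIZE threads arrive; the timeout stands
                # in for a real hang so the demo terminates.
                barrier.wait(timeout=timeout)
                results[rank] = "done"
            except threading.BrokenBarrierError:
                results[rank] = "stuck"
        else:
            # This rank never enters the collective (the `if rank == 0` case).
            results[rank] = "skipped"

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run({0}))      # only rank 0 enters the collective
    print(run({0, 1}))   # every rank enters the collective
```

If that assumption holds, the corresponding change to the repro would be to create the parameter on every rank instead of guarding it with `if rank == 0:`.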