ColossalAI

[BUG]: stuck at initialization and no error message

Open binmakeswell opened this issue 2 years ago • 4 comments

🐛 Describe the bug

When parallel is set to pipeline=4 and tensor=dict(mode='2d', size=4), the program hangs during initialization and prints no error message.

Environment

2*8 A100
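
For reference, the reported settings can be written out and sanity-checked against the hardware. This is a minimal sketch, not ColossalAI code: `check_parallel_layout` is a hypothetical helper, the world size of 16 is an assumption read from "2*8 A100", and the perfect-square check reflects my understanding of how 2D tensor parallelism arranges devices in a grid. It only confirms that the sizes are consistent with each other; it does not explain the hang.

```python
import math

# Parallel settings as quoted in the report (ColossalAI config-file style).
parallel = dict(
    pipeline=4,
    tensor=dict(mode='2d', size=4),
)


def check_parallel_layout(parallel, world_size):
    """Hypothetical helper: verify the sizes fit the world size.

    Returns the pipeline, tensor, and implied data parallel sizes.
    """
    pp = parallel.get('pipeline', 1)
    tensor = parallel.get('tensor', {})
    tp = tensor.get('size', 1)
    mode = tensor.get('mode')

    # The GPUs must divide evenly into pipeline x tensor groups.
    if world_size % (pp * tp) != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by "
            f"pipeline ({pp}) x tensor ({tp})"
        )

    # Assumption: 2D tensor parallelism lays devices out on a q x q grid,
    # so the tensor parallel size should be a perfect square.
    if mode == '2d' and math.isqrt(tp) ** 2 != tp:
        raise ValueError(f"2d tensor size {tp} is not a perfect square")

    return dict(pipeline=pp, tensor=tp, data=world_size // (pp * tp))


# Assumed world size: 2 machines x 8 A100 GPUs = 16 processes.
layout = check_parallel_layout(parallel, world_size=16)
print(layout)  # {'pipeline': 4, 'tensor': 4, 'data': 1}
```

Under these assumptions the configuration uses all 16 GPUs with a data parallel size of 1, so the hang does not appear to come from a simple size mismatch.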

binmakeswell avatar Jun 29 '22 04:06 binmakeswell

Same bug with tensor parallel size 2 and pipeline parallel size 4 (tp2pp4).

binmakeswell avatar Jun 29 '22 07:06 binmakeswell

Same bug, looking forward to a solution. The difference is that I didn't use a parallel config.

LinglingGreat avatar Aug 10 '22 09:08 LinglingGreat

which example is related to this bug?

FrankLeeeee avatar Aug 16 '22 06:08 FrankLeeeee

> which example is related to this bug?

gpt2 for the initial issue.

binmakeswell avatar Aug 16 '22 06:08 binmakeswell

We have updated a lot. This issue was closed due to inactivity. Thanks.

binmakeswell avatar Apr 13 '23 04:04 binmakeswell