
[BUG]: Unable to train on H20 machine

Open kaixinbear opened this issue 1 year ago • 1 comments

Is there an existing issue for this bug?

  • [X] I have searched the existing issues

🐛 Describe the bug

I want to run an experiment based on ColossalAI on an NVIDIA H20 machine. However, the run does not seem to get into the forward function of the neural network module. The log is shown below: [image]

Could anyone help me with this?

Environment

PyTorch 2.3 with CUDA 12.1 (cu121). I built ColossalAI with the command: BUILD_EXT=1 pip install colossalai==0.3.7 --no-cache-dir

kaixinbear avatar Oct 06 '24 05:10 kaixinbear

Could you please provide more information, such as any other error messages, the command you ran, and your script?

wangbluo avatar Oct 07 '24 03:10 wangbluo
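
One way to gather part of the information requested above is a short script like the one below. It is only a sketch: the helper name `collect_env` is hypothetical and not part of ColossalAI, and it assumes the first visible GPU (index 0) is the one used for training. It reports the installed versions and the GPU seen by PyTorch, and it does not replace the full error log, launch command, or training script.

```python
import platform
from importlib import metadata


def collect_env():
    """Gather version and GPU details worth pasting into a bug report.

    Hypothetical helper for illustration; not a ColossalAI API.
    """
    info = {"python": platform.python_version()}

    # Read the installed package version without importing colossalai itself.
    try:
        info["colossalai"] = metadata.version("colossalai")
    except metadata.PackageNotFoundError:
        info["colossalai"] = "not installed"

    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # Assumption: device 0 is the GPU used for training.
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Pasting this output alongside the complete traceback makes it easier to tell whether the problem is tied to the GPU, the PyTorch build, or the ColossalAI installation.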