ColossalAI
[BUG]: Unable to train on H20 machine
Is there an existing issue for this bug?
- [X] I have searched the existing issues
🐛 Describe the bug
I want to use an NVIDIA H20 machine to run experiments based on ColossalAI. However, I am having trouble getting execution into the forward function of the neural module. The log is reported below:
Could anyone help me with this?
Environment
PyTorch 2.3 with CUDA 12.1 (cu121). I built ColossalAI using the command: BUILD_EXT=1 pip install colossalai==0.3.7 --no-cache-dir
Could you please provide more information, such as any other error messages, your launch command, and your script?
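If it helps, here is a minimal sketch for gathering the environment details to paste alongside the error messages. It is not part of ColossalAI; the helper names `package_version` and `collect_env_report` are hypothetical, and it assumes only that Python is available (the GPU details are filled in only if `torch` can be imported).

```python
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def collect_env_report():
    """Gather basic environment facts that are useful in a bug report."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "torch": package_version("torch"),
        "colossalai": package_version("colossalai"),
    }
    try:
        import torch  # optional: GPU details are skipped if torch is missing
    except ImportError:
        return report
    # CUDA version that this torch build was compiled against (None for CPU builds)
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Running the script on the H20 machine and pasting its output, together with the full traceback and the exact launch command, should cover most of what is being asked for here.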