[BUG]: colossalai check -i error
🐛 Describe the bug
I installed colossalai 0.2.5 successfully (from source, not from PyPI), but the following problem occurred when I executed the command "colossalai check -i". Please help me.
(Colossal-AI) ln01@ln01-System-Product-Name:/media/ln01/2t/usr/wy$ colossalai check -i
Traceback (most recent call last):
File "/home/ln01/anaconda3/envs/Colossal-AI/bin/colossalai", line 5, in
Environment
Python 3.7 + CUDA 11.7 + torch 1.13.1
Hi, we have received several user reports about this; one related issue is #2811.
I am looking into this issue; however, I cannot reproduce this bug on my machine. Is it possible for you to provide a Dockerfile to reproduce this in Docker? I would be more than happy to help if needed.
Sorry, I'm not familiar with the usage of Docker.
> Hi, we have received several user reports about this; one related issue is #2811.
Thanks, I'll refer to this.
That's absolutely alright. If you could provide the output of the following bash commands, perhaps it can help me better locate the bug.
ls /home/ln01/anaconda3/envs/Colossal-AI/lib/python3.7/site-packages/colossalai/kernel
ls /home/ln01/anaconda3/envs/Colossal-AI/lib/python3.7/site-packages/
@Alternate-D can I know whether you are running on windows?
Linux, Ubuntu 22.04
@FrankLeeeee
> That's absolutely alright. If you could provide the output of the following bash commands, perhaps it can help me better locate the bug.
> ls /home/ln01/anaconda3/envs/Colossal-AI/lib/python3.7/site-packages/colossalai/kernel
> ls /home/ln01/anaconda3/envs/Colossal-AI/lib/python3.7/site-packages/
I have solved this problem by adding a symlink:
site-packages/colossalai/kernel$ ln -s ../../op_builder op_builder
Then the check command works (see the diagnostic sketch further below):
$ colossalai check -i
#### Installation Report ####
------------ Environment ------------
Colossal-AI version: 0.2.5
PyTorch version: 1.12.0
System CUDA version: 10.2
CUDA version required by PyTorch: 10.2
The missing link seems to be the reason for this error, but the root cause is in the installation process. My environment is:
$ conda --version
conda 4.9.2
$ python -V
Python 3.9.12
project version is
$ git log
commit cd2b0eaa8dd4a7d8a67ce91b93459e07418bd741 (origin/main, origin/HEAD)
Author: YuliangLiu0306 <[email protected]>
Date: Tue Mar 7 11:08:11 2023 +0800
[DTensor] refactor sharding spec (#2987)
* [autoparallel] refactor sharding spec
* rename function name
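For reference, here is a minimal diagnostic sketch (not part of ColossalAI; the helper name and the broken-symlink check are illustrative assumptions) that verifies whether op_builder is reachable inside the installed colossalai package, which is effectively what the manual ln -s fix restores:

```python
# Hypothetical diagnostic helper, not part of ColossalAI: checks whether the
# installed colossalai package can see kernel/op_builder, which is what the
# manual `ln -s ../../op_builder op_builder` fix restores.
import importlib.util
import os


def op_builder_reachable() -> bool:
    """Return True if colossalai/kernel/op_builder exists in the installed package."""
    spec = importlib.util.find_spec("colossalai")
    if spec is None or spec.origin is None:
        print("colossalai is not installed in this environment")
        return False
    pkg_root = os.path.dirname(spec.origin)  # .../site-packages/colossalai
    target = os.path.join(pkg_root, "kernel", "op_builder")
    print(f"checking {target}")
    # A dangling symlink reports islink() == True but exists() == False.
    if os.path.islink(target) and not os.path.exists(target):
        print("op_builder is a broken symlink")
        return False
    return os.path.isdir(target)


if __name__ == "__main__":
    print("op_builder reachable:", op_builder_reachable())
```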
OK great, so the root cause is that somehow the symlink is not working. I am not able to reproduce this bug on our own machine, but I am testing this on different operating systems using Docker. Possibly a symlink is not what we want here, and we will explore other implementations for this.
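As one possible direction only (a sketch under assumptions, not the project's actual build logic; it assumes op_builder/ sits at the repository root and colossalai/kernel/ inside the package), the build could copy op_builder into the package instead of symlinking it, for example from setup.py before calling setuptools.setup():

```python
# Illustrative sketch only: copy op_builder into colossalai/kernel at build
# time so that no symlink has to survive the installation.
import os
import shutil


def vendor_op_builder(repo_root: str) -> None:
    """Copy op_builder into colossalai/kernel so it ships with the package."""
    src = os.path.join(repo_root, "op_builder")
    dst = os.path.join(repo_root, "colossalai", "kernel", "op_builder")
    if os.path.islink(dst):  # drop any stale symlink first
        os.unlink(dst)
    if os.path.isdir(dst):
        shutil.rmtree(dst)
    shutil.copytree(src, dst)


if __name__ == "__main__":
    vendor_op_builder(os.path.dirname(os.path.abspath(__file__)))
```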
Glad to hear it was resolved. Thanks.