
[BUG]:

Open YuchengWang opened this issue 2 years ago • 6 comments

🐛 Describe the bug

Install ColossalAI with Docker:

cd ColossalAI
docker build -t colossalai ./docker

error:

Successfully installed apex-0.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container c93930dca032
 ---> 7f233c85eaa7
Step 6/8 : RUN git clone https://github.com/hpcaitech/ColossalAI.git && cd ./ColossalAI && CUDA_EXT=1 pip install -v --no-cache-dir .
 ---> Running in ce069f1f38ef
Cloning into 'ColossalAI'...
Using pip 21.2.4 from /opt/conda/lib/python3.9/site-packages/pip (python 3.9)
Processing /workspace/ColossalAI
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
  Installing build dependencies: started
  Running command /opt/conda/bin/python /tmp/pip-standalone-pip-ha6cq51m/env_pip.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-ppkw8hq3/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'setuptools>=40.8.0' wheel
  Collecting setuptools>=40.8.0
    Downloading setuptools-67.5.1-py3-none-any.whl (1.1 MB)
  Collecting wheel
    Downloading wheel-0.38.4-py3-none-any.whl (36 kB)
  Installing collected packages: wheel, setuptools
  Successfully installed setuptools-67.5.1 wheel-0.38.4
  WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Running command /opt/conda/bin/python /opt/conda/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmp1m9hq8hs
  Traceback (most recent call last):
    File "/opt/conda/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 349, in <module>
      main()
    File "/opt/conda/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 331, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/opt/conda/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 117, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-ppkw8hq3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
    File "/tmp/pip-build-env-ppkw8hq3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-ppkw8hq3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 484, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/tmp/pip-build-env-ppkw8hq3/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 335, in run_setup
      exec(code, locals())
    File "<string>", line 121, in <module>
    File "<string>", line 38, in environment_check_for_cuda_extension_build
  ModuleNotFoundError: [extension] PyTorch is not found while CUDA_EXT=1. You need to install PyTorch first in order to build CUDA extensions
  Getting requirements to build wheel: finished with status 'error'
WARNING: Discarding file:///workspace/ColossalAI. Command errored out with exit status 1: /opt/conda/bin/python /opt/conda/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmp1m9hq8hs Check the logs for full command output.
ERROR: Command errored out with exit status 1: /opt/conda/bin/python /opt/conda/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py get_requires_for_build_wheel /tmp/tmp1m9hq8hs Check the logs for full command output.
The command '/bin/sh -c git clone https://github.com/hpcaitech/ColossalAI.git && cd ./ColossalAI && CUDA_EXT=1 pip install -v --no-cache-dir .' returned a non-zero code: 1

Environment

ycwang@ycwang-ThinkPad-P50:~/colossalai/ColossalAI$ nvidia-smi
Mon Mar 6 22:02:19 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.78.01    Driver Version: 525.78.01    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M2000M       Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   35C    P8    N/A /  N/A |      6MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1322      G   /usr/lib/xorg/Xorg                  2MiB |
+-----------------------------------------------------------------------------+

YuchengWang avatar Mar 06 '23 14:03 YuchengWang
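
The traceback in the report above fails inside get_requires_for_build_wheel, which pip runs in an isolated build environment (the /tmp/pip-build-env-ppkw8hq3/overlay path in the log) where only setuptools and wheel were installed. A PyTorch installed in the base image would therefore not be importable from setup.py at that point. The sketch below is an assumption and was not confirmed by anyone in this thread: it checks whether torch is importable and, if so, suggests retrying with pip's --no-build-isolation option, which builds in the current environment and needs setuptools and wheel to be installed there already. The helper name has_module and the PYTHON variable are hypothetical.

```shell
#!/bin/sh
# Sketch only, not the maintainers' fix. Assumes it is run inside the image,
# from the cloned ColossalAI directory.

# Return 0 if the given interpreter can locate the given top-level module.
# has_module is a hypothetical helper name.
has_module() {
    # $1 = python executable, $2 = module name
    "$1" -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$2') else 1)" 2>/dev/null
}

# Interpreter to probe; override with PYTHON=/opt/conda/bin/python if needed.
PYTHON="${PYTHON:-python3}"

if has_module "$PYTHON" torch; then
    echo "torch is importable; build isolation is a likely reason setup.py could not see it"
    echo "possible retry: CUDA_EXT=1 pip install -v --no-cache-dir --no-build-isolation ."
else
    echo "torch is not importable; install PyTorch before building with CUDA_EXT=1"
fi
```

The script only prints a suggestion instead of running pip, so it is safe to run as a diagnostic first.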

Bot detected the issue body's language is not English, translate it automatically.


Title: [BUG]:

Issues-translate-bot avatar Mar 06 '23 14:03 Issues-translate-bot

Hi @YuchengWang, could you please try pulling our image directly from Docker Hub with docker pull hpcaitech/colossalai?

kurisusnowdeng avatar Mar 08 '23 06:03 kurisusnowdeng

Hi @kurisusnowdeng, what's the correct command to pull the image from Docker Hub?

ycwang@ycwang-ThinkPad-P50:~$ docker image pull hpcaitech/colossalai
Using default tag: latest
Error response from daemon: manifest for hpcaitech/colossalai:latest not found: manifest unknown: manifest unknown
ycwang@ycwang-ThinkPad-P50:~$ docker pull hpcaitech/colossalai
Using default tag: latest
Error response from daemon: manifest for hpcaitech/colossalai:latest not found: manifest unknown: manifest unknown

YuchengWang avatar Mar 08 '23 11:03 YuchengWang

I met the same case.

colynhn avatar Mar 14 '23 07:03 colynhn

@YuchengWang @colynhn Maybe trying a specific tag would help (e.g. docker pull hpcaitech/colossalai:0.2.7).

kurisusnowdeng avatar Mar 15 '23 06:03 kurisusnowdeng
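
As the failed pulls earlier in the thread show, docker pull without a tag falls back to latest ("Using default tag: latest"), and the daemon answers "manifest unknown" when the repository has no tag by that name. A minimal sketch of always spelling out the tag is given here; image_ref is a hypothetical helper rather than a Docker command, and 0.2.7 is simply the tag suggested in the comment above.

```shell
#!/bin/sh
# Sketch: build an image reference that always carries an explicit tag.
# image_ref is a hypothetical helper, not part of Docker.
image_ref() {
    # $1 = repository, $2 = tag (falls back to latest, mirroring docker's default)
    repo="$1"
    tag="${2:-latest}"
    printf '%s:%s\n' "$repo" "$tag"
}

REF="$(image_ref hpcaitech/colossalai 0.2.7)"
echo "docker pull $REF"
# Uncomment to actually pull the image:
# docker pull "$REF"
```

Pinning the tag also keeps the pulled image reproducible, since the reference no longer depends on whichever image a moving tag points to.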

Thanks, I finished pulling the Docker image.

YuchengWang avatar Mar 16 '23 00:03 YuchengWang

Glad to hear it was resolved. Thanks.

binmakeswell avatar Apr 27 '23 10:04 binmakeswell