[BUG]: How do I install ColossalAI on an NPU? The project mentions NPU support, but I couldn't find a tutorial
Is there an existing issue for this bug?
- [x] I have searched the existing issues
The bug has not been fixed in the latest main branch
- [x] I have checked the latest main branch
Do you feel comfortable sharing a concise (minimal) script that reproduces the error? :)
Yes, I will share a minimal reproducible script.
🐛 Describe the bug
I don't know how to install ColossalAI on an NPU. I'd appreciate a tutorial on how to use the extensions module to install ColossalAI for NPU.
Environment
No response
We provide an Ascend Torch base image: docker pull hpcaitech/pytorch-npu:2.4.0
On top of it, simply install colossalai: either the latest stable release with pip install colossalai, or the main branch with pip install git+https://github.com/hpcaitech/ColossalAI.git
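Recapping those commands as a runnable block (assuming Docker is available and the image tag above is still current):

```bash
# Pull the Ascend (NPU) PyTorch base image provided by hpcaitech.
docker pull hpcaitech/pytorch-npu:2.4.0

# Inside the container: install the latest stable release of ColossalAI,
pip install colossalai

# or install the main branch instead.
pip install git+https://github.com/hpcaitech/ColossalAI.git
```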
@ver217 Hi! I tried to install coati in the NPU Docker environment, but I get an error about "NCCL" not being ready.
How should we guide the install logic to recognize Ascend's "HCCL" instead? (A backend sanity check is sketched after the log below.)
/dpc/wangzy/deepseek/ColossalAI/applications/ColossalChat# pip install .
Processing /dpc/wangzy/deepseek/ColossalAI/applications/ColossalChat
Preparing metadata (setup.py) ... done
Requirement already satisfied: transformers==4.39.3 in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (4.39.3)
Requirement already satisfied: tqdm in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (4.67.0)
Collecting datasets==2.14.7 (from coati==1.0.0)
Downloading datasets-2.14.7-py3-none-any.whl.metadata (19 kB)
Collecting loralib (from coati==1.0.0)
Downloading loralib-0.1.2-py3-none-any.whl.metadata (15 kB)
Requirement already satisfied: colossalai>=0.4.7 in /dpc/wangzy/deepseek/ColossalAI (from coati==1.0.0) (0.4.7)
Requirement already satisfied: torch>=2.1.0 in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (2.4.1)
Collecting langchain (from coati==1.0.0)
Downloading langchain-0.3.19-py3-none-any.whl.metadata (7.9 kB)
Requirement already satisfied: tokenizers in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (0.15.2)
Requirement already satisfied: fastapi in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (0.115.8)
Collecting sse_starlette (from coati==1.0.0)
Downloading sse_starlette-2.2.1-py3-none-any.whl.metadata (7.8 kB)
Collecting wandb (from coati==1.0.0)
Downloading wandb-0.19.6-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (10 kB)
Requirement already satisfied: sentencepiece in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (0.2.0)
Collecting gpustat (from coati==1.0.0)
Downloading gpustat-1.1.1.tar.gz (98 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: packaging in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (24.1)
Collecting autoflake==2.2.1 (from coati==1.0.0)
Downloading autoflake-2.2.1-py3-none-any.whl.metadata (7.3 kB)
Collecting black==23.9.1 (from coati==1.0.0)
Downloading black-23.9.1-py3-none-any.whl.metadata (65 kB)
Requirement already satisfied: tensorboard in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (2.18.0)
Requirement already satisfied: six==1.16.0 in /root/miniconda3/envs/glm-32b/lib/python3.10/site-packages (from coati==1.0.0) (1.16.0)
Collecting ninja==1.11.1 (from coati==1.0.0)
Downloading ninja-1.11.1-py2.py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (5.3 kB)
Collecting sentencepiece (from coati==1.0.0)
Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (7.7 kB)
Collecting flash-attn (from coati==1.0.0)
Downloading flash_attn-2.7.4.post1.tar.gz (6.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 266.9 kB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [19 lines of output]
/tmp/pip-install-smmj8s2o/flash-attn_6d6ea029aca840a68bc86afc3a228298/setup.py:106: UserWarning: flash_attn was requested, but nvcc was not found. Are you sure your environment has nvcc available? If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
warnings.warn(
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-smmj8s2o/flash-attn_6d6ea029aca840a68bc86afc3a228298/setup.py", line 198, in <module>
CUDAExtension(
File "/root/miniconda3/envs/glm-32b/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1076, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "/root/miniconda3/envs/glm-32b/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1207, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "/root/miniconda3/envs/glm-32b/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2416, in _join_cuda_home
raise OSError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
torch.__version__ = 2.4.1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
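For reference, a minimal sketch (not from the thread) of verifying the Ascend distributed stack, assuming torch_npu and the CANN/HCCL runtime are installed in the container; the loopback address and port are arbitrary choices:

```bash
# Run inside the NPU container. Importing torch_npu registers the "npu"
# device type and the "hccl" distributed backend with PyTorch.
python -c "import torch, torch_npu; print('NPU available:', torch.npu.is_available())"

# A single-process group to confirm "hccl" (not "nccl") is the backend to use.
python - <<'EOF'
import torch
import torch_npu  # noqa: F401  (side effect: registers the hccl backend)
import torch.distributed as dist

dist.init_process_group(backend="hccl", rank=0, world_size=1,
                        init_method="tcp://127.0.0.1:29500")
print("backend:", dist.get_backend())  # expect: hccl
dist.destroy_process_group()
EOF
```

Note that the failure shown in the log above is separate from backend selection: pip tried to build the CUDA-only flash-attn package, which is addressed by the stub trick described below.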
We also want to run LoRA fine-tuning of the DeepSeek 671B model on 4 nodes with 8 Ascend 910B3 NPUs per node:
colossalai run --host 10.2.0.91,10.2.0.92 --nproc_per_node 8 \
lora_finetune.py --pretrained /dpc/zhanghaobo/deepseek-r1/DeepSeek-R1-BF16-LOCAL \
--dataset /dpc/wangzy/deepseek/ColossalAI/lora_sft_data.jsonl --plugin moe \
--lr 2e-5 --max_length 256 --g --ep 8 --pp 3 \
--batch_size 24 --lora_rank 8 --lora_alpha 16 \
--num_epochs 2 --warmup_steps 8 \
--tensorboard_dir logs --save_dir /dpc/wangzy/deepseek/DeepSeek-R1-bf16-lora
flash_attn is not available on NPU devices. DON'T install flash_attn; instead, create a dummy package directory in your Python site-packages path. E.g.
mkdir .conda/envs/myenv/lib/python3.10/site-packages/flash_attn
touch .conda/envs/myenv/lib/python3.10/site-packages/flash_attn/__init__.py
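The same trick as a small script; the site-packages path is resolved via sysconfig instead of hard-coding an environment name like myenv:

```bash
# Locate the active environment's site-packages directory.
SITE_PACKAGES=$(python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")

# Create an empty flash_attn package so `import flash_attn` succeeds.
mkdir -p "$SITE_PACKAGES/flash_attn"
touch "$SITE_PACKAGES/flash_attn/__init__.py"

# Sanity check: the stub should now import without error.
python -c "import flash_attn; print('flash_attn stub importable')"
```

Note that an empty stub only satisfies a bare import of the module name; any code that imports specific symbols (e.g. from flash_attn import flash_attn_func) will still fail at that point.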