[BUG]: docker build cuda extension error
Is there an existing issue for this bug?
- [X] I have searched the existing issues
🐛 Describe the bug
During docker build, the following command fails:
RUN BUILD_EXT=1 pip install colossalai-nightly
RuntimeError: [extension] Could not find any kernel compatible with the current environment.
However, if I run the same command inside a container started with the GPU flag (so the GPU cards are visible), it succeeds.
Base image: FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu20.04
Environment
No response
RUN BUILD_EXT=1 pip install colossalai-nightly:
#0 7.666 Collecting colossalai-nightly
#0 12.99 Downloading colossalai-nightly-2024.5.18.tar.gz (1.2 MB)
#0 13.69 ββββββββββββββββββββββββββββββββββββββββ 1.2/1.2 MB 1.7 MB/s eta 0:00:00
#0 14.21 Preparing metadata (setup.py): started
#0 17.77 Preparing metadata (setup.py): finished with status 'error'
#0 17.78 error: subprocess-exited-with-error
#0 17.78
#0 17.78 × python setup.py egg_info did not run successfully.
#0 17.78 │ exit code: 1
#0 17.78 ╰─> [7 lines of output]
#0 17.78 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#0 17.78 Traceback (most recent call last):
#0 17.78 File "
one
Created wheel for colossalai-nightly: filename=colossalai_nightly-2024.5.18-cp310-cp310-linux_x86_64.whl size=23673844 sha256=0a0bb55154c1ce9758ff8f9dd4b38e4b462647ad38e9714d9a4b2de6153b163e
Stored in directory: /root/.cache/pip/wheels/ef/39/0e/39263ec364cb9d67240001279c9bcb1808b102252ea4ecaf33
Building wheel for contexttimer (setup.py) ... done
Created wheel for contexttimer: filename=contexttimer-0.3.3-py3-none-any.whl size=5804 sha256=877270da42acb2811b2b5fbb097ce315895a4f6ed3b4da34aa5318a60c758006
Stored in directory: /root/.cache/pip/wheels/72/1c/da/cfd97201d88ccce214427fa84a5caeb91fef7c5a1b4c4312b4
Successfully built colossalai-nightly contexttimer
Installing collected packages: ninja, distlib, contexttimer, wrapt, virtualenv, pydantic-core, nodeenv, msgpack, invoke, identify, cfgv, bcrypt, annotated-types, pynacl, pydantic, pre-commit, google, deprecated, cryptography, tokenizers, paramiko, transformers, ray, fabric, galore_torch, colossalai-nightly
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.19.1
Uninstalling tokenizers-0.19.1:
Successfully uninstalled tokenizers-0.19.1
Attempting uninstall: transformers
Found existing installation: transformers 4.42.0.dev0
Uninstalling transformers-4.42.0.dev0:
Successfully uninstalled transformers-4.42.0.dev0
Successfully installed annotated-types-0.6.0 bcrypt-4.1.3 cfgv-3.4.0 colossalai-nightly-2024.5.18 contexttimer-0.3.3 cryptography-42.0.7 deprecated-1.2.14 distlib-0.3.8 fabric-3.2.2 galore_torch-1.0 google-3.0.0 identify-2.5.36 invoke-2.2.0 msgpack-1.0.8 ninja-1.11.1.1 nodeenv-1.8.0 paramiko-3.4.0 pre-commit-3.7.1 pydantic-2.7.1 pydantic-core-2.18.2 pynacl-1.5.0 ray-2.22.0 tokenizers-0.15.2 transformers-4.36.2 virtualenv-20.26.2 wrapt-1.16.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@71c5383a668b:/app#
root@71c5383a668b:/app#
The command succeeds when run in a container with the GPU device parameter.
Workaround:
# Install ColossalAI from a specific commit
ARG VERSION=main
RUN git clone -b ${VERSION} https://github.com/hpcaitech/ColossalAI.git && \
    cd ColossalAI && \
    git checkout 3e05c07bb8921f2a8f9736b6f6673d4e9f1697d0 && \
    BUILD_EXT=1 pip install -v --no-cache-dir . && \
    cd .. && \
    rm -rf ColossalA
thanks
This happens because Docker BuildKit is not compatible with the current CUDA extension build. You can set export FORCE_CUDA=1 before installing colossalai in Docker, or you can disable Docker BuildKit by setting export DOCKER_BUILDKIT=0 on the host before running docker build.
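For reference, a minimal sketch of how the FORCE_CUDA suggestion might look in the Dockerfile from the report. It assumes earlier steps (not shown in the issue) already install Python, pip, and PyTorch. The commented-out TORCH_CUDA_ARCH_LIST line is an assumption that is not part of this thread: it is the PyTorch environment variable for choosing target GPU architectures when no device is visible at build time, and the values shown are only examples.

```dockerfile
FROM nvcr.io/nvidia/cuda:11.8.0-devel-ubuntu20.04

# Assumed: Python, pip, and PyTorch are installed by earlier steps
# that are not shown in the issue.

# Assumption (not from this thread): no GPU is visible during docker build,
# so the target architectures may need to be listed explicitly.
# The values below are examples only.
# ENV TORCH_CUDA_ARCH_LIST="8.0;8.6"

# FORCE_CUDA=1 asks the extension build to proceed even though
# no CUDA device is detected at build time.
RUN FORCE_CUDA=1 BUILD_EXT=1 pip install colossalai-nightly
```

The other suggestion needs no Dockerfile change: run the build from the host as DOCKER_BUILDKIT=0 docker build . so that the legacy builder is used.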