
[BUG]: install from source error and pip install error too

Open denofiend opened this issue 1 year ago • 0 comments

🐛 Describe the bug

(python3.10) [ColossalAI]BUILD_EXT=1 pip install .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing /home/alsc/ColossalAI
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (1.22.4)
Requirement already satisfied: tqdm in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (4.66.2)
Requirement already satisfied: psutil in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (5.9.8)
Requirement already satisfied: packaging in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (24.0)
Requirement already satisfied: pre-commit in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (3.6.2)
Requirement already satisfied: rich in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (13.7.1)
Requirement already satisfied: click in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (8.1.7)
Requirement already satisfied: fabric in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (3.2.2)
Requirement already satisfied: contexttimer in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (0.3.3)
Requirement already satisfied: ninja in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (1.11.1.1)
Requirement already satisfied: torch>=1.12 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (2.0.1)
Requirement already satisfied: safetensors in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (0.4.2)
Requirement already satisfied: einops in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (0.7.0)
Requirement already satisfied: pydantic in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (2.6.3)
Requirement already satisfied: ray in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (2.9.3)
Requirement already satisfied: sentencepiece in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (0.2.0)
Requirement already satisfied: google in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (3.0.0)
Requirement already satisfied: protobuf in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from colossalai==0.3.6) (4.25.3)
Requirement already satisfied: filelock in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from torch>=1.12->colossalai==0.3.6) (3.13.1)
Requirement already satisfied: typing-extensions in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from torch>=1.12->colossalai==0.3.6) (4.10.0)
Requirement already satisfied: sympy in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from torch>=1.12->colossalai==0.3.6) (1.12)
Requirement already satisfied: networkx in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from torch>=1.12->colossalai==0.3.6) (3.1)
Requirement already satisfied: jinja2 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from torch>=1.12->colossalai==0.3.6) (3.1.3)
Requirement already satisfied: invoke>=2.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from fabric->colossalai==0.3.6) (2.2.0)
Requirement already satisfied: paramiko>=2.4 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from fabric->colossalai==0.3.6) (3.4.0)
Requirement already satisfied: decorator>=5 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from fabric->colossalai==0.3.6) (5.1.1)
Requirement already satisfied: deprecated>=1.2 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from fabric->colossalai==0.3.6) (1.2.14)
Requirement already satisfied: beautifulsoup4 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from google->colossalai==0.3.6) (4.12.3)
Requirement already satisfied: cfgv>=2.0.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from pre-commit->colossalai==0.3.6) (3.4.0)
Requirement already satisfied: identify>=1.0.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from pre-commit->colossalai==0.3.6) (2.5.35)
Requirement already satisfied: nodeenv>=0.11.1 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from pre-commit->colossalai==0.3.6) (1.8.0)
Requirement already satisfied: pyyaml>=5.1 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from pre-commit->colossalai==0.3.6) (6.0.1)
Requirement already satisfied: virtualenv>=20.10.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from pre-commit->colossalai==0.3.6) (20.25.1)
Requirement already satisfied: annotated-types>=0.4.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from pydantic->colossalai==0.3.6) (0.6.0)
Requirement already satisfied: pydantic-core==2.16.3 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from pydantic->colossalai==0.3.6) (2.16.3)
Requirement already satisfied: jsonschema in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from ray->colossalai==0.3.6) (4.21.1)
Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from ray->colossalai==0.3.6) (1.0.8)
Requirement already satisfied: aiosignal in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from ray->colossalai==0.3.6) (1.3.1)
Requirement already satisfied: frozenlist in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from ray->colossalai==0.3.6) (1.4.1)
Requirement already satisfied: requests in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from ray->colossalai==0.3.6) (2.31.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from rich->colossalai==0.3.6) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from rich->colossalai==0.3.6) (2.17.2)
Requirement already satisfied: wrapt<2,>=1.10 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from deprecated>=1.2->fabric->colossalai==0.3.6) (1.16.0)
Requirement already satisfied: mdurl~=0.1 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->colossalai==0.3.6) (0.1.2)
Requirement already satisfied: setuptools in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from nodeenv>=0.11.1->pre-commit->colossalai==0.3.6) (68.2.2)
Requirement already satisfied: bcrypt>=3.2 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from paramiko>=2.4->fabric->colossalai==0.3.6) (4.1.2)
Requirement already satisfied: cryptography>=3.3 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from paramiko>=2.4->fabric->colossalai==0.3.6) (42.0.5)
Requirement already satisfied: pynacl>=1.5 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from paramiko>=2.4->fabric->colossalai==0.3.6) (1.5.0)
Requirement already satisfied: distlib<1,>=0.3.7 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from virtualenv>=20.10.0->pre-commit->colossalai==0.3.6) (0.3.8)
Requirement already satisfied: platformdirs<5,>=3.9.1 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from virtualenv>=20.10.0->pre-commit->colossalai==0.3.6) (4.2.0)
Requirement already satisfied: soupsieve>1.2 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from beautifulsoup4->google->colossalai==0.3.6) (2.5)
Requirement already satisfied: MarkupSafe>=2.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from jinja2->torch>=1.12->colossalai==0.3.6) (2.1.1)
Requirement already satisfied: attrs>=22.2.0 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from jsonschema->ray->colossalai==0.3.6) (23.2.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from jsonschema->ray->colossalai==0.3.6) (2023.12.1)
Requirement already satisfied: referencing>=0.28.4 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from jsonschema->ray->colossalai==0.3.6) (0.33.0)
Requirement already satisfied: rpds-py>=0.7.1 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from jsonschema->ray->colossalai==0.3.6) (0.18.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from requests->ray->colossalai==0.3.6) (2.0.4)
Requirement already satisfied: idna<4,>=2.5 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from requests->ray->colossalai==0.3.6) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from requests->ray->colossalai==0.3.6) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from requests->ray->colossalai==0.3.6) (2024.2.2)
Requirement already satisfied: mpmath>=0.19 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from sympy->torch>=1.12->colossalai==0.3.6) (1.3.0)
Requirement already satisfied: cffi>=1.12 in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from cryptography>=3.3->paramiko>=2.4->fabric->colossalai==0.3.6) (1.16.0)
Requirement already satisfied: pycparser in /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages (from cffi>=1.12->cryptography>=3.3->paramiko>=2.4->fabric->colossalai==0.3.6) (2.21)
Building wheels for collected packages: colossalai
Building wheel for colossalai (setup.py) ... /

  running build_ext
  /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.0) has a minor version mismatch with the version that was used to compile PyTorch (11.8). Most likely this shouldn't be a problem.
    warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
  building 'colossalai.C.cpu_adam_x86' extension
  creating /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310
  creating /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home
  creating /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc
  creating /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI
  creating /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions
  creating /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc
  creating /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda
  Emitting ninja build file /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/1] c++ -MMD -MF /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.o.d -pthread -B /home/alsc/anaconda3/envs/python3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/alsc/anaconda3/envs/python3.10/include -fPIC -O2 -isystem /home/alsc/anaconda3/envs/python3.10/include -fPIC -I/home/alsc/ColossalAI/extensions/csrc/includes -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/TH -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/include/python3.10 -c -c /home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.cpp -o /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -std=c++14 -std=c++17 -lcudart -lcublas -g -Wno-reorder -fopenmp -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cpu_adam_x86 -D_GLIBCXX_USE_CXX11_ABI=0
  /home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.cpp:237: warning: ignoring #pragma unroll [-Wunknown-pragmas]
    237 | #pragma unroll 4
        |
  /home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.cpp:352: warning: ignoring #pragma unroll [-Wunknown-pragmas]
    352 | #pragma unroll 8
        |
  In file included from /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/Exceptions.h:14,
                   from /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11,
                   from /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/torch/extension.h:6,
                   from /home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.h:29,
                   from /home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.cpp:22:
  /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<Adam_Optimizer>’:
  /home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.cpp:443:51: required from here
  /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<Adam_Optimizer>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
   1479 | class class_ : public detail::generic_type {
        |       ^~~~~~
  g++ -pthread -B /home/alsc/anaconda3/envs/python3.10/compiler_compat -shared -Wl,-rpath,/home/alsc/anaconda3/envs/python3.10/lib -Wl,-rpath-link,/home/alsc/anaconda3/envs/python3.10/lib -L/home/alsc/anaconda3/envs/python3.10/lib -Wl,-rpath,/home/alsc/anaconda3/envs/python3.10/lib -Wl,-rpath-link,/home/alsc/anaconda3/envs/python3.10/lib -L/home/alsc/anaconda3/envs/python3.10/lib /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/cpu_adam.o -L/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/colossalai/C/cpu_adam_x86.cpython-310-x86_64-linux-gnu.so
  building 'colossalai.C.layernorm_cuda' extension
  Emitting ninja build file /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/2] /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/TH -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/include/python3.10 -c -c /home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda_kernel.cu -o /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -maxrregcount=50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=layernorm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
  FAILED: /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda_kernel.o
  /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/TH -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/include/python3.10 -c -c /home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda_kernel.cu -o /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 --use_fast_math -maxrregcount=50 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=layernorm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
  /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
            detected during:
              instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
  (61): here
              instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
  /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here

  /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
            detected during:
              instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
  (61): here
              instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
  /home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here

  nvcc error   : 'cicc' died due to signal 11 (Invalid memory reference)
  [2/2] c++ -MMD -MF /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda.o.d -pthread -B /home/alsc/anaconda3/envs/python3.10/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/alsc/anaconda3/envs/python3.10/include -fPIC -O2 -isystem /home/alsc/anaconda3/envs/python3.10/include -fPIC -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/TH -I/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/alsc/anaconda3/envs/python3.10/include/python3.10 -c -c /home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda.cpp -o /home/alsc/ColossalAI/build/temp.linux-x86_64-cpython-310/home/alsc/ColossalAI/extensions/csrc/cuda/layer_norm_cuda.o -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=layernorm_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
      subprocess.run(
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/subprocess.py", line 524, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/alsc/ColossalAI/setup.py", line 100, in <module>
      setup(
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 364, in run
      self.run_command("build")
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 88, in run
      _build_ext.run(self)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
      build_ext.build_extensions(self)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
      self._build_extensions_serial()
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
      self.build_extension(ext)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 249, in build_extension
      _build_ext.build_extension(self, ext)
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 548, in build_extension
      objects = self.compiler.compile(
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
      _write_ninja_file_and_compile_objects(
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1574, in _write_ninja_file_and_compile_objects
      _run_ninja_build(
    File "/home/alsc/anaconda3/envs/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for colossalai
Running setup.py clean for colossalai
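
For anyone triaging a similar log: the build output above reports that the detected CUDA toolkit (`/usr/local/cuda`, version 11.0) differs from the CUDA version PyTorch was compiled with (11.8). Whether that mismatch is what makes `cicc` crash is not established here, but it is cheap to check. Below is a minimal sketch of such a check; the helper names `parse_nvcc_release` and `compare_cuda_versions` are hypothetical, and the sample string only imitates the format printed by `nvcc --version`.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text in the format printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def compare_cuda_versions(nvcc_output: str, torch_cuda: str) -> str:
    """Classify the toolkit version against the version PyTorch was built with.

    Returns 'match', 'minor-mismatch', 'major-mismatch' or 'unknown'.
    """
    toolkit = parse_nvcc_release(nvcc_output)
    if toolkit is None:
        return "unknown"
    torch_major, torch_minor = (int(part) for part in torch_cuda.split(".")[:2])
    if toolkit[0] != torch_major:
        return "major-mismatch"
    if toolkit[1] != torch_minor:
        return "minor-mismatch"
    return "match"


# Illustrative input only: the version numbers mirror the ones in this report.
sample = "Cuda compilation tools, release 11.0, V11.0.221"
print(compare_cuda_versions(sample, "11.8"))  # minor-mismatch
```

On a real machine the two inputs would presumably come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and from `torch.version.cuda`.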

Environment

(python3.10) [alsc@ColossalAI]$ conda list

packages in environment at /home/alsc/anaconda3/envs/python3.10:

Name Version Build Channel

_libgcc_mutex 0.1 main defaults absl-py 2.1.0 aiohttp 3.9.3 aiosignal 1.3.1 annotated-types 0.6.0 async-timeout 4.0.3 attrs 23.2.0 av 11.0.0 bcrypt 4.1.2 beautifulsoup4 4.12.3 blas 1.0 mkl defaults bzip2 1.0.8 h7b6447c_0 defaults ca-certificates 2023.12.12 h06a4308_0 defaults certifi 2024.2.2 certifi 2024.2.2 py310h06a4308_0 defaults cffi 1.16.0 cfgv 3.4.0 chardet 5.2.0 charset-normalizer 3.3.2 charset-normalizer 2.0.4 pyhd3eb1b0_0 defaults click 8.1.7 colossalai 0.3.6 contexttimer 0.3.3 cryptography 42.0.5 cuda-cudart 11.8.89 0 nvidia cuda-cupti 11.8.87 0 nvidia cuda-libraries 11.8.0 0 nvidia cuda-nvrtc 11.8.89 0 nvidia cuda-nvtx 11.8.86 0 nvidia cuda-opencl 12.4.99 0 nvidia cuda-runtime 11.8.0 0 nvidia cudatoolkit 11.0.221 h6bb024c_0 nvidia datasets 2.18.0 decorator 5.1.1 Deprecated 1.2.14 diffusers 0.24.0 dill 0.3.8 distlib 0.3.8 einops 0.7.0 fabric 3.2.2 ffmpeg 4.3 hf484d3e_0 pytorch filelock 3.13.1 filelock 3.13.1 py310h06a4308_0 defaults freetype 2.11.0 h70c0345_0 defaults frozenlist 1.4.1 fsspec 2024.2.0 giflib 5.2.1 h7b6447c_0 defaults gmp 6.2.1 h295c915_3 defaults gnutls 3.6.15 he1e5248_0 defaults google 3.0.0 grpcio 1.62.1 huggingface-hub 0.21.4 identify 2.5.35 idna 3.4 py310h06a4308_0 defaults idna 3.6 idna 3.4 importlib_metadata 7.0.2 intel-openmp 2021.4.0 h06a4308_3561 defaults invoke 2.2.0 jinja2 3.1.3 py310h06a4308_0 defaults Jinja2 3.1.3 jpeg 9e h7f8727e_0 defaults jsonschema 4.21.1 jsonschema-specifications 2023.12.1 lame 3.100 h7b6447c_0 defaults lcms2 2.12 h3be6417_0 defaults ld_impl_linux-64 2.38 h1181459_1 defaults libcublas 11.11.3.6 0 nvidia libcufft 10.9.0.58 0 nvidia libcufile 1.9.0.20 0 nvidia libcurand 10.3.5.119 0 nvidia libcusolver 11.4.1.48 0 nvidia libcusparse 11.7.5.86 0 nvidia libffi 3.3 he6710b0_2 defaults libgcc-ng 9.1.0 hdf63c60_0 defaults libiconv 1.16 h7f8727e_2 defaults libidn2 2.3.2 h7f8727e_0 defaults libnpp 11.8.0.86 0 nvidia libnvjitlink 12.1.105 0 nvidia libnvjpeg 11.9.0.86 0 nvidia libpng 1.6.37 hbc83047_0 defaults 
libstdcxx-ng 9.1.0 hdf63c60_0 defaults libtasn1 4.16.0 h27cfd23_0 defaults libtiff 4.2.0 h2818925_1 defaults libunistring 0.9.10 h27cfd23_0 defaults libuuid 1.0.3 h7f8727e_2 defaults libwebp 1.2.2 h55f646e_0 defaults libwebp-base 1.2.2 h7f8727e_0 defaults lz4-c 1.9.3 h295c915_1 defaults Markdown 3.5.2 markdown-it-py 3.0.0 MarkupSafe 2.1.1 MarkupSafe 2.1.5 markupsafe 2.1.1 py310h7f8727e_0 defaults mdurl 0.1.2 mkl 2021.4.0 h06a4308_640 defaults mkl-fft 1.3.1 mkl-random 1.2.2 mkl-service 2.4.0 mkl-service 2.4.0 py310h7f8727e_0 defaults mkl_fft 1.3.1 py310hd6ae3a3_0 defaults mkl_random 1.2.2 py310h00e6091_0 defaults mpmath 1.3.0 py310h06a4308_0 defaults mpmath 1.3.0 msgpack 1.0.8 multidict 6.0.5 multiprocess 0.70.16 ncurses 6.3 h7f8727e_2 defaults nettle 3.7.3 hbbd107a_1 defaults networkx 3.1 py310h06a4308_0 defaults networkx 3.1 networkx 3.2.1 ninja 1.11.1.1 nodeenv 1.8.0 numpy 1.26.4 numpy 1.22.4 numpy 1.22.3 py310hfa59a62_0 defaults numpy-base 1.22.3 py310h9585f30_0 defaults nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.19.3 nvidia-nvjitlink-cu12 12.4.99 nvidia-nvtx-cu12 12.1.105 openh264 2.1.1 h4ff587b_0 defaults openssl 1.1.1w h7f8727e_0 defaults packaging 24.0 pandas 2.2.1 paramiko 3.4.0 pillow 9.0.1 py310h22f2fdc_0 defaults pillow 10.2.0 Pillow 9.0.1 pip 23.3.1 py310h06a4308_0 defaults pip 23.3.1 platformdirs 4.2.0 pre-commit 3.6.2 protobuf 4.25.3 psutil 5.9.8 pyarrow 15.0.1 pyarrow-hotfix 0.6 pycparser 2.21 pydantic 2.6.3 pydantic_core 2.16.3 Pygments 2.17.2 PyNaCl 1.5.0 python 3.10.0 h12debd9_5 defaults python-dateutil 2.9.0.post0 pytorch 2.0.1 py3.10_cuda11.8_cudnn8.7.0_0 pytorch pytorch-cuda 11.8 h7e8668a_5 pytorch pytorch-mutex 1.0 cuda pytorch pytz 2024.1 PyYAML 6.0.1 ray 2.9.3 readline 8.1.2 h7f8727e_1 defaults 
referencing 0.33.0 regex 2023.12.25 requests 2.31.0 py310h06a4308_1 defaults requests 2.31.0 rich 13.7.1 rpds-py 0.18.0 safetensors 0.4.2 sentencepiece 0.2.0 setuptools 68.2.2 setuptools 68.2.2 py310h06a4308_0 defaults six 1.16.0 pyhd3eb1b0_1 defaults soupsieve 2.5 sqlite 3.38.5 hc218d9a_0 defaults sympy 1.12 sympy 1.12 py310h06a4308_0 defaults tensorboard 2.16.2 tensorboard-data-server 0.7.2 timm 0.9.16 tk 8.6.12 h1ccaba5_0 defaults tokenizers 0.15.2 torch 2.0.1 torch 2.2.1 torchaudio 2.0.2 torchaudio 2.0.2 py310_cu118 pytorch torchtriton 2.0.0 py310 pytorch torchvision 0.17.1 torchvision 0.15.2 torchvision 0.15.2 py310_cu118 pytorch tqdm 4.66.2 transformers 4.38.2 triton 2.2.0 triton 2.0.0 typing_extensions 4.10.0 typing_extensions 4.9.0 typing_extensions 4.9.0 py310h06a4308_1 defaults tzdata 2024a h04d1e81_0 defaults tzdata 2024.1 urllib3 2.1.0 py310h06a4308_0 defaults urllib3 2.1.0 urllib3 2.2.1 virtualenv 20.25.1 Werkzeug 3.0.1 wheel 0.41.2 wheel 0.41.2 py310h06a4308_0 defaults wrapt 1.16.0 xxhash 3.4.1 xz 5.2.5 h7f8727e_1 defaults yarl 1.9.4 zipp 3.17.0 zlib 1.2.12 h7f8727e_2 defaults zstd 1.5.2 ha4553b6_0 defaults

denofiend, Mar 14 '24 13:03