TransformerEngine
Error building wheel during installation
I manually downloaded flash-attn, then ran 'pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable' to install TransformerEngine, and received the error 'Building wheel for transformer_engine (setup.py) ... error'.
Environment: torch 2.2, CUDA 11.8
(tuling) xx@DESKTOP-UA3C67F:~/ChatTTS$ pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-9lezr884
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-9lezr884
Running command git checkout -b stable --track origin/stable
Switched to a new branch 'stable'
Branch 'stable' set up to track remote branch 'stable' from 'origin'.
Resolved https://github.com/NVIDIA/TransformerEngine.git to commit c81733f1032a56a817b594c8971a738108ded7d0
Running command git submodule update --init --recursive -q
Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from transformer_engine==1.6.0+c81733f) (2.7.4)
Requirement already satisfied: torch in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from transformer_engine==1.6.0+c81733f) (2.2.2)
Requirement already satisfied: flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from transformer_engine==1.6.0+c81733f) (2.4.2)
Requirement already satisfied: einops in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer_engine==1.6.0+c81733f) (0.8.0)
Requirement already satisfied: packaging in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer_engine==1.6.0+c81733f) (24.1)
Requirement already satisfied: ninja in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer_engine==1.6.0+c81733f) (1.11.1.1)
Requirement already satisfied: annotated-types>=0.4.0 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from pydantic->transformer_engine==1.6.0+c81733f) (0.7.0)
Requirement already satisfied: pydantic-core==2.18.4 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from pydantic->transformer_engine==1.6.0+c81733f) (2.18.4)
Requirement already satisfied: typing-extensions>=4.6.1 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from pydantic->transformer_engine==1.6.0+c81733f) (4.11.0)
Requirement already satisfied: filelock in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (3.13.1)
Requirement already satisfied: sympy in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (1.12)
Requirement already satisfied: networkx in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (3.2.1)
Requirement already satisfied: jinja2 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (3.1.4)
Requirement already satisfied: fsspec in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (2024.6.1)
Requirement already satisfied: MarkupSafe>=2.0 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from jinja2->torch->transformer_engine==1.6.0+c81733f) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from sympy->torch->transformer_engine==1.6.0+c81733f) (1.3.0)
Building wheels for collected packages: transformer_engine
Building wheel for transformer_engine (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [242 lines of output]
Could not determine CUDA Toolkit version
/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/__init__.py:81: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!
********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************
!!
dist.fetch_build_eggs(dist.setup_requires)
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/transformer_engine
copying transformer_engine/_version.py -> build/lib.linux-x86_64-cpython-310/transformer_engine
copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine
creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
creating build/lib.linux-x86_64-cpython-310/transformer_engine/common
copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common
copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common
creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/cpu_offload.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/graph.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
creating build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
creating build/lib.linux-x86_64-cpython-310/transformer_engine/common/recipe
copying transformer_engine/common/recipe/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common/recipe
creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
creating build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/rmsnorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
running build_ext
Building CMake extension transformer_engine
Running command /tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake -S /tmp/pip-req-build-9lezr884/transformer_engine -B /tmp/pip-req-build-9lezr884/build/cmake -DPython_EXECUTABLE=/home/cx/anaconda3/envs/tuling/bin/python -DPython_INCLUDE_DIR=/home/cx/anaconda3/envs/tuling/include/python3.10 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-9lezr884/build/lib.linux-x86_64-cpython-310 -GNinja
-- The CUDA compiler identification is NVIDIA 11.8.89
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-11.8/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda-11.8/targets/x86_64-linux/include (found version "11.8.89")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- cudnn found at /usr/local/cuda-11.8/lib64/libcudnn.so.
CMake Warning (dev) at /tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
The package name passed to `find_package_handle_standard_args` (LIBRARY)
does not match the name of the calling package (CUDNN). This can lead to
problems in calling code that expects `find_package` result variables
(e.g., `_FOUND`) to follow a certain pattern.
Call Stack (most recent call first):
cmake/FindCUDNN.cmake:44 (find_package_handle_standard_args)
CMakeLists.txt:24 (find_package)
This warning is for project developers. Use -Wno-dev to suppress it.
-- Found LIBRARY: /usr/local/cuda-11.8/targets/x86_64-linux/include
-- cuDNN: /usr/local/cuda-11.8/lib64/libcudnn.so
-- cuDNN: /usr/local/cuda-11.8/targets/x86_64-linux/include
-- cudnn_adv_infer found at /usr/local/cuda-11.8/lib64/libcudnn_adv_infer.so.
-- cudnn_adv_train found at /usr/local/cuda-11.8/lib64/libcudnn_adv_train.so.
-- cudnn_cnn_infer found at /usr/local/cuda-11.8/lib64/libcudnn_cnn_infer.so.
-- cudnn_cnn_train found at /usr/local/cuda-11.8/lib64/libcudnn_cnn_train.so.
-- cudnn_ops_infer found at /usr/local/cuda-11.8/lib64/libcudnn_ops_infer.so.
-- cudnn_ops_train found at /usr/local/cuda-11.8/lib64/libcudnn_ops_train.so.
-- Found Python: /home/cx/anaconda3/envs/tuling/bin/python (found version "3.10.14") found components: Interpreter Development.Module
-- JAX support: OFF
-- Configuring done (9.9s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/pip-req-build-9lezr884/build/cmake
Running command /tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake --build /tmp/pip-req-build-9lezr884/build/cmake
[1/32] Building CXX object common/CMakeFiles/transformer_engine.dir/transformer_engine.cpp.o
[2/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/gemm/cublaslt_gemm.cu.o
/tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
/tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
/tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
/tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
[3/32] Building CXX object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_api.cpp.o
[4/32] Building CXX object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o
[5/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose.cu.o
[6/32] Building CXX object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_api.cpp.o
[7/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose_fusion.cu.o
[8/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/swiglu.cu.o
[9/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/relu.cu.o
[10/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/util/cast.cu.o
[11/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o
[12/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_bwd_semi_cuda_kernel.cu.o
[13/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o
[14/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/rtc.cpp.o
[15/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/system.cpp.o
[16/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
[17/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_rope/fused_rope.cu.o
[18/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/recipe/delayed_scaling.cu.o
[19/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose.cu.o
[20/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_bwd_semi_cuda_kernel.cu.o
[21/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/multi_cast_transpose.cu.o
[22/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_aligned_causal_masked_softmax.cu.o
[23/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_masked_softmax.cu.o
[24/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_fwd_cuda_kernel.cu.o
[25/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_upper_triang_masked_softmax.cu.o
[26/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
[27/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
/usr/local/cuda-11.8/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9lezr884/transformer_engine -I/tmp/pip-req-build-9lezr884/transformer_engine/common/include -I/tmp/pip-req-build-9lezr884/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9lezr884/build/cmake/common/string_headers -isystem /usr/local/cuda-11.8/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o.d -x cu -c /tmp/pip-req-build-9lezr884/transformer_engine/common/fused_attn/fused_attn_fp8.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
Killed
[28/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
/usr/local/cuda-11.8/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9lezr884/transformer_engine -I/tmp/pip-req-build-9lezr884/transformer_engine/common/include -I/tmp/pip-req-build-9lezr884/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9lezr884/build/cmake/common/string_headers -isystem /usr/local/cuda-11.8/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o.d -x cu -c /tmp/pip-req-build-9lezr884/transformer_engine/common/fused_attn/fused_attn_f16_arbitrary_seqlen.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
Killed
Killed
Killed
[29/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
/usr/local/cuda-11.8/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9lezr884/transformer_engine -I/tmp/pip-req-build-9lezr884/transformer_engine/common/include -I/tmp/pip-req-build-9lezr884/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9lezr884/build/cmake/common/string_headers -isystem /usr/local/cuda-11.8/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o.d -x cu -c /tmp/pip-req-build-9lezr884/transformer_engine/common/fused_attn/fused_attn_f16_max512_seqlen.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
Killed
Killed
[30/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
[31/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-req-build-9lezr884/setup.py", line 336, in _build_cmake
subprocess.run(command, cwd=build_dir, check=True)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake', '--build', '/tmp/pip-req-build-9lezr884/build/cmake']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-req-build-9lezr884/setup.py", line 617, in <module>
main()
File "/tmp/pip-req-build-9lezr884/setup.py", line 602, in main
setuptools.setup(
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
return distutils.core.setup(**attrs)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 368, in run
self.run_command("build")
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-req-build-9lezr884/setup.py", line 368, in run
ext._build_cmake(
File "/tmp/pip-req-build-9lezr884/setup.py", line 338, in _build_cmake
raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake', '--build', '/tmp/pip-req-build-9lezr884/build/cmake']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
ERROR: Could not build wheels for transformer_engine, which is required to install pyproject.toml-based projects
How can I install it successfully, and is this related to CMake? I would be very grateful for a detailed answer.
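We use Ninja to parallelize the build, and I suspect it's overwhelming your system's resources. CMake itself configured successfully; the build failed because several nvcc processes were "Killed" while compiling the fused-attention sources (fused_attn_fp8.cu, fused_attn_f16_arbitrary_seqlen.cu, fused_attn_f16_max512_seqlen.cu), which is most often a sign that the Linux out-of-memory killer terminated them. We're thinking about ways to handle this more gracefully, but for now, can you try setting CMAKE_BUILD_PARALLEL_LEVEL=1 in your environment? You may also want to see https://github.com/NVIDIA/TransformerEngine/issues/976#issuecomment-2195745927.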
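A minimal sketch of the suggested workaround, assuming you install from the same @stable URL as in the report. In a shell it amounts to prefixing the command: CMAKE_BUILD_PARALLEL_LEVEL=1 pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable. The Python version below does the same from a script: it copies the current environment, caps the job count that 'cmake --build' passes to Ninja, and runs pip with the current interpreter (adding -v so compiler output stays visible). The function names are illustrative helpers, not part of TransformerEngine.

```python
import os
import subprocess
import sys

# Same source the original report installed from.
TE_URL = "git+https://github.com/NVIDIA/TransformerEngine.git@stable"


def serial_build_env(parallel_level: int = 1) -> dict:
    """Copy the current environment and cap how many jobs `cmake --build` runs."""
    if parallel_level < 1:
        raise ValueError("parallel_level must be at least 1")
    env = dict(os.environ)
    # `cmake --build` reads CMAKE_BUILD_PARALLEL_LEVEL (CMake >= 3.12) and
    # forwards it to the native build tool (Ninja here) as its job count.
    env["CMAKE_BUILD_PARALLEL_LEVEL"] = str(parallel_level)
    return env


def install_transformer_engine(parallel_level: int = 1) -> int:
    """Run pip for this interpreter with the capped build; return pip's exit code."""
    cmd = [sys.executable, "-m", "pip", "install", "-v", TE_URL]
    # The variable is inherited by pip's build subprocess and, from there,
    # by the CMake invocation in setup.py.
    result = subprocess.run(cmd, env=serial_build_env(parallel_level))
    return result.returncode
```

Call install_transformer_engine() and check for a return code of 0. Expect a single-job build to be noticeably slower; once it succeeds, a small value such as 2 trades more memory for a faster build.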
With https://github.com/NVIDIA/TransformerEngine/pull/987, you can control the number of parallel build jobs with the MAX_JOBS environment variable.
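If your checkout includes that PR, one rough way to pick MAX_JOBS is to divide free memory by how much you expect one compile job to need. The sketch below assumes Linux (it reads free physical pages via os.sysconf) and treats the per-job memory figure as a placeholder you should tune; the 8 GiB default is a guess, not a measured number. It also sets CMAKE_BUILD_PARALLEL_LEVEL to the same value, on the assumption that older checkouts (such as the 1.6.0 build in this log, which likely predates PR 987) ignore MAX_JOBS. The helper names are hypothetical.

```python
import os
import subprocess
import sys

TE_URL = "git+https://github.com/NVIDIA/TransformerEngine.git@stable"


def choose_max_jobs(available_bytes: int, per_job_bytes: int, cpu_count: int) -> int:
    """Largest job count that fits in memory and CPUs, never below 1."""
    if per_job_bytes <= 0:
        raise ValueError("per_job_bytes must be positive")
    by_memory = available_bytes // per_job_bytes
    return max(1, min(by_memory, cpu_count))


def available_memory_bytes() -> int:
    """Free (not reclaimable) physical memory on Linux; 0 if it cannot be read."""
    try:
        return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some names are missing elsewhere.
        return 0


def install_with_max_jobs(per_job_gib: float = 8.0) -> int:
    """Install TransformerEngine with a memory-based job cap (heuristic)."""
    jobs = choose_max_jobs(
        available_memory_bytes(),
        int(per_job_gib * 1024**3),  # placeholder budget per compile job
        os.cpu_count() or 1,
    )
    env = dict(os.environ)
    env["MAX_JOBS"] = str(jobs)  # honored by builds that include PR 987
    env["CMAKE_BUILD_PARALLEL_LEVEL"] = str(jobs)  # earlier workaround
    print(f"Building TransformerEngine with MAX_JOBS={jobs}")
    cmd = [sys.executable, "-m", "pip", "install", "-v", TE_URL]
    return subprocess.run(cmd, env=env).returncode
```

Because free memory excludes page cache, the estimate errs on the conservative side; if nvcc processes are still killed, lower per_job_gib's effect by raising it, or fall back to MAX_JOBS=1.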
Current guidance for disabling parallel builds: https://github.com/NVIDIA/TransformerEngine/issues/1077#issuecomment-2389735640