TransformerEngine
Question about building the wheel for transformer-engine
I don't know why this error occurs; can anyone help me solve it? My environment is Ubuntu 20.04, Python 3.8, CUDA 11.8.
root@f77be2ea35c2:/workspace/TransformerEngine# pip install .
WARNING: Ignoring invalid distribution -ransformer-engine (/opt/conda/lib/python3.8/site-packages)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /workspace/TransformerEngine
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (1.8.2)
Requirement already satisfied: torch in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (1.13.0a0+d0d6b1f)
Requirement already satisfied: flash-attn<=2.0.4,>=1.0.6 in /opt/conda/lib/python3.8/site-packages (from transformer-engine==1.1.0.dev0+64a3d1d) (2.0.4)
Requirement already satisfied: einops in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (0.7.0)
Requirement already satisfied: packaging in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (21.3)
Requirement already satisfied: ninja in /opt/conda/lib/python3.8/site-packages (from flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (1.11.1.1)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.8/site-packages (from pydantic->transformer-engine==1.1.0.dev0+64a3d1d) (4.3.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.8/site-packages (from packaging->flash-attn<=2.0.4,>=1.0.6->transformer-engine==1.1.0.dev0+64a3d1d) (3.0.9)
Building wheels for collected packages: transformer-engine
  Building wheel for transformer-engine (setup.py) ... error
  error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully. │ exit code: 1 ╰─> [416 lines of output] /opt/conda/lib/python3.8/site-packages/setuptools/dist.py:490: UserWarning: Normalizing '1.1.0dev+64a3d1d' to '1.1.0.dev0+64a3d1d' warnings.warn(tmpl.format(**locals())) running bdist_wheel running build running build_py creating build creating build/lib.linux-x86_64-3.8 creating build/lib.linux-x86_64-3.8/transformer_engine copying transformer_engine/init.py -> build/lib.linux-x86_64-3.8/transformer_engine creating build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax copying transformer_engine/jax/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax creating build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle copying transformer_engine/paddle/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying 
transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch copying transformer_engine/pytorch/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch creating build/lib.linux-x86_64-3.8/transformer_engine/common copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-3.8/transformer_engine/common copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/common copying transformer_engine/common/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/common creating build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis copying transformer_engine/jax/praxis/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/praxis creating build/lib.linux-x86_64-3.8/transformer_engine/jax/flax copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax copying transformer_engine/jax/flax/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax creating build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/paddle/layer creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions copying transformer_engine/pytorch/cpp_extensions/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/cpp_extensions creating 
build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/init.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch/module running build_ext Building CMake extension transformer_engine Running command /opt/conda/bin/cmake -S /workspace/TransformerEngine/transformer_engine -B /tmp/tmpc_wa7krl -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/workspace/TransformerEngine/build/lib.linux-x86_64-3.8 -GNinja -Dpybind11_DIR=/opt/conda/lib/python3.8/site-packages/pybind11/share/cmake/pybind11 -- The CUDA compiler identification is NVIDIA 11.8.89 -- The CXX compiler identification is GNU 9.4.0 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89") -- Looking for C++ include pthread.h -- Looking for C++ include pthread.h - found -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed -- Looking for pthread_create in pthreads -- Looking for pthread_create in pthreads - not found -- Looking for pthread_create in pthread -- Looking for pthread_create in pthread - found -- Found Threads: TRUE -- cudnn found at /usr/lib/x86_64-linux-gnu/libcudnn.so. -- cudnn_adv_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so. -- cudnn_adv_train found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so. -- cudnn_cnn_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so. -- cudnn_cnn_train found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so. -- cudnn_ops_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so. -- cudnn_ops_train found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so. -- Found CUDNN: /usr/include -- cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so -- cuDNN: /usr/include -- Found Python: /opt/conda/bin/python3.8 (found version "3.8.13") found components: Interpreter Development Development.Module Development.Embed -- JAX support: OFF -- Configuring done -- Generating done CMake Warning: Manually-specified variables were not used by the project:
pybind11_DIR
-- Build files have been written to: /tmp/tmpc_wa7krl
Running command /opt/conda/bin/cmake --build /tmp/tmpc_wa7krl
[1/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o
[2/29] Building CXX object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_api.cpp.o
[3/29] Building CXX object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_api.cpp.o
[4/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o
[5/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/system.cpp.o
[6/29] Building CXX object common/CMakeFiles/transformer_engine.dir/transformer_engine.cpp.o
[7/29] Building CXX object common/CMakeFiles/transformer_engine.dir/util/rtc.cpp.o
[8/29] Building CXX object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o
[9/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/gemm/cublaslt_gemm.cu.o
/workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
/workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
/workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
/workspace/TransformerEngine/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used
[10/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/util/cast.cu.o
[11/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_upper_triang_masked_softmax.cu.o
[12/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
[13/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
[14/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
[15/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
[16/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose.cu.o
[17/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_masked_softmax.cu.o
[18/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_bwd_semi_cuda_kernel.cu.o
[19/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose_fusion.cu.o
[20/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/swiglu.cu.o
[21/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/relu.cu.o
[22/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
[23/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose.cu.o
[24/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_bwd_semi_cuda_kernel.cu.o
[25/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_fwd_cuda_kernel.cu.o
[26/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/multi_cast_transpose.cu.o
[27/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
[28/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
[29/29] Linking CXX shared library common/libtransformer_engine.so
Running command /opt/conda/bin/cmake --install /tmp/tmpc_wa7krl
-- Install configuration: "Release"
-- Installing: /workspace/TransformerEngine/build/lib.linux-x86_64-3.8/./libtransformer_engine.so
-- Set runtime path of "/workspace/TransformerEngine/build/lib.linux-x86_64-3.8/./libtransformer_engine.so" to ""
building 'transformer_engine_extensions' extension
creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8
creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace
creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine
creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine
creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch
creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc
creating /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions
Emitting ninja build file /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
FAILED: /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.o
/usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu(69): error: expression must have class type but it has type "uint64_t"
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu(73): error: expression must have class type but it has type "uint64_t"
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
2 errors detected in the compilation of "/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/attention.cu".
[2/11] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.o.d -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
In file included from /workspace/TransformerEngine/transformer_engine/common/util/logging.h:17,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.h:34,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions.h:7,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.cpp:8:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[3/11] c++ -MMD -MF /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.o.d -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.o -O3 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
In file included from /workspace/TransformerEngine/transformer_engine/common/util/logging.h:17,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../common.h:34,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../extensions.h:7,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:7:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
In file included from /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/Exceptions.h:13,
from /opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11,
from /opt/conda/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../common.h:31,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/../extensions.h:7,
from /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:7:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::FP8TensorMeta>’:
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:84:67: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::FP8TensorMeta>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
1479 | class class_ : public detail::generic_type {
| ^~~~~~
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::DType>’:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7: required from ‘class pybind11::enum_<transformer_engine::DType>’
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:121:70: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::DType>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::FP8FwdTensors>’:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7: required from ‘class pybind11::enum_<transformer_engine::FP8FwdTensors>’
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:130:66: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::FP8FwdTensors>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<transformer_engine::FP8BwdTensors>’:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7: required from ‘class pybind11::enum_<transformer_engine::FP8BwdTensors>’
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:141:66: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<transformer_engine::FP8BwdTensors>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_Bias_Type>’:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7: required from ‘class pybind11::enum_<NVTE_Bias_Type>’
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:149:48: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_Bias_Type>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_Mask_Type>’:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7: required from ‘class pybind11::enum_<NVTE_Mask_Type>’
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:154:48: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_Mask_Type>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_QKV_Layout>’:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7: required from ‘class pybind11::enum_<NVTE_QKV_Layout>’
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:159:50: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_QKV_Layout>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<NVTE_Fused_Attn_Backend>’:
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:2134:7: required from ‘class pybind11::enum_<NVTE_Fused_Attn_Backend>’
/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/pybind.cpp:179:66: required from here
/opt/conda/lib/python3.8/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<NVTE_Fused_Attn_Backend>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes]
[4/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/misc.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/misc.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[5/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/activation.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[6/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/transpose.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/transpose.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[7/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/gemm.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/gemm.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[8/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/common.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[9/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/cast.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/cast.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[10/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/normalization.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/normalization.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
[11/11] /usr/local/cuda/bin/nvcc -I/workspace/TransformerEngine/transformer_engine/common/include -I/workspace/TransformerEngine/transformer_engine/pytorch/csrc -I/workspace/TransformerEngine/transformer_engine -I/workspace/TransformerEngine/3rdparty/cudnn-frontend/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.8 -c -c /workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/softmax.cu -o /workspace/TransformerEngine/build/temp.linux-x86_64-3.8/workspace/TransformerEngine/transformer_engine/pytorch/csrc/extensions/softmax.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -gencode arch=compute_70,code=sm_70 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT162_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --threads 4 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1013"' -DTORCH_EXTENSION_NAME=transformer_engine_extensions -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h(36): warning #2918-D: fold expressions are nonstandard in this mode
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h: In function ‘std::string transformer_engine::concat_strings(const Ts& ...)’:
/workspace/TransformerEngine/transformer_engine/common/util/../util/string.h:36:37: warning: fold-expressions only available with ‘-std=c++17’ or ‘-std=gnu++17’
36 | (..., (str += to_string_like(args)));
| ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1897, in _run_ninja_build
subprocess.run(
File "/opt/conda/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/workspace/TransformerEngine/setup.py", line 626, in <module>
main()
File "/workspace/TransformerEngine/setup.py", line 611, in main
setuptools.setup(
File "/opt/conda/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/opt/conda/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/conda/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.8/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/opt/conda/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/workspace/TransformerEngine/setup.py", line 403, in run
super().run()
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 839, in build_extensions
build_ext.build_extensions(self)
File "/opt/conda/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/opt/conda/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/opt/conda/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 654, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1569, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1913, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer-engine
  Running setup.py clean for transformer-engine
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects
Could you try PyTorch 1.14 (or anything 2.x) instead? I believe PyTorch changed the random number generator C++ API between 1.13 and 1.14, which could cause this error.
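If you want to keep building from source, a minimal sketch of that suggestion (assuming a CUDA 11.8 PyTorch 2.x wheel for your Python version is available on the cu118 wheel index) would be:

$ python -c "import torch; print(torch.__version__)"   # confirm the currently installed version
$ pip install --upgrade "torch>=2.0" --index-url https://download.pytorch.org/whl/cu118
$ pip install -v .   # re-run the TransformerEngine build from the repo root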
Same here, but in my case the process stops at:
[28/29] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/scratch_local/pip-req-build-10hsu61s/setup.py", line 353, in _build_cmake
subprocess.run(command, cwd=build_dir, check=True)
File "/leonardo/prod/spack/03/install/0.19/linux-rhel8-icelake/gcc-11.3.0/python-3.10.8-eauysn2mronkqqffs7r6bvftsdpsfm4b/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/cmake', '--build', '/scratch_local/tmpm8306yxu']' returned non-zero exit status 1.
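The [28/29] and [29/29] steps compile the largest kernels, so one thing worth ruling out is the host compiler being killed for running out of memory when ninja launches many parallel jobs. A hedged sketch, assuming the build honors MAX_JOBS (the PyTorch extension step in the log above explicitly does; whether the CMake step does depends on the TE version):

$ MAX_JOBS=4 pip install -v . 2>&1 | tee build.log   # fewer parallel compilers, keep the full log

The verbose log should then show the actual compiler error for ln_fwd_cuda_kernel.cu instead of only "subcommand failed".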
Any updates on this? I am facing more or less the same issue: CMake fails with an error saying "Could NOT find CUDNN (missing: CUDNN_INCLUDE_DIR CUDNN_LIBRARY)". However, these variables are set:
$ echo $CUDNN_LIBRARY
/mnt/nfs/clustersw/shared/cuda/cudnn-linux-x86_64-8.9.0.131_cuda12-archive/lib/libcudnn.so
$ echo $CUDNN_INCLUDE_DIR
/mnt/nfs/clustersw/shared/cuda/cudnn-linux-x86_64-8.9.0.131_cuda12-archive/include
Please help me solve this problem.
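One thing to check: the message names CMake cache variables, and exporting shell variables with the same names is not necessarily enough for CMake to pick them up. A sketch of a common workaround, assuming TE's cuDNN lookup also honors a single root variable such as CUDNN_PATH (check the FindCUDNN logic in your checkout), using the archive path from above:

$ export CUDNN_PATH=/mnt/nfs/clustersw/shared/cuda/cudnn-linux-x86_64-8.9.0.131_cuda12-archive
$ export CPATH=$CUDNN_PATH/include:$CPATH
$ export LD_LIBRARY_PATH=$CUDNN_PATH/lib:$LD_LIBRARY_PATH
$ pip install -v .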
@Mrzhang-dada @ionutmodo @osainz59 Have you solved this problem? I'm facing the same issue.
The environment configuration is shown below:
NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2
nvcc -V : 11.8
cuDNN: 8.9.6
torch 1.13.0+cu116 torchaudio 0.13.0+cu116 torchsummary 1.5.1 torchvision 0.14.0+cu116
Same trouble on two machines, one with a GeForce RTX 2080 Ti 12GB card and one with a GeForce RTX 4090 24GB card. I spent 3-4 days testing different configurations without results.
The environment configuration is shown below:
NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4
nvcc -V:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
cuDNN: 8.9.6 torch 2.4.1 torchvision 0.19.1
Trying to install TransformerEngine is a horrible job. I need it to develop a punctuation and capitalization module.
RuntimeError: Error when running CMake: Command '['/home/deep/.local/lib/python3.10/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-o6ytpyoz/transformer_engine/common', '-B', '/tmp/pip-req-build-o6ytpyoz/build/cmake', '-DPython_EXECUTABLE=/usr/bin/python3', '-DPython_INCLUDE_DIR=/usr/local/include/python3.10', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-o6ytpyoz/build/lib.linux-x86_64-cpython-310', '-Dpybind11_DIR=/home/deep/.local/lib/python3.10/site-packages/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
Please try these suggestions: https://github.com/NVIDIA/TransformerEngine/issues/355#issuecomment-2394353816
It may also be worth considering using an NGC PyTorch container, which includes TE.
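For example, a hedged sketch of pulling a recent NGC PyTorch image (the exact tag is an assumption; pick one from the NGC catalog that matches your driver):

$ docker run --gpus all -it --rm -v $PWD:/workspace nvcr.io/nvidia/pytorch:24.04-py3
# inside the container, TE ships preinstalled:
$ python -c "import transformer_engine; print(transformer_engine.__file__)"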