Installation errors on Ampere GPUs
Is there a TransformerEngine source version I can use to install on Ampere GPUs?
It seems that the default installation requires CUDA with FP8 support, which is not available on Ampere GPUs. Please correct me if I am wrong.
I assume the error you encountered was a missing cuda_fp8.h include file. That most likely means your CUDA Toolkit installation is too old. Transformer Engine requires CUDA 11.8, and you can use it on Ampere (FP8-specific features are limited to Hopper, but everything else should work fine).
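To make the "toolkit too old" check concrete, here is a minimal sketch that parses the text printed by `nvcc --version` and looks for cuda_fp8.h in an include directory. The helper names `parse_nvcc_release` and `toolkit_looks_fp8_ready` are made up for illustration and are not part of Transformer Engine; the 11.8 threshold is simply the requirement stated above.

```python
import os
import re


def parse_nvcc_release(version_output):
    """Return (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_looks_fp8_ready(version_output, include_dir, minimum=(11, 8)):
    """True when the release is at least `minimum` and cuda_fp8.h exists.

    `minimum` defaults to the 11.8 requirement quoted in this thread
    (an assumption taken from the thread, not read from the build system).
    """
    release = parse_nvcc_release(version_output)
    header = os.path.join(include_dir, "cuda_fp8.h")
    return release is not None and release >= minimum and os.path.isfile(header)
```

In practice you would pass it the captured output of `nvcc --version` and your toolkit's include directory, for example `/usr/local/cuda/include`. A passing check only says the header is present; it does not rule out other causes of a failed build.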
I'm trying to install it on an A100 and this is the error I get:
error: identifier "CUDNN_DATA_FP8_E5M2" is undefined
Running: NVTE_FRAMEWORK=pytorch pip install --upgrade git+https://github.com/NVIDIA/TransformerEngine.git@stable
ERROR: Command errored out with exit status 1:
command: /home/carlos/venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-0ot3mcum/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-0ot3mcum/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-g7dds3tg
cwd: /tmp/pip-req-build-0ot3mcum/
Complete output (162 lines):
/home/carlos/venv/lib/python3.8/site-packages/setuptools/dist.py:473: UserWarning: Normalizing '0.8.0
' to '0.8.0'
warnings.warn(
running bdist_wheel
running build
running build_py
Generating grammar tables from /usr/lib/python3.8/lib2to3/Grammar.txt
Generating grammar tables from /usr/lib/python3.8/lib2to3/PatternGrammar.txt
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/transformer_engine
copying transformer_engine/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine
creating build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
creating build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
creating build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/jit.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
copying transformer_engine/tensorflow/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
creating build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
running build_ext
Building CMake extensions!
Running CMake in build/temp.linux-x86_64-3.8/Release:
cmake /tmp/pip-req-build-0ot3mcum/transformer_engine -DCMAKE_BUILD_TYPE=Release -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE=/tmp/pip-req-build-0ot3mcum/build/lib.linux-x86_64-3.8 -GNinja
cmake --build . --config Release
-- The CUDA compiler identification is NVIDIA 11.8.89
-- The CXX compiler identification is GNU 9.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- cudnn found at /usr/lib/x86_64-linux-gnu/libcudnn.so.
-- cudnn_adv_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.
-- cudnn_adv_train found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.
-- cudnn_cnn_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.
-- cudnn_cnn_train found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.
-- cudnn_ops_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.
-- cudnn_ops_train found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.
-- Found CUDNN: /usr/include
-- cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- cuDNN: /usr/include
-- Found Python: /home/carlos/venv/bin/python3 (found version "3.8.10") found components: Interpreter Development Development.Module Development.Embed
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/pip-req-build-0ot3mcum/build/temp.linux-x86_64-3.8/Release
[1/21] Building CXX object common/CMakeFiles/transformer_engine.dir/transformer_engine.cpp.o
[2/21] Building CXX object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_api.cpp.o
[3/21] Building CXX object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_api.cpp.o
[4/21] Building CXX object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o
[5/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-0ot3mcum/transformer_engine -I/tmp/pip-req-build-0ot3mcum/transformer_engine/common/include -I/usr/local/cuda/targets/x86_64-linux/include -I/tmp/pip-req-build-0ot3mcum/transformer_engine/../3rdparty/cudnn-frontend/include -isystem=/usr/local/cuda/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o.d -x cu -c /tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
/tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu(160): error: identifier "CUDNN_DATA_FP8_E4M3" is undefined
/tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu(162): error: identifier "CUDNN_DATA_FP8_E5M2" is undefined
2 errors detected in the compilation of "/tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu".
[6/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/gemm/cublaslt_gemm.cu.o
[7/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_upper_triang_masked_softmax.cu.o
[8/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/util/cast.cu.o
[9/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose.cu.o
[10/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
[11/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_masked_softmax.cu.o
[12/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_bwd_semi_cuda_kernel.cu.o
[13/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose_fusion.cu.o
/tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
/tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
/tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
/tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
[14/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
[15/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_bwd_semi_cuda_kernel.cu.o
[16/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_fwd_cuda_kernel.cu.o
[17/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose.cu.o
[18/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/multi_cast_transpose.cu.o
[19/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
[20/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
ninja: build stopped: subcommand failed.
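Note that the two undefined identifiers in the log carry a CUDNN_ prefix, and that CMake reports finding the cuDNN headers in /usr/include while nvcc itself is 11.8.89. One possible reading, which this thread does not confirm, is that those cuDNN headers do not declare the FP8 data types. A rough sketch for checking this by searching the headers for the identifier; the helper name `headers_defining` and the `cudnn*.h` glob are assumptions for illustration only.

```python
import glob
import os


def headers_defining(include_dir, identifier, pattern="cudnn*.h"):
    """Return the sorted header paths under include_dir that mention identifier."""
    hits = []
    for path in sorted(glob.glob(os.path.join(include_dir, pattern))):
        # Headers are read as text; undecodable bytes are replaced, not fatal.
        with open(path, encoding="utf-8", errors="replace") as handle:
            if identifier in handle.read():
                hits.append(path)
    return hits
```

Calling `headers_defining("/usr/include", "CUDNN_DATA_FP8_E5M2")` and getting an empty list would be consistent with the compile error above. A non-empty list would point toward a different cause, such as a second cuDNN copy earlier on the include path.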
For future readers, I also had to add this to my .bashrc:
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
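To verify that exports like these actually took effect in the shell you build from, a small check along the following lines may help. It is only a sketch: `check_cuda_env` is a hypothetical helper, it compares paths as plain strings (so a trailing slash or a symlink would be reported as a mismatch), and it assumes the Linux layout used in the exports above.

```python
import os


def check_cuda_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []
    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        # Nothing else can be checked without CUDA_HOME.
        problems.append("CUDA_HOME is not set")
        return problems
    path_entries = env.get("PATH", "").split(os.pathsep)
    if os.path.join(cuda_home, "bin") not in path_entries:
        problems.append("CUDA_HOME/bin is not on PATH")
    lib_entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    if os.path.join(cuda_home, "lib64") not in lib_entries:
        problems.append("CUDA_HOME/lib64 is not on LD_LIBRARY_PATH")
    return problems
```

Running `check_cuda_env(os.environ)` in a fresh terminal returns an empty list when all three settings are in place, and otherwise names what is missing.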
Upgrading to CUDA 12.1 allowed me to install it.
Perhaps this comment is outdated, and the installation instructions should be updated accordingly:
"Transformer Engine requires CUDA 11.8"