Installation errors on Ampere GPUs

Open realAsma opened this issue 2 years ago • 3 comments

Is there a TransformerEngine source version I can use to install on Ampere GPUs? It seems that the default installation requires a CUDA version with FP8 support, and FP8 is not supported on Ampere GPUs. Please correct me if I am wrong.

realAsma avatar Feb 09 '23 00:02 realAsma

I assume the error you encountered was a missing cuda_fp8.h include file. This most likely means that your CUDA Toolkit installation is too old. Transformer Engine requires CUDA 11.8, and you can use it on Ampere (FP8-specific features are limited to Hopper, but everything else should work fine).
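
A quick way to test the "missing cuda_fp8.h" theory is to look for the header directly. The sketch below is only an illustration: the helper name `has_fp8_header` is made up here, and it assumes the toolkit headers live under `<CUDA_HOME>/include`, falling back to the conventional `/usr/local/cuda` when `CUDA_HOME` is unset.

```python
import os
from pathlib import Path


def has_fp8_header(cuda_home: str) -> bool:
    """Return True if cuda_fp8.h exists under <cuda_home>/include.

    Hypothetical helper for illustration; not part of Transformer Engine.
    """
    return (Path(cuda_home) / "include" / "cuda_fp8.h").is_file()


if __name__ == "__main__":
    # Assumption: CUDA_HOME points at the toolkit root; otherwise try the
    # conventional install location.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    status = "found" if has_fp8_header(cuda_home) else "missing"
    print(f"{cuda_home}: cuda_fp8.h {status}")
```

If the header is reported missing, the toolkit at that location probably predates FP8 support, which matches the explanation above.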

ptrendx avatar Feb 09 '23 17:02 ptrendx

I'm trying to install it on an A100 and this is the error I get:

error: identifier "CUDNN_DATA_FP8_E5M2" is undefined

Running: NVTE_FRAMEWORK=pytorch pip install --upgrade git+https://github.com/NVIDIA/TransformerEngine.git@stable

  ERROR: Command errored out with exit status 1:
   command: /home/carlos/venv/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-0ot3mcum/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-0ot3mcum/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-g7dds3tg
       cwd: /tmp/pip-req-build-0ot3mcum/
  Complete output (162 lines):
  /home/carlos/venv/lib/python3.8/site-packages/setuptools/dist.py:473: UserWarning: Normalizing '0.8.0
  ' to '0.8.0'
    warnings.warn(
  running bdist_wheel
  running build
  running build_py
  Generating grammar tables from /usr/lib/python3.8/lib2to3/Grammar.txt
  Generating grammar tables from /usr/lib/python3.8/lib2to3/PatternGrammar.txt
  creating build
  creating build/lib.linux-x86_64-3.8
  creating build/lib.linux-x86_64-3.8/transformer_engine
  copying transformer_engine/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine
  creating build/lib.linux-x86_64-3.8/transformer_engine/common
  copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
  copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
  copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-3.8/transformer_engine/common
  creating build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-3.8/transformer_engine/pytorch
  creating build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax
  creating build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/utils.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/softmax.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/jit.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/constants.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  copying transformer_engine/tensorflow/fp8.py -> build/lib.linux-x86_64-3.8/transformer_engine/tensorflow
  creating build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-3.8/transformer_engine/jax/flax
  running build_ext
  Building CMake extensions!
  Running CMake in build/temp.linux-x86_64-3.8/Release:
  cmake /tmp/pip-req-build-0ot3mcum/transformer_engine -DCMAKE_BUILD_TYPE=Release -DCMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE=/tmp/pip-req-build-0ot3mcum/build/lib.linux-x86_64-3.8 -GNinja
  cmake --build . --config Release
  -- The CUDA compiler identification is NVIDIA 11.8.89
  -- The CXX compiler identification is GNU 9.4.0
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
  -- Looking for pthread_create in pthreads
  -- Looking for pthread_create in pthreads - not found
  -- Looking for pthread_create in pthread
  -- Looking for pthread_create in pthread - found
  -- Found Threads: TRUE
  -- cudnn found at /usr/lib/x86_64-linux-gnu/libcudnn.so.
  -- cudnn_adv_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.
  -- cudnn_adv_train found at /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.
  -- cudnn_cnn_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.
  -- cudnn_cnn_train found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.
  -- cudnn_ops_infer found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.
  -- cudnn_ops_train found at /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.
  -- Found CUDNN: /usr/include
  -- cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
  -- cuDNN: /usr/include
  -- Found Python: /home/carlos/venv/bin/python3 (found version "3.8.10") found components: Interpreter Development Development.Module Development.Embed
  -- Configuring done
  -- Generating done
  -- Build files have been written to: /tmp/pip-req-build-0ot3mcum/build/temp.linux-x86_64-3.8/Release
  [1/21] Building CXX object common/CMakeFiles/transformer_engine.dir/transformer_engine.cpp.o
  [2/21] Building CXX object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_api.cpp.o
  [3/21] Building CXX object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_api.cpp.o
  [4/21] Building CXX object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o
  [5/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
  FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
  /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-0ot3mcum/transformer_engine -I/tmp/pip-req-build-0ot3mcum/transformer_engine/common/include -I/usr/local/cuda/targets/x86_64-linux/include -I/tmp/pip-req-build-0ot3mcum/transformer_engine/../3rdparty/cudnn-frontend/include -isystem=/usr/local/cuda/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] --generate-code=arch=compute_90,code=[compute_90,sm_90] -Xcompiler=-fPIC -std=c++17 -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o.d -x cu -c /tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
  /tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu(160): error: identifier "CUDNN_DATA_FP8_E4M3" is undefined
  
  /tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu(162): error: identifier "CUDNN_DATA_FP8_E5M2" is undefined
  
  2 errors detected in the compilation of "/tmp/pip-req-build-0ot3mcum/transformer_engine/common/fused_attn/utils.cu".
  [6/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/gemm/cublaslt_gemm.cu.o
  [7/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_upper_triang_masked_softmax.cu.o
  [8/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/util/cast.cu.o
  [9/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose.cu.o
  [10/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
  [11/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_masked_softmax.cu.o
  [12/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_bwd_semi_cuda_kernel.cu.o
  [13/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose_fusion.cu.o
  /tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
  
  /tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
  
  /tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
  
  /tmp/pip-req-build-0ot3mcum/transformer_engine/common/transpose/transpose_fusion.cu(296): warning #177-D: variable "valid_store" was declared but never referenced
  
  [14/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
  [15/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_bwd_semi_cuda_kernel.cu.o
  [16/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_fwd_cuda_kernel.cu.o
  [17/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose.cu.o
  [18/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/multi_cast_transpose.cu.o
  [19/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
  [20/21] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
  ninja: build stopped: subcommand failed.
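
Since both undefined identifiers carry the CUDNN_ prefix, one thing worth checking is whether the cuDNN headers that CMake picked up mention them at all. The following is a rough diagnostic sketch, not an official tool: `headers_defining` is a hypothetical helper, and `/usr/include` is used only because that is where the log above reports finding cuDNN.

```python
from pathlib import Path
from typing import List


def headers_defining(include_dir: str, identifier: str) -> List[str]:
    """List cudnn*.h files in include_dir whose text mentions identifier.

    Hypothetical helper; it does a plain substring search, not a real parse.
    """
    hits = []
    for header in sorted(Path(include_dir).glob("cudnn*.h")):
        if identifier in header.read_text(errors="ignore"):
            hits.append(header.name)
    return hits


if __name__ == "__main__":
    # Assumption: cuDNN headers are in /usr/include, as in the CMake output.
    for name in ("CUDNN_DATA_FP8_E4M3", "CUDNN_DATA_FP8_E5M2"):
        print(name, headers_defining("/usr/include", name))
```

An empty list for either name would suggest that the installed cuDNN headers do not define the FP8 data types, which would explain the compile error in utils.cu.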

For future readers, I also had to add this to my .bashrc:

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin
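
To confirm that a new shell actually picked up these variables, something like the sketch below could be run before retrying the install. The function name `check_cuda_env` is invented for this example, and it only checks the `bin` and `lib64` subdirectories of `CUDA_HOME` (not the CUPTI path).

```python
import os
from typing import Dict, Mapping


def check_cuda_env(env: Mapping[str, str]) -> Dict[str, bool]:
    """Report whether CUDA_HOME is set and its bin/lib64 dirs are on the search paths.

    Hypothetical helper for illustration only.
    """
    cuda_home = env.get("CUDA_HOME", "")
    path_dirs = env.get("PATH", "").split(os.pathsep)
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return {
        "CUDA_HOME set": bool(cuda_home),
        "bin on PATH": bool(cuda_home)
        and os.path.join(cuda_home, "bin") in path_dirs,
        "lib64 on LD_LIBRARY_PATH": bool(cuda_home)
        and os.path.join(cuda_home, "lib64") in lib_dirs,
    }


if __name__ == "__main__":
    for check, ok in check_cuda_env(os.environ).items():
        print(f"{check}: {'ok' if ok else 'NOT ok'}")
```

The comparison is an exact string match, so a trailing slash or a symlinked path in `PATH` would be reported as missing even if it works in practice.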

carmocca avatar May 23 '23 15:05 carmocca

Upgrading to CUDA 12.1 allowed me to install it.

Perhaps this statement from the earlier comment is outdated and should be updated in the installation instructions:

"Transformer Engine requires CUDA 11.8"

carmocca avatar May 25 '23 01:05 carmocca