ERROR: Failed building wheel for transformer-engine

Open ShabnamRA opened this issue 1 year ago • 4 comments

I am trying to install TransformerEngine with the following command:

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

but I get the following error:

      RuntimeError: Error when running CMake: Command '['/tmp/pip-req-build-wpw9pxi1/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-wpw9pxi1/transformer_engine', '-B', '/tmp/tmps_krasnv', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-wpw9pxi1/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer-engine
  Running setup.py clean for transformer-engine
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects

ShabnamRA, Mar 04 '24 12:03

It looks like there's a compilation error when building the core C++ library. Can you provide more of the error message so we can figure out where the error is coming from? I wonder if it's the same as https://github.com/NVIDIA/TransformerEngine/issues/694.

timmoon10, Mar 04 '24 19:03

`Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-fgxtbhtl
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-fgxtbhtl
Running command git checkout -b stable --track origin/stable
Switched to a new branch 'stable'
Branch 'stable' set up to track remote branch 'stable' from 'origin'.
Resolved https://github.com/NVIDIA/TransformerEngine.git to commit 5b90b7f5ed67b373bc5f843d1ac3b7a8999df08e
Running command git submodule update --init --recursive -q
Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from transformer-engine==1.3.0+5b90b7f) (2.6.3)
Requirement already satisfied: torch in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from transformer-engine==1.3.0+5b90b7f) (2.2.1)
Collecting flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6 (from transformer-engine==1.3.0+5b90b7f)
Using cached flash_attn-2.4.2-cp311-cp311-linux_x86_64.whl
Requirement already satisfied: einops in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer-engine==1.3.0+5b90b7f) (0.7.0)
Requirement already satisfied: packaging in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer-engine==1.3.0+5b90b7f) (23.2)
Collecting ninja (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer-engine==1.3.0+5b90b7f)
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Requirement already satisfied: annotated-types>=0.4.0 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from pydantic->transformer-engine==1.3.0+5b90b7f) (0.6.0)
Requirement already satisfied: pydantic-core==2.16.3 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from pydantic->transformer-engine==1.3.0+5b90b7f) (2.16.3)
Requirement already satisfied: typing-extensions>=4.6.1 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from pydantic->transformer-engine==1.3.0+5b90b7f) (4.10.0)
Requirement already satisfied: filelock in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (3.13.1)
Requirement already satisfied: sympy in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (1.12)
Requirement already satisfied: networkx in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (3.2.1)
Requirement already satisfied: jinja2 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (3.1.3)
Requirement already satisfied: fsspec in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (2024.2.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (8.9.2.26)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (10.3.2.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (11.4.5.107)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.0.106)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (2.19.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (12.1.105)
Requirement already satisfied: triton==2.2.0 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from torch->transformer-engine==1.3.0+5b90b7f) (2.2.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->transformer-engine==1.3.0+5b90b7f) (12.3.101)
Requirement already satisfied: MarkupSafe>=2.0 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from jinja2->torch->transformer-engine==1.3.0+5b90b7f) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in ./anaconda3/envs/NeMo/lib/python3.11/site-packages (from sympy->torch->transformer-engine==1.3.0+5b90b7f) (1.3.0)
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Building wheels for collected packages: transformer-engine
Building wheel for transformer-engine (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [163 lines of output]
/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
!!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************
  
  !!
    dist.fetch_build_eggs(dist.setup_requires)
  running bdist_wheel
  /home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/torch/utils/cpp_extension.py:500: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-311
  creating build/lib.linux-x86_64-cpython-311/transformer_engine
  copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/cpu_offload.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  running build_ext
  Building CMake extension transformer_engine
  Running command /tmp/pip-req-build-fgxtbhtl/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake -S /tmp/pip-req-build-fgxtbhtl/transformer_engine -B /tmp/tmpfzxgbal5 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-fgxtbhtl/build/lib.linux-x86_64-cpython-311 -Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11
  -- The CUDA compiler identification is unknown
  -- The CXX compiler identification is GNU 11.4.0
  CMake Error at CMakeLists.txt:15 (project):
    No CMAKE_CUDA_COMPILER could be found.
  
    Tell CMake where to find the compiler by setting either the environment
    variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
    path to the compiler, or to the compiler name if it is in the PATH.
  
  
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Configuring incomplete, errors occurred!
  Traceback (most recent call last):
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 353, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/subprocess.py", line 569, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/tmp/pip-req-build-fgxtbhtl/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-fgxtbhtl/transformer_engine', '-B', '/tmp/tmpfzxgbal5', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-fgxtbhtl/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 626, in <module>
      main()
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 611, in main
      setuptools.setup(
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
      self.run_command("build")
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 383, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-fgxtbhtl/setup.py", line 355, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/tmp/pip-req-build-fgxtbhtl/.eggs/cmake-3.28.3-py3.11-linux-x86_64.egg/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-fgxtbhtl/transformer_engine', '-B', '/tmp/tmpfzxgbal5', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-fgxtbhtl/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer-engine
Running setup.py clean for transformer-engine
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects `

ShabnamRA, Mar 05 '24 08:03

CMake is failing since it can't find your CUDA installation. You can reproduce this outside of TE by making a CMakeLists.txt file:

cmake_minimum_required(VERSION 3.18)
project(myproject LANGUAGES CUDA CXX)

Then call cmake . in the directory.
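If you'd rather script that reproduction, here is a minimal Python sketch of the same idea. It writes the two-line CMakeLists.txt into a directory and runs the CMake configure step there. The names write_probe and run_probe are made up for illustration, and returning None when cmake is missing is just a convenience of this sketch.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# The two-line project from the comment above. Enabling the CUDA language
# makes CMake look for a CUDA compiler at configure time.
CMAKELISTS = """\
cmake_minimum_required(VERSION 3.18)
project(myproject LANGUAGES CUDA CXX)
"""


def write_probe(directory: Path) -> Path:
    """Write the minimal CMakeLists.txt into `directory` and return its path."""
    path = directory / "CMakeLists.txt"
    path.write_text(CMAKELISTS)
    return path


def run_probe(directory: Path) -> Optional[int]:
    """Run the CMake configure step; return its exit code, or None if cmake is missing."""
    cmake = shutil.which("cmake")
    if cmake is None:
        return None
    build_dir = directory / "build"
    result = subprocess.run([cmake, "-S", str(directory), "-B", str(build_dir)])
    return result.returncode
```

Call write_probe on an empty scratch directory and then run_probe on the same directory. A non-zero exit code together with "No CMAKE_CUDA_COMPILER could be found" in the output would point at the same problem as in the log above.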

I'd recommend one of the following:

  • Set the CUDA_PATH environment variable with the path to the CUDA installation (something like /usr/local/cuda)
  • Add nvcc to your PATH
  • Set the CUDACXX environment variable with the path to nvcc
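To see which of those three hints is currently in effect, something like the following Python sketch could help. It only mirrors the list above (CUDACXX, then nvcc on PATH, then bin/nvcc under CUDA_PATH) and is not a reimplementation of CMake's actual compiler search. The function name find_nvcc and the lookup order are assumptions for illustration.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_nvcc(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a path to nvcc based on the three hints above, or None."""
    search_path = env.get("PATH")

    # 1. CUDACXX: a compiler name or a full path to the compiler.
    cudacxx = env.get("CUDACXX")
    if cudacxx:
        found = shutil.which(cudacxx, path=search_path)
        if found:
            return found

    # 2. nvcc reachable through PATH.
    found = shutil.which("nvcc", path=search_path)
    if found:
        return found

    # 3. CUDA_PATH: look for bin/nvcc under the installation root.
    cuda_path = env.get("CUDA_PATH")
    if cuda_path:
        candidate = Path(cuda_path) / "bin" / "nvcc"
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)

    return None
```

Running print(find_nvcc()) in the same shell you use for pip install shows whether that environment can locate nvcc at all. If it prints None, none of the three settings is in place.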

Related: https://github.com/NVIDIA/TransformerEngine/issues/383

timmoon10, Mar 05 '24 18:03

I solved this issue by simply running this command in the TransformerEngine directory:

git submodule update --init --recursive

I hope this helps.

BrunoFANG1, Apr 28 '24 18:04

I was able to compile with CUDA toolkit 12.4 and a PyTorch build for CUDA 12.4 on Ubuntu 24.04. I was not able to compile with a PyTorch build for CUDA 12.1 and CUDA toolkit 12.5. The Docker image uses 12.2 for both, so I assume that works. 12.1 for both might work, but I didn't test it. These compilation errors are usually caused by a mismatch between the CUDA version PyTorch was built with and the installed CUDA toolkit.

Check your PyTorch CUDA version:

python -c "import torch; print(torch.version.cuda)"

Check your cuda-toolkit version:

nvcc --version
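The two checks can also be compared in a small script. Below is a rough Python sketch that pulls the "release X.Y" number out of nvcc --version output and compares it with the string from torch.version.cuda. Requiring an exact major.minor match is an assumption of this sketch, since the thread only reports which combinations happened to work. The helper names and the sample line are illustrative, not captured from anyone's machine.

```python
import re
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_cuda_version(version: str) -> Tuple[int, int]:
    """Turn a string such as torch.version.cuda ('12.4') into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_match(torch_cuda: str, nvcc_output: str) -> bool:
    """True when the toolkit release equals the CUDA version PyTorch reports."""
    release = parse_nvcc_release(nvcc_output)
    return release is not None and release == parse_cuda_version(torch_cuda)


# Illustrative sample of the relevant nvcc output line.
SAMPLE = "Cuda compilation tools, release 12.4, V12.4.99"
print(versions_match("12.4", SAMPLE))  # True
print(versions_match("12.1", SAMPLE))  # False
```

In practice you would pass in torch.version.cuda and the captured text of nvcc --version instead of the sample string.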

You can grab a PyTorch build for CUDA 12.4 from the preview channel here:

https://pytorch.org/get-started/locally/

CUDA Toolkit 12.4 here:

https://developer.nvidia.com/cuda-12-4-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

Make sure to set MAX_JOBS to 1 before compiling (known flash attn issue):

export MAX_JOBS=1

Update your ~/.bashrc with these environment variables:

# Put the CUDA bin directory (which contains nvcc) on PATH
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export CUDA_PATH=/usr/local/cuda
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# CUDACXX takes the full path to the compiler itself
export CUDACXX=/usr/local/cuda/bin/nvcc

Then run:

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

Compilation will take a while. Avoid installing from a source checkout with python setup.py install. Install with the pip git+ URL above instead.

nickpotafiy, May 29 '24 21:05

I fixed this bug by adding export PATH=/usr/local/cuda/bin:$PATH to .bashrc. It cost me an afternoon.

wplf, Jul 05 '24 09:07

For future reference, https://github.com/NVIDIA/TransformerEngine/issues/700#issuecomment-1979377899 provides instructions on installing CUDA so it is available to CMake.

I'll close this issue so this guidance is the last in the thread and is easier for other users to find. Please open a new issue if you run into another CMake issue.

timmoon10, Jul 05 '24 18:07