TransformerEngine Installation failed with cmake error

Hi,

We are testing our new Hopper machines (H800/H100) and trying to use FP8 for training for the first time, but we are having trouble installing TransformerEngine. The installation reports: RuntimeError: Error when running CMake: Command '['/usr/local/bin/cmake', '-S', '/tmp/pip-req-build-p6kjladj/transformer_engine', '-B', '/tmp/tmps08o01xi', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-p6kjladj/build/lib.linux-x86_64-cpython-310', '-GNinja']' returned non-zero exit status 1.

We tried to invoke the command outside of pip, and it just reports that there is no source directory.

We are trying Docker right now, but our internet configuration makes Docker inconvenient to use, so we would usually prefer not to use it. Could you show us where we could find any clues on how to proceed? Much appreciated.

Aug 03 '23 07:08 RuiWang1998

Hi @RuiWang1998, could you share the command you use for installation and a full error message that you are getting? Thank you!

Aug 03 '23 18:08 ptrendx

Hi @ptrendx, we tried both pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable and pip install git+https://github.com/NVIDIA/TransformerEngine.git@main, with Python versions from 3.9 to 3.11. Each time we simply installed pytorch==2.0.1 and packaging and then ran the two commands. Both returned the same error.

Aug 04 '23 03:08 RuiWang1998

Hi @ptrendx, after a little digging, we think we have located the problem, but we are not sure what the solution is:

/usr/bin/c++ -Dtransformer_engine_EXPORTS -I/home/rui/TransformerEngine/transformer_engine -I/home/rui/TransformerEngine/transformer_engine/common/include -I/usr/local/cuda-11.8/targets/x86_64-linux/include -I/home/rui/TransformerEngine/transformer_engine/../3rdparty/cudnn-frontend/include -I/tmp/tmp9cj2vyni/common/string_headers -isystem /usr/local/cuda-11.8/include -O3 -DNDEBUG -std=gnu++17 -fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o.d -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o -c /home/rui/TransformerEngine/transformer_engine/common/fused_attn/fused_attn.cpp
In file included from /usr/local/cuda-11.8/include/cuda_fp8.h:350,
                 from /home/rui/TransformerEngine/transformer_engine/common/fused_attn/../common.h:14,
                 from /home/rui/TransformerEngine/transformer_engine/common/fused_attn/fused_attn.cpp:8:
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator short unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:735:16: error: ‘__half2ushort_rz’ was not declared in this scope
  735 |         return __half2ushort_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:744:16: error: ‘__half2uint_rz’ was not declared in this scope
  744 |         return __half2uint_rz(__half(*this));
      |                ^~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator long long unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:753:16: error: ‘__half2ull_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
  753 |         return __half2ull_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator short int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:791:16: error: ‘__half2short_rz’ was not declared in this scope
  791 |         return __half2short_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:800:16: error: ‘__half2int_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
  800 |         return __half2int_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e5m2::operator long long int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:809:16: error: ‘__half2ll_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
  809 |         return __half2ll_rz(__half(*this));
      |                ^~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator short unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1248:16: error: ‘__half2ushort_rz’ was not declared in this scope
 1248 |         return __half2ushort_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1257:16: error: ‘__half2uint_rz’ was not declared in this scope
 1257 |         return __half2uint_rz(__half(*this));
      |                ^~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator long long unsigned int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1266:16: error: ‘__half2ull_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
 1266 |         return __half2ull_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator short int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1303:16: error: ‘__half2short_rz’ was not declared in this scope
 1303 |         return __half2short_rz(__half(*this));
      |                ^~~~~~~~~~~~~~~
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1311:16: error: ‘__half2int_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
 1311 |         return __half2int_rz(__half(*this));
      |                ^~~~~~~~~~~~~
      |                __half2_raw
/usr/local/cuda-11.8/include/cuda_fp8.hpp: In member function ‘__nv_fp8_e4m3::operator long long int() const’:
/usr/local/cuda-11.8/include/cuda_fp8.hpp:1319:16: error: ‘__half2ll_rz’ was not declared in this scope; did you mean ‘__half2_raw’?
 1319 |         return __half2ll_rz(__half(*this));
      |                ^~~~~~~~~~~~
      |                __half2_raw
ninja: build stopped: subcommand failed.

It seems like we are missing some headers. Where can we include them?

We have machines with CUDA 11.8 and machines with CUDA 12, and we believe the failure has the same cause on both.

Aug 04 '23 09:08 RuiWang1998

Hi,

Some updates: our H800 machines can now install successfully, but the A100 machines cannot yet. The H800 machines just needed cuDNN, but the A100 machines still hit the error above even after installing cuDNN.

Aug 04 '23 11:08 RuiWang1998

Hi, this is a pretty strange error: functions like __half2ushort_rz are declared inside the cuda_fp16.hpp file, which should be in the include directory of your CUDA installation (in this case /usr/local/cuda-11.8/include or /usr/local/cuda-11.8/targets/x86_64-linux/include). Could you confirm that this file exists there?
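
One way to run this check is sketched below in Python: it looks for a header in the given include directories and reports the lines on which a symbol name appears. The helper name find_symbol_in_header is made up for illustration; the directories and the symbol are the ones mentioned above. A hit only shows that the name occurs in the file. It does not show that the declaration is visible to the host compiler, since it could sit behind a preprocessor guard.

```python
from pathlib import Path


def find_symbol_in_header(include_dirs, header_name, symbol):
    """Return (header_path, line_number) pairs where `symbol` appears.

    Hypothetical helper: a plain text search, not a C++ parser.
    """
    hits = []
    for inc in include_dirs:
        header = Path(inc) / header_name
        if not header.is_file():
            continue  # this include directory does not ship the header
        with header.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if symbol in line:
                    hits.append((str(header), lineno))
    return hits


if __name__ == "__main__":
    # Directories taken from the comment above; adjust to your CUDA install.
    dirs = [
        "/usr/local/cuda-11.8/include",
        "/usr/local/cuda-11.8/targets/x86_64-linux/include",
    ]
    for path, lineno in find_symbol_in_header(dirs, "cuda_fp16.hpp", "__half2ushort_rz"):
        print(f"{path}:{lineno}")
```

If nothing is printed, either the header is missing from both directories or the symbol does not appear in it.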

Aug 07 '23 20:08 ptrendx

Hi, yes it is in /usr/local/cuda-11.8/include and it seems that __half2ushort_rz is declared there.

Aug 08 '23 02:08 RuiWang1998

Any update on this issue?

Aug 31 '23 12:08 MicPie

Hi @MicPie,

We have been able to install this with newer commits now. Were you trying on stable releases?

Sep 01 '23 03:09 RuiWang1998

I have the same problem on my workstation with an A6000 Ada.

raise RuntimeError(f"Error when running CMake: {e}")
      RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/tmp/pip-req-build-hnl1xnl7/transformer_engine', '-B', '/tmp/tmp6vkf06mc', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-hnl1xnl7/build/lib.linux-x86_64-cpython-311']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer-engine

@RuiWang1998 Could you help me figure out what I should do? Should I install cuDNN? My setup: CUDA 11.8, PyTorch 2.1.0, Python 3.11, Ubuntu 22.04.

Nov 21 '23 09:11 mahdip72

Hi,

You would have to modify setup.py so that it outputs the actual error message (or run the commands manually in a terminal), so that we can see exactly what is going on.
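
As a rough sketch of the manual route, the Python snippet below re-runs a command and returns everything it prints, so the real CMake error is not hidden behind the RuntimeError. It is not TransformerEngine's build code: run_and_capture is a hypothetical helper, and the example command in the comment reuses only flags that appear in the error messages in this thread, with paths assumed to be relative to a local clone (the /tmp/pip-req-build-... directories no longer exist once pip exits).

```python
import subprocess


def run_and_capture(cmd):
    """Run `cmd` and return (exit_code, combined stdout and stderr text).

    Hypothetical helper for illustration; stderr is merged into stdout so
    the messages stay in the order the tool printed them.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout


# Example (assumed paths, run from the root of a local clone):
#   code, output = run_and_capture([
#       "cmake", "-S", "transformer_engine", "-B", "build/cmake",
#       "-DCMAKE_BUILD_TYPE=Release", "-GNinja",
#   ])
#   print(output)
```

A non-zero exit code together with the printed text usually narrows the failure down to either the configure step or a specific compile command.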

Best, Rui

Nov 21 '23 10:11 RuiWang1998

@RuiWang1998 Could you share which release or commit you used? I have the same problem. Thanks.

Feb 19 '24 07:02 liuchangdm

Same issue

File "/aml2/TransformerEngine/setup.py", line 338, in _build_cmake
    raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/aml/conda/bin/cmake', '-S', '/aml2/TransformerEngine/transformer_engine', '-B', '/aml2/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/aml2/ds2/bin/python', '-DPython_INCLUDE_DIR=/aml2/ds2/include/python3.10', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/aml2/TransformerEngine/build/lib.linux-x86_64-cpython-310', '-GNinja', '-Dpybind11_DIR=/aml2/ds2/lib/python3.10/site-packages/pybind11/share/cmake/pybind11']' returned non-zero exit status 1.
[end of output]

Apr 02 '24 05:04 hellangleZ

The CMake error message should already be printed to stderr, although it is somewhat buried within the Python stacktrace from setup.py. It may be helpful to search for "Building CMake extension transformer_engine" within your build logs.

If the error is happening during CMake configuration, it's probably because CUDA or cuDNN are not properly installed. See CUDA instructions at https://github.com/NVIDIA/TransformerEngine/issues/700#issuecomment-1979377899. For cuDNN, make sure CUDNN_PATH is set in your environment.
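
If the build log has been saved to a file, the search can be scripted. The Python sketch below returns the lines that follow the marker string quoted above; lines_after_marker is a hypothetical helper name, and the number of lines to show is an arbitrary choice.

```python
MARKER = "Building CMake extension transformer_engine"


def lines_after_marker(log_text, marker=MARKER, max_lines=50):
    """Return up to `max_lines` log lines starting at the first marker hit.

    Hypothetical helper: returns an empty list if the marker is absent.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            return lines[i:i + max_lines]
    return []


# Example, assuming the pip output was saved to build.log:
#   with open("build.log", errors="replace") as fh:
#       print("\n".join(lines_after_marker(fh.read())))
```

An empty result suggests the build failed before the CMake step started, or that the log was not captured in full.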

Apr 02 '24 23:04 timmoon10

I solved this issue by running the following command in the TransformerEngine directory:

git submodule update --init --recursive

I hope this helps.
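
To check whether this applies before re-running the build, note that git submodule status prefixes each uninitialized submodule with a minus sign. The Python sketch below parses that output; both function names are hypothetical, and paths containing spaces are not handled.

```python
import subprocess


def submodule_status(repo_dir):
    """Return the raw output of `git submodule status --recursive`."""
    proc = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout


def uninitialized_submodules(status_output):
    """Return paths of submodules whose status line starts with '-'.

    Hypothetical helper: git prints '-<sha> <path>' for a submodule that
    has not been initialized yet.
    """
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


# Example, run from anywhere with the path to a local clone:
#   print(uninitialized_submodules(submodule_status("TransformerEngine")))
```

A non-empty list means the command above has not been run yet for those submodules.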

Apr 28 '24 18:04 BrunoFANG1

I am hitting the same problem. The error details are:

raise RuntimeError(f"Error when running CMake: {e}") RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/tmp/pip-req-build-yvwm9h7r/transformer_engine', '-B', '/tmp/pip-req-build-yvwm9h7r/build/cmake', DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1', '-DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-yvwm9h7r/build/lib.linux-x86_64-cpython-311', '-GNinja']' returned non-zero exit status 1.

My environment: Ubuntu 22.04, CUDA 11.7, Python 3.11, torch 2.3.1, NVIDIA driver 535.183.06. Looking forward to a solution!

Jul 16 '24 10:07 sfdeggb

Hello, my friend! You can check whether nvcc is on your PATH:

nvcc --version

If that fails, you may be able to fix it with export PATH=/usr/local/cuda/bin:$PATH or something similar.
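
The same check can be scripted. The Python sketch below looks for nvcc on PATH and, if it is not there, in a fallback directory, then suggests the matching export line. locate_nvcc is a hypothetical helper, and /usr/local/cuda/bin is only the default location assumed in the comment above.

```python
import os
import shutil


def locate_nvcc(path_env=None, fallback_dirs=("/usr/local/cuda/bin",)):
    """Return (nvcc_path, hint).

    Hypothetical helper: `hint` is None when nvcc is already on PATH,
    otherwise a suggested export line for the directory where it was found.
    Returns (None, None) if nvcc was not found anywhere.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found = shutil.which("nvcc", path=path_env)
    if found:
        return found, None
    for directory in fallback_dirs:
        candidate = shutil.which("nvcc", path=directory)
        if candidate:
            return candidate, f"export PATH={directory}:$PATH"
    return None, None


if __name__ == "__main__":
    nvcc, hint = locate_nvcc()
    print("nvcc:", nvcc)
    if hint:
        print("suggested fix:", hint)
```

If both values come back as None, the CUDA toolkit is probably installed somewhere else or not installed at all.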

Jul 16 '24 10:07 wplf

@wplf Yes, my nvcc seems OK. The output is below:

ubuntu@ip-172-31-38-93:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

Are there any other solutions?

Jul 16 '24 10:07 sfdeggb

Can you check your CMake version?
You can install CMake with pip install cmake.

Jul 16 '24 10:07 wplf

@wplf The CMake version is below:

(yuxunlian) ubuntu@ip-172-31-38-93:~$ cmake --version
cmake version 3.22.1
CMake suite maintained and supported by Kitware (kitware.com/cmake).

Is this version appropriate?

Jul 16 '24 10:07 sfdeggb

Yes, this is OK. Sorry, I can't help any further.

Jul 16 '24 10:07 wplf

@wplf No worries! Thank you for your reply!

Jul 16 '24 10:07 sfdeggb

Any update on this issue? I'm still getting the same error.

Oct 04 '24 07:10 FidanVural

If you are experiencing an error that looks like RuntimeError: Error when running CMake, then something has failed in the build process (probably a CMake configuration error or a compilation error). Please look through the build logs to find more details or post enough of the build logs so we can figure out what's going on. To print the maximum amount of information during the build process:

cd transformer_engine
pip install -v -v -v .

Some common build errors and fixes:

Uninitialized Git submodules: Run git submodule update --init --recursive.
CMake can't find a C++ compiler: Set CXX in the environment.
CMake can't find CUDA: Set CUDA_PATH in the environment.
CMake can't find cuDNN: Set CUDNN_PATH in the environment.
Invalid dependency versions: Consult TE's requirements. As of TE 1.11, TE requires CUDA 12.0+ and cuDNN 8.1+.
Hang during compilation: Try disabling parallelism in the build process by setting MAX_JOBS=1 and NVTE_BUILD_THREADS_PER_JOB=1 in the environment. See https://github.com/NVIDIA/TransformerEngine/issues/1077#issuecomment-2389735640 for more guidance.
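
Parts of this checklist can be automated. The Python sketch below reports which of the environment variables named above are unset and builds an environment with build parallelism disabled. Both helper names are hypothetical. An unset variable is not necessarily a problem, since CMake can often find the compiler and CUDA on its own; the report is only a starting point when configuration fails.

```python
import os

# Variables named in the checklist above.
BUILD_VARS = ("CXX", "CUDA_PATH", "CUDNN_PATH")


def unset_build_vars(env=None):
    """Return the checklist variables that are missing or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in BUILD_VARS if not env.get(name)]


def serial_build_env(env=None):
    """Return a copy of `env` with build parallelism disabled."""
    env = dict(os.environ if env is None else env)
    env["MAX_JOBS"] = "1"
    env["NVTE_BUILD_THREADS_PER_JOB"] = "1"
    return env


if __name__ == "__main__":
    print("unset:", unset_build_vars())
```

The dictionary returned by serial_build_env can be passed as env= to subprocess.run when launching pip install -v -v -v . from a script.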

I'll lock this issue to make this comment easier for users to find, but please open a new issue if you are encountering a build error (with enough of the build log for us to help).

Oct 04 '24 18:10 timmoon10