llm-foundry

Not able to install Transformer Engine for fp8

Open palash04 opened this issue 1 year ago • 0 comments

I am following the fp8 setup instructions from here on bare metal (inside a conda env).

But I am getting the following error:

      Building CMake extension transformer_engine
      Running command /opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake -S /tmp/pip-req-build-eibanc7t/transformer_engine/common -B /tmp/pip-req-build-eibanc7t/build/cmake -DPython_EXECUTABLE=/opt/conda/bin/python -DPython_INCLUDE_DIR=/opt/conda/include/python3.11 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311 -Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11 -GNinja
      -- The CUDA compiler identification is NVIDIA 12.4.131
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.4.131")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      CMake Error at /tmp/pip-req-build-eibanc7t/3rdparty/cudnn-frontend/cmake/cuDNN.cmake:3 (find_path):
        Could not find CUDNN_INCLUDE_DIR using the following files: cudnn.h
      Call Stack (most recent call first):
        CMakeLists.txt:40 (include)
      
      
      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 89, in _build_cmake
          subprocess.run(command, cwd=build_dir, check=True)
        File "/opt/conda/lib/python3.11/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['/opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-eibanc7t/transformer_engine/common', '-B', '/tmp/pip-req-build-eibanc7t/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
      
      During handling of the above exception, another exception occurred:
      
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-eibanc7t/setup.py", line 174, in <module>
          setuptools.setup(
        File "/opt/conda/lib/python3.11/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-req-build-eibanc7t/setup.py", line 54, in run
          super().run()
        File "/opt/conda/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
          self.run_command(cmd_name)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 115, in run
          ext._build_cmake(
        File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 91, in _build_cmake
          raise RuntimeError(f"Error when running CMake: {e}")
      RuntimeError: Error when running CMake: Command '['/opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-eibanc7t/transformer_engine/common', '-B', '/tmp/pip-req-build-eibanc7t/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer_engine

What could be a possible solution to this?
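
The CMake step in the log stops at `find_path` because it cannot locate `cudnn.h`. As a diagnostic, here is a minimal sketch that checks a few likely include directories for that header. The directory list is an assumption for illustration (environment variables such as `CUDNN_PATH` and `CUDA_HOME`, the `/usr/local/cuda` toolkit path from the log, and the location where pip-installed cuDNN wheels usually unpack); it is not necessarily the list that `cuDNN.cmake` searches. The helper names `candidate_include_dirs` and `find_cudnn_header` are hypothetical.

```python
import os
import sys
from pathlib import Path
from typing import Iterable, Optional


def candidate_include_dirs() -> list[Path]:
    """Directories where cudnn.h might live (illustrative guesses only)."""
    dirs: list[Path] = []
    # Environment variables that commonly point at a cuDNN, CUDA, or conda root.
    for var in ("CUDNN_PATH", "CUDNN_HOME", "CUDA_HOME", "CONDA_PREFIX"):
        root = os.environ.get(var)
        if root:
            dirs.append(Path(root) / "include")
    # The CUDA toolkit location reported in the log, plus the system include dir.
    dirs += [Path("/usr/local/cuda/include"), Path("/usr/include")]
    # Assumption: pip wheels of cuDNN unpack under site-packages/nvidia/cudnn/include.
    for entry in sys.path:
        if entry.endswith("site-packages"):
            dirs.append(Path(entry) / "nvidia" / "cudnn" / "include")
    return dirs


def find_cudnn_header(dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first directory that contains cudnn.h, or None if none does."""
    for d in dirs:
        if (d / "cudnn.h").is_file():
            return d
    return None


if __name__ == "__main__":
    found = find_cudnn_header(candidate_include_dirs())
    if found is None:
        print("cudnn.h not found in any candidate directory")
    else:
        print(f"cudnn.h found in {found}")
```

If the script reports no match, cuDNN headers are probably not installed in this environment, which would be consistent with the error above. If it does find the header, one thing to try is pointing the build at that installation (for example by setting `CUDNN_PATH` to the parent of the reported include directory) before rerunning the install; whether the build honors that variable should be checked against the Transformer Engine installation notes.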

palash04 · Sep 16 '24 14:09