llm-foundry
Not able to install Transformer Engine for fp8
I am following the fp8 setup instructions from here on bare metal (inside a conda env).
But I am getting the following error:
Building CMake extension transformer_engine
Running command /opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake -S /tmp/pip-req-build-eibanc7t/transformer_engine/common -B /tmp/pip-req-build-eibanc7t/build/cmake -DPython_EXECUTABLE=/opt/conda/bin/python -DPython_INCLUDE_DIR=/opt/conda/include/python3.11 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311 -Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11 -GNinja
-- The CUDA compiler identification is NVIDIA 12.4.131
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.4.131")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
CMake Error at /tmp/pip-req-build-eibanc7t/3rdparty/cudnn-frontend/cmake/cuDNN.cmake:3 (find_path):
Could not find CUDNN_INCLUDE_DIR using the following files: cudnn.h
Call Stack (most recent call first):
CMakeLists.txt:40 (include)
-- Configuring incomplete, errors occurred!
Traceback (most recent call last):
File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 89, in _build_cmake
subprocess.run(command, cwd=build_dir, check=True)
File "/opt/conda/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-eibanc7t/transformer_engine/common', '-B', '/tmp/pip-req-build-eibanc7t/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-req-build-eibanc7t/setup.py", line 174, in <module>
setuptools.setup(
File "/opt/conda/lib/python3.11/site-packages/setuptools/__init__.py", line 104, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-req-build-eibanc7t/setup.py", line 54, in run
super().run()
File "/opt/conda/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 368, in run
self.run_command("build")
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
super().run_command(command)
File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 115, in run
ext._build_cmake(
File "/tmp/pip-req-build-eibanc7t/build_tools/build_ext.py", line 91, in _build_cmake
raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/opt/conda/lib/python3.11/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-eibanc7t/transformer_engine/common', '-B', '/tmp/pip-req-build-eibanc7t/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-eibanc7t/build/lib.linux-x86_64-cpython-311', '-Dpybind11_DIR=/tmp/pip-req-build-eibanc7t/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
What could be a possible solution to this?
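The log above stops at `Could not find CUDNN_INCLUDE_DIR using the following files: cudnn.h`, so a first diagnostic step is to check whether `cudnn.h` exists anywhere the build is likely to look. The sketch below is only a rough check, not the actual search logic of `cuDNN.cmake`: the helper names `candidate_dirs` and `find_cudnn_header` are hypothetical, and the directory list (including the `CUDNN_PATH` and `CUDA_HOME` environment variables and the pip `nvidia/cudnn` location) is an assumption about common install locations.

```python
import os
import sysconfig
from pathlib import Path


def candidate_dirs(env=None):
    """Build a list of directories that commonly hold cudnn.h.

    Every entry here is an assumption about typical layouts, not a copy
    of what the Transformer Engine build actually searches.
    """
    env = os.environ if env is None else env
    dirs = []
    # Assumed: an explicit cuDNN root exported by the user.
    if env.get("CUDNN_PATH"):
        dirs.append(os.path.join(env["CUDNN_PATH"], "include"))
    # Assumed: cuDNN copied into the CUDA toolkit tree.
    cuda_home = env.get("CUDA_HOME", "/usr/local/cuda")
    dirs.append(os.path.join(cuda_home, "include"))
    # Assumed: cuDNN installed as a pip wheel into site-packages.
    site_packages = sysconfig.get_paths()["purelib"]
    dirs.append(os.path.join(site_packages, "nvidia", "cudnn", "include"))
    # Assumed: cuDNN installed by the system package manager.
    dirs.append("/usr/include")
    return dirs


def find_cudnn_header(directories):
    """Return the first directory that contains cudnn.h, or None."""
    for directory in directories:
        if directory and (Path(directory) / "cudnn.h").is_file():
            return str(directory)
    return None


if __name__ == "__main__":
    found = find_cudnn_header(candidate_dirs())
    if found:
        print(f"cudnn.h found in: {found}")
    else:
        print("cudnn.h not found in any of the assumed locations")
```

If the script reports no header, cuDNN is probably not installed in this environment at all. If it does find one, the build may simply need to be pointed at that location.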