[Bug] Build fails on Ubuntu
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
- [x] 5. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-Chinese/English content without translation may be closed.
Describe the bug
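I have been trying to build this project on Ubuntu, but it keeps failing, and this has troubled me for a long time. I followed the [installation guide for version 0.2.4](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/balance-serve.md) as well as some third-party installation tutorials. I am installing version 0.3.2, but I could not find an installation guide specific to that version.
My environment: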
- Ubuntu 24
- Intel(R) Xeon(R) Gold 6454S NUMA x2
- gcc 11.4, g++ 11.4, cmake 3.28
- cuda 12.8
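I ran the install command many times and always hit errors. Along the way, cmake picked up the conda environment and failed to find packages, and a gcc/g++ version issue caused Float-related errors. After resolving that series of problems, the build still fails. I then tried building the project with the help of an AI assistant:
Install command: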
export USE_NUMA=1
export PATH=/usr/bin:/usr/local/cuda-12.8/bin:$PATH && export PYTHONPATH=/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages:$PYTHONPATH && env USE_BALANCE_SERVE=1 bash ./install.sh
Error output:
[3/3] /usr/local/cuda-12.8/bin/nvcc --generate-dependencies-with-compile --dependency-output /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin.o.d -I/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/include -I/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda-12.8/include -I/data1/jinjm_data/miniconda3/envs/kt/include/python3.11 -c -c /data1/jinjm_data/dev/k32/ktransformers/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu -o /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -Xcompiler -fPIC -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1018"' -DTORCH_EXTENSION_NAME=vLLMMarlin -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
g++ -pthread -B /data1/jinjm_data/miniconda3/envs/kt/compiler_compat -shared -Wl,-rpath,/data1/jinjm_data/miniconda3/envs/kt/lib -Wl,-rpath-link,/data1/jinjm_data/miniconda3/envs/kt/lib -L/data1/jinjm_data/miniconda3/envs/kt/lib -Wl,-rpath,/data1/jinjm_data/miniconda3/envs/kt/lib -Wl,-rpath-link,/data1/jinjm_data/miniconda3/envs/kt/lib -L/data1/jinjm_data/miniconda3/envs/kt/lib /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/binding.o /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin.o /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin_repack.o -L/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/lib -L/usr/local/cuda-12.8/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/vLLMMarlin.cpython-311-x86_64-linux-gnu.so
-- Using compiler: /usr/bin/g++-13
-- _GLIBCXX_USE_CXX11_ABI=1
-- Could NOT find benchmark (missing: benchmark_DIR)
-- The following features have been enabled:
* Pull, support for pulling metrics
* pkg-config, generate pkg-config files
-- The following OPTIONAL packages have been found:
* CURL
-- The following REQUIRED packages have been found:
* googlemock-3rdparty
* civetweb-3rdparty
-- The following features have been disabled:
* Push, support for pushing metrics to a push-gateway
* Compression, support for zlib compression of metrics
* IYWU, include-what-you-use
-- The following OPTIONAL packages have not been found:
* benchmark
-- xxHash build type: Release
-- Architecture: x86_64
-- pybind11 v2.14.0 dev1
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/__init__.py", line 1002, in <module>
raise ImportError(
ImportError: Failed to load PyTorch C extensions:
It appears that PyTorch has loaded the `torch/_C` folder
of the PyTorch repository rather than the C extensions which
are expected in the `torch._C` namespace. This can occur when
using the `install` workflow. e.g.
$ python setup.py install && python -c "import torch"
This error can generally be solved using the `develop` workflow
$ python setup.py develop && python -c "import torch" # This should succeed
or by running Python from a different directory.
-- Found PyTorch at:
-- PyTorch: CUDA detected: 12.8
-- PyTorch: CUDA nvcc is: /usr/local/cuda-12.8/bin/nvcc
-- PyTorch: CUDA toolkit directory: /usr/local/cuda-12.8
-- PyTorch: Header version is: 12.8
CMake Warning at /data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:140 (message):
Failed to compute shorthash for libnvrtc.so
Call Stack (most recent call first):
/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:86 (include)
/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
CMakeLists.txt:81 (find_package)
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- USE_CUDSS is set to 0. Compiling without cuDSS support
-- USE_CUFILE is set to 0. Compiling without cuFile support
-- Autodetected CUDA architecture(s): 9.0 9.0 9.0
-- Added CUDA NVCC flags for: -gencode;arch=compute_90,code=sm_90
CMake Warning at /data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found)
CMakeLists.txt:81 (find_package)
-- _GLIBCXX_USE_CXX11_ABI=1
-- Using aio
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/__init__.py", line 1002, in <module>
raise ImportError(
ImportError: Failed to load PyTorch C extensions:
It appears that PyTorch has loaded the `torch/_C` folder
of the PyTorch repository rather than the C extensions which
are expected in the `torch._C` namespace. This can occur when
using the `install` workflow. e.g.
$ python setup.py install && python -c "import torch"
This error can generally be solved using the `develop` workflow
$ python setup.py develop && python -c "import torch" # This should succeed
or by running Python from a different directory.
-- Found PyTorch at:
CMake Warning at /data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:125 (append_torchlib_if_found)
kvc2/CMakeLists.txt:70 (find_package)
CMake Warning (dev) at kvc2/CMakeLists.txt:77 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
-- prometheus Found!
-- CUDA Found!
-- CUDA Version: 12.8
-- CUDA Toolkit Root: /usr/local/cuda-12.8
-- CMAKE_SOURCE_DIR: /data1/jinjm_data/dev/k32/ktransformers/csrc/balance_serve
-- BUILD_PYTHON_EXT: OFF
-- CMAKE_CXX_FLAGS of PhotonLibOS: -O3 -march=native
-- Checking dependency zlib
-- Will find zlib
-- Checking dependency openssl
-- Will find openssl
-- Checking dependency aio
-- Will find aio
-- Checking dependency curl
-- Will find curl
-- Configuring done (2.3s)
-- Generating done (0.1s)
-- Build files have been written to: /data1/jinjm_data/dev/k32/ktransformers/csrc/balance_serve
Error: could not load cache
CMake args: ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/data1/jinjm_data/dev/k32/ktransformers/build/lib.linux-x86_64-cpython-311/', '-DPYTHON_EXECUTABLE=/data1/jinjm_data/miniconda3/envs/kt/bin/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DKTRANSFORMERS_USE_CUDA=ON', '-D_GLIBCXX_USE_CXX11_ABI=1']
CMake args: ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/data1/jinjm_data/dev/k32/ktransformers/build/lib.linux-x86_64-cpython-311/', '-DPYTHON_EXECUTABLE=/data1/jinjm_data/miniconda3/envs/kt/bin/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DKTRANSFORMERS_USE_CUDA=ON', '-D_GLIBCXX_USE_CXX11_ABI=1', '-DLLAMA_NATIVE=ON', '-DEXAMPLE_VERSION_INFO=0.3.1+cu128torch28fancy']
build_temp: /data1/jinjm_data/dev/k32/ktransformers/csrc/balance_serve/build
Traceback (most recent call last):
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 280, in build_wheel
return _build_backend().build_wheel(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/build_meta.py", line 415, in build_wheel
return self._build_with_temp_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/build_meta.py", line 397, in _build_with_temp_dir
self.run_setup()
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/build_meta.py", line 313, in run_setup
exec(code, locals())
File "<string>", line 668, in <module>
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
dist.run_commands()
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 970, in run_commands
self.run_command(cmd)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
super().run_command(command)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
cmd_obj.run()
File "<string>", line 263, in run
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/command/bdist_wheel.py", line 373, in run
self.run_command("build")
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
super().run_command(command)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
cmd_obj.run()
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
self.distribution.run_command(command)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
super().run_command(command)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
cmd_obj.run()
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 93, in run
_build_ext.run(self)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
self.build_extensions()
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1072, in build_extensions
build_ext.build_extensions(self)
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
self._build_extensions_serial()
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
self.build_extension(ext)
File "<string>", line 594, in build_extension
File "<string>", line 370, in run_command_with_live_tail
File "/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '--build', PosixPath('/data1/jinjm_data/dev/k32/ktransformers/csrc/balance_serve/build'), '--verbose', '--parallel=128']' returned non-zero exit status 1.
error: subprocess-exited-with-error
× Building wheel for ktransformers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /data1/jinjm_data/miniconda3/envs/kt/bin/python3.11 /data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp8ujfs3rl
cwd: /data1/jinjm_data/dev/k32/ktransformers
Building wheel for ktransformers (pyproject.toml) ... error
ERROR: Failed building wheel for ktransformers
Failed to build ktransformers
ERROR: Failed to build installable wheels for some pyproject.toml based projects (ktransformers)
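The build fails because the cmake build step returns a non-zero exit status. The balance_serve build directory exists but is empty. There is also an error about a missing Makefile.
I do not know how to resolve this.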
Reproduction
export USE_NUMA=1
export PATH=/usr/bin:/usr/local/cuda-12.8/bin:$PATH && export PYTHONPATH=/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages:$PYTHONPATH && env USE_BALANCE_SERVE=1 bash ./install.sh
Environment
- Ubuntu 24
- Intel(R) Xeon(R) Gold 6454S NUMA x2
- gcc 11.4, g++ 11.4, cmake 3.28
- cuda 12.8
The failure occurs at the third step every time.
[3/3] /usr/local/cuda-12.8/bin/nvcc --generate-dependencies-with-compile --dependency-output /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin.o.d -I/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/include -I/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/usr/local/cuda-12.8/include -I/data1/jinjm_data/miniconda3/envs/kt/include/python3.11 -c -c /data1/jinjm_data/dev/k32/ktransformers/csrc/custom_marlin/gptq_marlin/gptq_marlin.cu -o /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -Xcompiler -fPIC -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1018"' -DTORCH_EXTENSION_NAME=vLLMMarlin -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
g++ -pthread -B /data1/jinjm_data/miniconda3/envs/kt/compiler_compat -shared -Wl,-rpath,/data1/jinjm_data/miniconda3/envs/kt/lib -Wl,-rpath-link,/data1/jinjm_data/miniconda3/envs/kt/lib -L/data1/jinjm_data/miniconda3/envs/kt/lib -Wl,-rpath,/data1/jinjm_data/miniconda3/envs/kt/lib -Wl,-rpath-link,/data1/jinjm_data/miniconda3/envs/kt/lib -L/data1/jinjm_data/miniconda3/envs/kt/lib -L/usr/lib/x86_64-linux-gnu /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/binding.o /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin.o /data1/jinjm_data/dev/k32/ktransformers/build/temp.linux-x86_64-cpython-311/csrc/custom_marlin/gptq_marlin/gptq_marlin_repack.o -L/data1/jinjm_data/miniconda3/envs/kt/lib/python3.11/site-packages/torch/lib -L/usr/local/cuda-12.8/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-311/vLLMMarlin.cpython-311-x86_64-linux-gnu.so
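It looks like the balance_serve module fails to build. When the AI assistant built this module on its own it seemed to succeed, but it fails whenever install.sh is run.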
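The log above prints "Using compiler: /usr/bin/g++-13" while the environment section lists g++ 11.4, so it may help to confirm which tools the shell actually resolves before running install.sh. The following is only a sketch: `report_tool` is a hypothetical helper name, and the list of tools is an assumption about what is relevant to this build.

```shell
#!/usr/bin/env bash
# Sketch only: report_tool is a hypothetical helper, not part of ktransformers.

# Print the path that a tool name resolves to on the current PATH,
# or a marker if it cannot be found.
report_tool() {
    local tool="$1"
    local path
    if path=$(command -v "$tool"); then
        printf '%s -> %s\n' "$tool" "$path"
    else
        printf '%s -> (not on PATH)\n' "$tool"
    fi
}

# Assumed list of tools involved in the build; adjust as needed.
for tool in cmake gcc g++ nvcc python; do
    report_tool "$tool"
done
```

If cmake or python resolves inside the conda environment while gcc and g++ resolve under /usr/bin, that mix may be worth ruling out as a cause of the mismatched compiler version.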
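First run XXX bash install.sh 2>&1 | tee build.log to write all build output to build.log. When the build finishes, use cat build.log | grep error to see where it went wrong; if that does not reveal the cause, search for other keywords instead of error, such as not found. After fixing the error, be sure to clear all cached files from the balance_serve build: either delete the csrc/balance_serve/build folder by hand, or add the line "rm -rf csrc/balance_serve/build" to install.sh. The official install.sh does not clean the balance_serve build cache. I don't know why that was left out; having to debug the official script myself cost me a lot of time.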
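The steps above can be sketched as two small shell functions. The function names `scan_build_log` and `clean_balance_serve_cache` are hypothetical, and the keyword list is an assumption; only install.sh, build.log, and csrc/balance_serve/build come from this thread.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers.

# Print log lines (with line numbers) that match common failure keywords.
# The keyword list is an assumption; extend it if nothing useful shows up.
scan_build_log() {
    local log_file="$1"
    grep -n -i -E 'error|not found|no such file' "$log_file"
}

# Delete the balance_serve CMake build directory so the next run
# reconfigures from scratch instead of reusing a stale cache.
clean_balance_serve_cache() {
    local repo_root="${1:-.}"
    rm -rf "$repo_root/csrc/balance_serve/build"
}

# Usage, from the repository root:
#   clean_balance_serve_cache .
#   USE_BALANCE_SERVE=1 bash ./install.sh 2>&1 | tee build.log
#   scan_build_log build.log
```

The search is case-insensitive so that lines such as "Error: could not load cache" are caught as well as lowercase "error" lines.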