
[Bug] Error occurred while building kt-kernel

Open ariable opened this issue 2 months ago • 5 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
  • [x] 5. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-Chinese/English content without translation may be closed.

Describe the bug

I am following the installation guide from https://lmsys.org/blog/2025-10-22-KTransformers/, and my installation steps are as follows:

uv pip install "sglang" --prerelease=allow

Then

git clone https://github.com/kvcache-ai/ktransformers
cd ktransformers
git submodule update --init --recursive
cd kt-kernel
export CPUINFER_CPU_INSTRUCT=AVX2
export CPUINFER_ENABLE_AMX=OFF
uv pip install .

I encountered the following error:

  × Failed to build `kt-kernel @ file:///root/sgl-0.5.4-1/ktransformers/kt-kernel`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `setuptools.build_meta.build_wheel` failed (exit status: 1)

      [stdout]
      running bdist_wheel
      running build
      running build_py
      copying python/__init__.py -> build/lib.linux-x86_64-cpython-312/kt_kernel
      copying python/experts.py -> build/lib.linux-x86_64-cpython-312/kt_kernel
      running egg_info
      writing kt_kernel.egg-info/PKG-INFO
      writing dependency_links to kt_kernel.egg-info/dependency_links.txt
      writing requirements to kt_kernel.egg-info/requires.txt
      writing top-level names to kt_kernel.egg-info/top_level.txt
      reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
      writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
      running build_ext
      -- No .git directory found; skipping git hooks installation
      -- Found OpenMP_C: -fopenmp (found version "4.5")
      -- Found OpenMP_CXX: -fopenmp (found version "4.5")
      -- CMAKE_SYSTEM_PROCESSOR: x86_64
      -- x86 detected
      -- pybind11 v2.14.0 dev1
      -- Found PythonInterp: /root/.cache/uv/builds-v0/.tmpOdn5Vy/bin/python (found suitable version "3.12.3", minimum
      required is "3.7")
      -- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.12.so
      -- Found OpenMP_C: -fopenmp (found version "4.5")
      -- Found OpenMP_CXX: -fopenmp (found version "4.5")
      -- OpenMP found
      -- ccache found, compilation results will be cached. Disable with LLAMA_CCACHE=OFF.
      -- CMAKE_SYSTEM_PROCESSOR: x86_64
      -- x86 detected
      -- CUDA detected
      -- enabling CUDA
      -- CMake PATH:
      /root/.cache/uv/builds-v0/.tmpOdn5Vy/bin:/root/sgl-0.5.4-1/.venv/bin:/root/.nvm/versions/node/v22.20.0/bin:/root/.cargo/bin:/root/.local/bin:/usr/local/cuda-13.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/wsl/lib
      -- Using clang-format 18.1.3 at /usr/bin/clang-format-18
      -- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
      -- Checking for one of the modules 'hwloc'
      -- CMAKE_CXX_FLAGS:  -O3 -ffast-math
      -- ARCH_FLAGS: -mf16c;-mfma;-mavx;-mavx2
      -- LTO: disabled
      -- NUMA library found: /usr/lib/x86_64-linux-gnu/libnuma.so - enabling NUMA support
      -- Configuring done (10.9s)
      -- Generating done (0.1s)
      -- Build files have been written to:
      /root/sgl-0.5.4-1/ktransformers/kt-kernel/build/temp.linux-x86_64-cpython-312/cpuinfer_ext_Release
      [  1%] Generating build details from Git
      [  2%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
      [  5%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o
      [  5%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o
      [  6%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o
      [  7%] Building CXX object third_party/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o
      [  9%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/flags.cpp.o
      [ 10%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
      [ 11%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
      -- Found Git: /usr/bin/git (found version "2.43.0")
      [ 13%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
      [ 14%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/sgemm.cpp.o
      [ 15%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
      [ 18%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
      [ 18%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
      [ 19%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
      [ 21%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
      [ 22%] Building CXX object third_party/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o
      [ 23%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
      [ 25%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
      [ 26%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
      [ 26%] Built target build_info
      [ 27%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
      [ 28%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
      [ 30%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
      [ 31%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
      [ 32%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
      [ 34%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
      [ 35%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
      [ 36%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
      [ 38%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
      [ 38%] Built target ggml
      [ 42%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
      [ 42%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o
      [ 42%] Linking CXX static library libggml_static.a
      [ 43%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o
      [ 43%] Built target ggml_static
      [ 44%] Linking CXX static library libllamafile.a
      [ 44%] Built target llamafile
      [ 46%] Linking CXX static library libllama.a
      [ 46%] Built target llama
      [ 48%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o
      [ 48%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o
      [ 50%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o
      [ 51%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o
      [ 52%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
      [ 55%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o
      [ 55%] Building CXX object CMakeFiles/cpuinfer_ext.dir/ext_bindings.cpp.o
      [ 56%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o
      [ 57%] Building CXX object CMakeFiles/cpuinfer_ext.dir/cpu_backend/shared_mem_buffer.cpp.o
      [ 59%] Building CXX object CMakeFiles/cpuinfer_ext.dir/cpu_backend/task_queue.cpp.o
      [ 60%] Building CXX object CMakeFiles/cpuinfer_ext.dir/cpu_backend/worker_pool.cpp.o
      [ 61%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/llamafile/mlp.cpp.o
      [ 63%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/llamafile/linear.cpp.o
      [ 64%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
      [ 67%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/flags.cpp.o
      [ 67%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
      [ 68%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
      [ 69%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/sgemm.cpp.o
      [ 71%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
      [ 72%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
      [ 73%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
      [ 75%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
      [ 76%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
      [ 77%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
      [ 78%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
      [ 80%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
      [ 81%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
      [ 82%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
      [ 84%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
      [ 85%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
      [ 86%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
      [ 88%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
      [ 89%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
      [ 90%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
      [ 92%] Building CXX object CMakeFiles/cpuinfer_ext.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
      [ 93%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_attn.cpp.o
      [ 94%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o
      [ 96%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_read_write.cpp.o
      [ 97%] Building CXX object CMakeFiles/cpuinfer_ext.dir/operators/kvcache/kvcache_utils.cpp.o
      [ 98%] Linking CXX static library libcommon.a
      [ 98%] Built target common
      -- CPUINFER_USE_CUDA not set; auto-detected CUDA toolkit: YES
      -- Enabling CUDA backend (-DKTRANSFORMERS_USE_CUDA=ON)
      -- CMake configure args:
          -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/root/sgl-0.5.4-1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-312/
          -DPYTHON_EXECUTABLE=/root/.cache/uv/builds-v0/.tmpOdn5Vy/bin/python
          -DCMAKE_BUILD_TYPE=Release
          -DLLAMA_NATIVE=OFF
          -DLLAMA_FMA=ON
          -DLLAMA_F16C=ON
          -DLLAMA_AVX=ON
          -DLLAMA_AVX2=ON
          -DKTRANSFORMERS_CPU_USE_AMX=OFF
          -DKTRANSFORMERS_USE_CUDA=ON
      -- CMake build args: --build . --config Release --parallel 16

      [stderr]
      /root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/config/_apply_pyprojecttoml.py:82:
      SetuptoolsWarning: `license` overwritten by `pyproject.toml`
        corresp(dist, value, root_dir)
      CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:13 (cmake_minimum_required):
        Compatibility with CMake < 3.10 will be removed from a future version of
        CMake.

        Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
        to tell CMake that the project requires at least <min> but has been updated
        to work with policies introduced by <max> or earlier.


      In file included from /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/amx_kernels.hpp:11,
                       from /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/amx.hpp:28,
                       from /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/awq-moe.hpp:33,
                       from /root/sgl-0.5.4-1/ktransformers/kt-kernel/ext_bindings.cpp:28:
      /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/amx_quantization.hpp: In function ‘__m512i
      amx::copy8x64(const int8_t*)’:
      /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/amx_quantization.hpp:185:41: warning: AVX512F vector
      return without AVX512F enabled changes the ABI [-Wpsabi]
        185 | inline __m512i copy8x64(const int8_t* qs) { return _mm512_load_si512((const __m512i*)qs); }
            |                                         ^
      /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/amx.hpp: In function ‘__m512 amx::act_fn(__m512,
      __m512)’:
      /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/amx.hpp:59:22: note: the ABI for passing parameters
      with 64-byte alignment has changed in GCC 4.6
         59 | static inline __m512 act_fn(__m512 gate_val, __m512 up_val) {
            |                      ^~~~~~
      In file included from /usr/lib/gcc/x86_64-linux-gnu/13/include/immintrin.h:53,
                       from /root/sgl-0.5.4-1/ktransformers/kt-kernel/third_party/llama.cpp/ggml-impl.h:451,
                       from /root/sgl-0.5.4-1/ktransformers/kt-kernel/cpu_backend/cpuinfer.h:30,
                       from /root/sgl-0.5.4-1/ktransformers/kt-kernel/ext_bindings.cpp:11:
      /usr/lib/gcc/x86_64-linux-gnu/13/include/avx512fintrin.h: In function ‘void avx512_copy_32xbf16(__m512i*,
      __m512i*)’:
      /usr/lib/gcc/x86_64-linux-gnu/13/include/avx512fintrin.h:6532:1: error: inlining failed in call to
      ‘always_inline’ ‘void _mm512_storeu_si512(void*, __m512i)’: target specific option mismatch
       6532 | _mm512_storeu_si512 (void *__P, __m512i __A)
            | ^~~~~~~~~~~~~~~~~~~
      In file included from /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/amx.hpp:23:
      /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/utils.hpp:7:22: note: called from here
          7 |   _mm512_storeu_si512(dst, _mm512_loadu_si512(src));
            |   ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      /usr/lib/gcc/x86_64-linux-gnu/13/include/avx512fintrin.h:6499:1: error: inlining failed in call to
      ‘always_inline’ ‘__m512i _mm512_loadu_si512(const void*)’: target specific option mismatch
       6499 | _mm512_loadu_si512 (void const *__P)
            | ^~~~~~~~~~~~~~~~~~
      /root/sgl-0.5.4-1/ktransformers/kt-kernel/operators/amx/la/utils.hpp:7:22: note: called from here
          7 |   _mm512_storeu_si512(dst, _mm512_loadu_si512(src));
            |   ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      gmake[2]: *** [CMakeFiles/cpuinfer_ext.dir/build.make:79: CMakeFiles/cpuinfer_ext.dir/ext_bindings.cpp.o] Error
      1
      gmake[2]: *** Waiting for unfinished jobs....
      gmake[1]: *** [CMakeFiles/Makefile2:265: CMakeFiles/cpuinfer_ext.dir/all] Error 2
      gmake: *** [Makefile:136: all] Error 2
      Traceback (most recent call last):
        File "<string>", line 11, in <module>
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/build_meta.py", line 432,
      in build_wheel
          return _build(['bdist_wheel'])
                 ^^^^^^^^^^^^^^^^^^^^^^^
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/build_meta.py", line 423,
      in _build
          return self._build_with_temp_dir(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/build_meta.py", line 404,
      in _build_with_temp_dir
          self.run_setup()
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/build_meta.py", line 317,
      in run_setup
          exec(code, locals())
        File "<string>", line 213, in <module>
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/__init__.py", line 115,
      in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/core.py", line
      186, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/core.py", line
      202, in run_commands
          dist.run_commands()
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line
      1002, in run_commands
          self.run_command(cmd)
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in
      run_command
          super().run_command(command)
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line
      1021, in run_command
          cmd_obj.run()
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/command/bdist_wheel.py",
      line 370, in run
          self.run_command("build")
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line
      357, in run_command
          self.distribution.run_command(command)
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in
      run_command
          super().run_command(command)
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line
      1021, in run_command
          cmd_obj.run()
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/command/build.py",
      line 135, in run
          self.run_command(cmd_name)
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line
      357, in run_command
          self.distribution.run_command(command)
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/dist.py", line 1102, in
      run_command
          super().run_command(command)
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line
      1021, in run_command
          cmd_obj.run()
        File "<string>", line 101, in run
        File "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/command/build_ext.py", line
      96, in run
          _build_ext.run(self)
        File
      "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py",
      line 368, in run
          self.build_extensions()
        File
      "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py",
      line 484, in build_extensions
          self._build_extensions_serial()
        File
      "/root/.cache/uv/builds-v0/.tmpOdn5Vy/lib/python3.12/site-packages/setuptools/_distutils/command/build_ext.py",
      line 510, in _build_extensions_serial
          self.build_extension(ext)
        File "<string>", line 189, in build_extension
        File "/usr/lib/python3.12/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--config', 'Release', '--parallel', '16']'
      returned non-zero exit status 2.

      hint: This usually indicates a problem with the package or the build environment.

Could this error be due to an incorrect environment configuration? Thanks in advance!

Reproduction

git clone https://github.com/kvcache-ai/ktransformers
cd ktransformers
git submodule update --init --recursive
cd kt-kernel
export CPUINFER_CPU_INSTRUCT=AVX2
export CPUINFER_ENABLE_AMX=OFF
uv pip install .

Environment

My environment is as follows:

OS: Ubuntu 24.04.3
Python: 3.12.3
SGLang: 0.5.4.post1
gcc: 13.3.0
cmake: 3.28.3
CUDA: 13.0

ariable · Oct 29 '25 15:10

Do you have AVX512 support? Our kernel currently targets AMX/AVX512, so you can run lscpu and check whether avx512 is listed. We are working on support for more general x86 architectures (we will add AMD's version of the BLIS library to help), but for now we only support AMX/AVX512 on x86. We are also going to fix the former llamafile backend for multi-platform support, though its prefill performance is poor.
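
For reference, a minimal check along these lines (assuming a Linux box where lscpu is available) that lists any AMX/AVX512 flags the CPU reports:

# Print any avx512*/amx* feature flags reported by the CPU (empty output means none)
lscpu | tr ' ' '\n' | grep -E '^(avx512|amx)' | sort -u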

KMSorSMS · Oct 30 '25 12:10

Thank you. I’m using a Xeon Platinum 8458P processor, which supports the AMX instruction set. I found that it compiles successfully when I don’t set the CPUINFER_CPU_INSTRUCT environment variable; however, once I set it, the same error occurs. Setting the CPUINFER_ENABLE_AMX environment variable to either ON or OFF doesn’t affect the compilation.

ariable · Oct 30 '25 16:10

You set:

export CPUINFER_CPU_INSTRUCT=AVX2
export CPUINFER_ENABLE_AMX=OFF

I see the reason: these settings disable AMX but only enable AVX2, while our AMX/AVX kernel needs at least avx512bf16. You can try:

export CPUINFER_CPU_INSTRUCT=NATIVE
export CPUINFER_ENABLE_AMX=OFF

and check whether it builds. For AVX2, as I said, we will fix the former llamafile backend to support AVX2 and other lower-end CPUs. I am also going to tie these config options together so this kind of misconfiguration is caught.
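
A minimal sketch of the retry, assuming you rebuild from the kt-kernel directory (removing the existing build/ directory is only to force CMake to reconfigure with the new settings):

# Reconfigure and rebuild using the instruction set detected on the machine
cd kt-kernel
rm -rf build                        # drop the previous CMake configuration
export CPUINFER_CPU_INSTRUCT=NATIVE
export CPUINFER_ENABLE_AMX=OFF
uv pip install .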

KMSorSMS · Oct 31 '25 03:10

I ran the following commands and got the same error:

export CPUINFER_ENABLE_AMX=OFF
export CPUINFER_CPU_INSTRUCT=AVX512

Only when I run

export CPUINFER_ENABLE_AMX=OFF
export CPUINFER_CPU_INSTRUCT=

does it compile successfully. By the way, some older CPUs support AVX512 but lack AVX512BF16. Does that mean they are not supported? Thank you very much.
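
For reference, a quick check like the following (assuming Linux) shows whether a CPU advertises that flag; it prints the flag once if present and nothing otherwise:

# Check whether the CPU advertises the avx512bf16 feature flag
grep -m1 -o 'avx512bf16' /proc/cpuinfo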

ariable · Nov 01 '25 12:11

I am working on fixing this. I have tested the following combinations:

export CPUINFER_ENABLE_AMX=OFF
export CPUINFER_CPU_INSTRUCT=AVX512

export CPUINFER_CPU_INSTRUCT=AVX2
export CPUINFER_ENABLE_AMX=OFF

export CPUINFER_ENABLE_AMX=ON
export CPUINFER_ENABLE_AVX512=ON

All of them build successfully. You can now test the PR on your machine. If you find anything wrong, please let me know so I can fix or improve the PR.
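
A rough sketch of how to test it (the PR number is not given here, so substitute the actual one; the local branch name is only an example):

# Fetch the PR into a local branch and rebuild kt-kernel against it
git fetch origin pull/<PR_NUMBER>/head:kt-kernel-build-fix   # <PR_NUMBER> is a placeholder
git checkout kt-kernel-build-fix
git submodule update --init --recursive
cd kt-kernel
rm -rf build                        # force a clean reconfigure
export CPUINFER_CPU_INSTRUCT=AVX2
export CPUINFER_ENABLE_AMX=OFF
uv pip install .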

KMSorSMS · Nov 02 '25 05:11