
[Bug] error with kt-kernel installation

Open jli113 opened this issue 1 month ago • 13 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [x] 2. The bug has not been fixed in the latest version.
  • [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
  • [x] 5. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-Chinese/English content without translation may be closed.

Describe the bug

Processing /home/k1/ktransformers/kt-kernel Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Collecting torch>=2.0.0 (from kt-kernel==0.1.0) Using cached torch-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (30 kB) Collecting safetensors>=0.4.0 (from kt-kernel==0.1.0) Using cached safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB) Collecting compressed-tensors>=0.7.0 (from kt-kernel==0.1.0) Using cached compressed_tensors-0.12.2-py3-none-any.whl.metadata (7.0 kB) Collecting numpy>=1.24.0 (from kt-kernel==0.1.0) Using cached numpy-2.3.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB) Collecting triton>=2.0.0 (from kt-kernel==0.1.0) Using cached triton-3.5.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.7 kB) Collecting black>=25.9.0 (from kt-kernel==0.1.0) Using cached black-25.11.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (85 kB) Collecting click>=8.0.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached click-8.3.0-py3-none-any.whl.metadata (2.6 kB) Collecting mypy-extensions>=0.4.3 (from black>=25.9.0->kt-kernel==0.1.0) Using cached mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB) Collecting packaging>=22.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached packaging-25.0-py3-none-any.whl.metadata (3.3 kB) Collecting pathspec>=0.9.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached pathspec-0.12.1-py3-none-any.whl.metadata (21 kB) Collecting platformdirs>=2 (from black>=25.9.0->kt-kernel==0.1.0) Using cached platformdirs-4.5.0-py3-none-any.whl.metadata (12 kB) Collecting pytokens>=0.3.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached pytokens-0.3.0-py3-none-any.whl.metadata (2.0 kB) Collecting transformers (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached transformers-4.57.1-py3-none-any.whl.metadata (43 
kB) Collecting pydantic>=2.0 (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached pydantic-2.12.4-py3-none-any.whl.metadata (89 kB) Collecting loguru (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached loguru-0.7.3-py3-none-any.whl.metadata (22 kB) Collecting annotated-types>=0.6.0 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB) Collecting pydantic-core==2.41.5 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB) Collecting typing-extensions>=4.14.1 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB) Collecting typing-inspection>=0.4.2 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB) Collecting filelock (from torch>=2.0.0->kt-kernel==0.1.0) Using cached filelock-3.20.0-py3-none-any.whl.metadata (2.1 kB) Collecting sympy>=1.13.3 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB) Collecting networkx>=2.5.1 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached networkx-3.5-py3-none-any.whl.metadata (6.3 kB) Collecting jinja2 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB) Collecting fsspec>=0.8.5 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached fsspec-2025.10.0-py3-none-any.whl.metadata (10 kB) Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached 
nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) Collecting nvidia-cublas-cu12==12.8.4.1 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cufft-cu12==11.3.3.83 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting nvidia-curand-cu12==10.3.9.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) Collecting nvidia-cusparselt-cu12==0.7.1 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB) Collecting nvidia-nccl-cu12==2.27.5 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) Collecting nvidia-nvshmem-cu12==3.3.20 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB) Collecting nvidia-nvtx-cu12==12.8.90 (from 
torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cufile-cu12==1.13.1.3 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch>=2.0.0->kt-kernel==0.1.0) Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB) Collecting MarkupSafe>=2.0 (from jinja2->torch>=2.0.0->kt-kernel==0.1.0) Using cached markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.7 kB) Collecting huggingface-hub<1.0,>=0.34.0 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached huggingface_hub-0.36.0-py3-none-any.whl.metadata (14 kB) Collecting pyyaml>=5.1 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.4 kB) Collecting regex!=2019.12.17 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached regex-2025.11.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (40 kB) Collecting requests (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached requests-2.32.5-py3-none-any.whl.metadata (4.9 kB) Collecting tokenizers<=0.23.0,>=0.22.0 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB) Collecting tqdm>=4.27 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB) 
Collecting hf-xet<2.0.0,>=1.1.3 (from huggingface-hub<1.0,>=0.34.0->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB) Collecting charset_normalizer<4,>=2 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (37 kB) Collecting idna<4,>=2.5 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached idna-3.11-py3-none-any.whl.metadata (8.4 kB) Collecting urllib3<3,>=1.21.1 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached urllib3-2.5.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached certifi-2025.10.5-py3-none-any.whl.metadata (2.5 kB) Using cached black-25.11.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (1.6 MB) Using cached click-8.3.0-py3-none-any.whl (107 kB) Using cached compressed_tensors-0.12.2-py3-none-any.whl (183 kB) Using cached mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) Using cached numpy-2.3.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.9 MB) Using cached packaging-25.0-py3-none-any.whl (66 kB) Using cached pathspec-0.12.1-py3-none-any.whl (31 kB) Using cached platformdirs-4.5.0-py3-none-any.whl (18 kB) Using cached pydantic-2.12.4-py3-none-any.whl (463 kB) Using cached pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB) Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB) Using cached pytokens-0.3.0-py3-none-any.whl (12 kB) Using cached safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB) Using cached torch-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl (899.8 MB) Using cached 
nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB) Using cached nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) Using cached nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) Using cached nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) Using cached nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB) Using cached nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) Using cached nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) Using cached nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) Using cached nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB) Using cached nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB) Using cached nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB) Using cached nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB) Using cached nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB) Using cached nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB) Using cached nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) Using cached triton-3.5.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (170.4 MB) Using cached fsspec-2025.10.0-py3-none-any.whl (200 kB) Using cached networkx-3.5-py3-none-any.whl (2.0 MB) Using cached sympy-1.14.0-py3-none-any.whl (6.3 MB) Using cached mpmath-1.3.0-py3-none-any.whl (536 kB) Using cached typing_extensions-4.15.0-py3-none-any.whl (44 kB) Using cached typing_inspection-0.4.2-py3-none-any.whl (14 kB) Using cached filelock-3.20.0-py3-none-any.whl (16 kB) Using 
cached jinja2-3.1.6-py3-none-any.whl (134 kB) Using cached markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB) Using cached loguru-0.7.3-py3-none-any.whl (61 kB) Using cached transformers-4.57.1-py3-none-any.whl (12.0 MB) Using cached huggingface_hub-0.36.0-py3-none-any.whl (566 kB) Using cached hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB) Using cached tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB) Using cached pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB) Using cached regex-2025.11.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (800 kB) Using cached tqdm-4.67.1-py3-none-any.whl (78 kB) Using cached requests-2.32.5-py3-none-any.whl (64 kB) Using cached charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (151 kB) Using cached idna-3.11-py3-none-any.whl (71 kB) Using cached urllib3-2.5.0-py3-none-any.whl (129 kB) Using cached certifi-2025.10.5-py3-none-any.whl (163 kB) Building wheels for collected packages: kt-kernel Building wheel for kt-kernel (pyproject.toml) ... error error: subprocess-exited-with-error

× Building wheel for kt-kernel (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [162 lines of output]
  /tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: license overwritten by pyproject.toml
    corresp(dist, value, root_dir)
  running bdist_wheel
  running build
  running build_py
  creating build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/experts_base.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/experts.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  running egg_info
  writing kt_kernel.egg-info/PKG-INFO
  writing dependency_links to kt_kernel.egg-info/dependency_links.txt
  writing requirements to kt_kernel.egg-info/requires.txt
  writing top-level names to kt_kernel.egg-info/top_level.txt
  reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
  writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
  running build_ext
  -- The C compiler identification is GNU 11.4.0
  -- The CXX compiler identification is GNU 11.4.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- No .git directory found; skipping git hooks installation
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- CMAKE_CXX_FLAGS: -O3 -ffast-math
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  -- Found BLIS include at /usr/include/x86_64-linux-gnu
  -- Found BLIS library /usr/lib/x86_64-linux-gnu/libblis.so
  -- ARCH_FLAGS: -mfma;-mavx;-mavx2;-march=native
  CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:13 (cmake_minimum_required):
    Compatibility with CMake < 3.10 will be removed from a future version of CMake.

    Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
    to tell CMake that the project requires at least <min> but has been updated
    to work with policies introduced by <max> or earlier.


  -- pybind11 v2.14.0 dev1
  -- Found PythonInterp: /home/k1/miniconda3/envs/kt/bin/python3.11 (found suitable version "3.11.14", minimum required is "3.7")
  -- Found PythonLibs: /home/k1/miniconda3/envs/kt/lib/libpython3.11.so
  -- Performing Test HAS_FLTO
  -- Performing Test HAS_FLTO - Success
  -- Found Git: /usr/bin/git (found version "2.34.1")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- OpenMP found
  -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  -- Looking for a CUDA compiler
  -- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
  -- CUDA detected
  -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.61")
  -- enabling CUDA
  -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- SOURCE_DIR7:
  CMake Warning at CMakeLists.txt:485 (message):
    clang-format not found.  Please install clang-format (>=18) or pass
    -DCLANG_FORMAT_BIN=/full/path and reconfigure.


  -- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
  CMake Error at CMakeLists.txt:531 (message):
    FindHWLOC needs pkg-config program and PKG_CONFIG_PATH must contain the
    path to hwloc.pc file.


  -- Configuring incomplete, errors occurred!
  -- CPUINFER_USE_CUDA not set; auto-detected CUDA toolkit: YES
  Detected CPU info: {'vendor': 'amd', 'arch': 'x86_64', 'features': {'AVX2'}, 'raw': {'flags': {'rdtscp', 'nonstop_tsc', 'bmi2', '3dnowprefetch', 'rdseed', 'wdt', 'pclmulqdq', 'rdpru', 'sha_ni', 'succor', 'cqm_mbm_total', 'xsavec', 'fpu', 'avx', 'perfctr_core', 'aperfmperf', 'avx2', 'xgetbv1', 'rep_good', 'lahf_lm', 'cmp_legacy', 'ssbd', 'v_vmsave_vmload', 'clzero', 'cmov', 'mca', 'monitor', 'mba', 'bpext', 'nopl', 'stibp', 'nrip_save', 'vmmcall', 'sme', 'cx8', 'sep', 'misalignsse', 'topoext', 'clwb', 'clflush', 'cat_l3', 'adx', 'pge', 'mwaitx', 'ibrs', 'npt', 'xsaves', 'cpuid', 'sse4_1', 'lm', 'pni', 'aes', 'perfctr_nb', 'smep', 'lbrv', 'pae', 'sev_es', 'apic', 'svm', 'ibpb', 'syscall', 'mmxext', 'constant_tsc', 'cqm', 'smca', 'msr', 'fxsr', 'tsc', 'pat', 'abm', 'umip', 'vgif', 'fxsr_opt', 'overflow_recov', 'vme', 'avic', 'extd_apicid', 'decodeassists', 'cqm_mbm_local', 'rapl', 'mce', 'pfthreshold', 'tsc_scale', 'pse', 'tce', 'rdrand', 'xsaveerptr', 'sev', 'extapic', 'perfctr_llc', 'smap', 'cqm_occup_llc', 'fma', 'sse', 'popcnt', 'ht', 'cx16', 'ibs', 'flushbyasid', 'wbnoinvd', 'xsaveopt', 'hw_pstate', 'bmi1', 'movbe', 'rdpid', 'svm_lock', 'pausefilter', 'sse4a', 'vmcb_clean', 'osvw', 'v_spec_ctrl', 'arat', 'ibpb_exit_to_user', 'rdt_a', 'mmx', 'cqm_llc', 'mtrr', 'sse4_2', 'nx', 'cpb', 'ssse3', 'cr8_legacy', 'cdp_l3', 'f16c', 'clflushopt', 'skinit', 'xsave', 'irperf', 'sse2', 'fsgsbase', 'pdpe1gb', 'de', 'pse36'}}}
  -- Detected AMD CPU; enabling AMD MoE kernel (-DKTRANSFORMERS_CPU_MOE_AMD=ON)
  -- CPU detection: vendor=amd arch=x86_64 features=['AVX2']
  -- Enabling CUDA backend (-DKTRANSFORMERS_USE_CUDA=ON)
  -- CMake configure args:
      -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/
      -DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11
      -DCMAKE_BUILD_TYPE=Release
      -DLLAMA_NATIVE=ON
      -DKTRANSFORMERS_CPU_MOE_AMD=ON
      -DKTRANSFORMERS_USE_CUDA=ON
  Traceback (most recent call last):
    File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
      main()
    File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
      json_out["return_val"] = hook(**hook_input["kwargs"])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 280, in build_wheel
      return _build_backend().build_wheel(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 435, in build_wheel
      return _build(['bdist_wheel', '--dist-info-dir', str(metadata_directory)])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 423, in _build
      return self._build_with_temp_dir(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
      self.run_setup()
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
      exec(code, locals())
    File "<string>", line 330, in <module>
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 115, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 186, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
      dist.run_commands()
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
      self.run_command(cmd)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/command/bdist_wheel.py", line 370, in run
      self.run_command("build")
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
      cmd_obj.run()
    File "<string>", line 106, in run
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 96, in run
      _build_ext.run(self)
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
      self.build_extensions()
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 484, in build_extensions
      self._build_extensions_serial()
    File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 510, in _build_extensions_serial
      self.build_extension(ext)
    File "<string>", line 298, in build_extension
    File "/home/k1/miniconda3/envs/kt/lib/python3.11/subprocess.py", line 571, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['cmake', '/home/k1/ktransformers/kt-kernel', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/', '-DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DLLAMA_NATIVE=ON', '-DKTRANSFORMERS_CPU_MOE_AMD=ON', '-DKTRANSFORMERS_USE_CUDA=ON']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for kt-kernel
Failed to build kt-kernel
error: failed-wheel-build-for-install

× Failed to build installable wheels for some pyproject.toml based projects
╰─> kt-kernel

Reproduction

repo installation:

conda create  -n ktransformers python=3.11
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
cd kt-kernel
pip install .

clang installation:

wget https://apt.llvm.org/llvm.sh 
chmod u+x llvm.sh
sudo ./llvm.sh 18

Environment

Ubuntu 22.04.5 LTS
Eight RTX 4000 ADA
Single AMD EPYC 7402P

jli113 avatar Nov 11 '25 07:11 jli113

See this note: https://github.com/kvcache-ai/ktransformers/tree/main/kt-kernel#hwloc-not-found

KMSorSMS avatar Nov 11 '25 07:11 KMSorSMS

sudo apt update
sudo apt install pkg-config libhwloc-dev
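
For anyone hitting the same CMake error ("FindHWLOC needs pkg-config program and PKG_CONFIG_PATH must contain the path to hwloc.pc file"), here is a rough preflight check to run before `pip install .`. It is only a sketch: the function name `hwloc_preflight` is made up for illustration and is not part of kt-kernel. It relies on `pkg-config --exists hwloc`, which exits with status 0 only when pkg-config can locate `hwloc.pc`.

```python
import shutil
import subprocess


def hwloc_preflight(which=shutil.which, run=subprocess.run):
    """Return a list of problems; an empty list means both checks passed.

    `which` and `run` are injectable only so the function can be exercised
    without touching the real system (hypothetical helper, not kt-kernel API).
    """
    problems = []

    # CMake reported "Could NOT find PkgConfig", so check the binary first.
    pkg_config = which("pkg-config")
    if pkg_config is None:
        problems.append(
            "pkg-config not found on PATH (Debian/Ubuntu: apt install pkg-config)"
        )
        return problems

    # Exit status 0 means hwloc.pc is visible to pkg-config.
    result = run([pkg_config, "--exists", "hwloc"], check=False)
    if result.returncode != 0:
        problems.append(
            "hwloc.pc not visible to pkg-config "
            "(Debian/Ubuntu: apt install libhwloc-dev, or extend PKG_CONFIG_PATH)"
        )
    return problems
```

Calling `hwloc_preflight()` with no arguments checks the current machine; if it returns an empty list, the hwloc step of the CMake configure should no longer fail for this particular reason.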

jli113 avatar Nov 11 '25 07:11 jli113

(kt) k1@k0:~/ktransformers/kt-kernel$ python -c "from kt_kernel import KTMoEWrapper; print('✓ kt-kernel installed successfully')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel/__init__.py", line 27, in <module>
    from .experts import KTMoEWrapper
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel/experts.py", line 20, in <module>
    from .utils.amx import AMXMoEWrapper
ModuleNotFoundError: No module named 'kt_kernel.utils'
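
A quick way to tell whether the installed package is simply missing the `utils` subpackage (the build log above copies only three top-level Python files into `kt_kernel`) is to inspect the install directory without importing the package. This is a sketch under that assumption; `subpackage_present` is a hypothetical helper name, not something kt-kernel ships.

```python
import importlib.util
from pathlib import Path


def subpackage_present(package, subpackage):
    """Check on disk whether `package` contains `subpackage`.

    Returns None if `package` is not installed (or is not a package),
    otherwise True/False. Looking up a top-level spec does not execute the
    package's __init__.py, so this works even when importing it would fail.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        base = Path(location)
        # A submodule is either a directory package or a single .py file.
        if (base / subpackage).is_dir() or (base / f"{subpackage}.py").is_file():
            return True
    return False
```

Here `subpackage_present("kt_kernel", "utils")` returning `False` would suggest that the installed copy does not contain `kt_kernel/utils` at all, which points at packaging or a stale install rather than at the runtime environment.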

jli113 avatar Nov 11 '25 07:11 jli113

sglang uses the pre-1588 wrapper, AMXMoEWrapper. Use git checkout 8729435 and sglang==0.5.5.

lavdnone2 avatar Nov 12 '25 05:11 lavdnone2

The latest sglang is supported. You can check it right now.

KMSorSMS avatar Nov 12 '25 06:11 KMSorSMS

The latest sglang is supported. You can check it right now.

Just did: 0.5.5.post1 doesn't work, and anything after 8729435 is breaking. Maybe add a v0.4.2 tag at head 8729435, so it works with the current sglang 0.5.5.

lavdnone2 avatar Nov 12 '25 06:11 lavdnone2

I don't understand. I have checked sglang: https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/moe/kt_ep_wrapper.py It uses the latest wrapper. So basically, you just need to pull the latest kt together with sglang, and then it works? Or do you mean you want to use a specific version of KT with sglang?

KMSorSMS avatar Nov 12 '25 06:11 KMSorSMS

After successfully installing kt-kernel and sglang, I hit a problem at runtime. I'm pretty sure nvcc is on the system PATH.

k1@k0:~/ktransformers$ python -m sglang.launch_server   --host 0.0.0.0   --port 60000   --model /home/k1/models/DeepSeek-R1-GGUF/DeepSeek-R1-UD-Q2_K_XL   --kt-cpuinfer 12   --kt-threadpool-count 2   --kt-num-gpu-experts 200   --attention-backend flashinfer   --trust-remote-code   --mem-fraction-static 0.98   --chunked-prefill-size 4096   --max-running-requests 37   --max-total-tokens 37000   --enable-mixed-chunk   --tensor-parallel-size 8   --enable-p2p-check   --disable-shared-experts-fusion
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/launch_server.py", line 24, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 4008, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 3616, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 275, in __init__
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 595, in __post_init__
    self._handle_model_specific_adjustments()
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 899, in _handle_model_specific_adjustments
    from sglang.srt.configs.model_config import is_deepseek_nsa
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/configs/model_config.py", line 26, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/__init__.py", line 19, in <module>
    from sglang.srt.layers.quantization.auto_round import AutoRoundConfig
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/auto_round.py", line 12, in <module>
    from sglang.srt.layers.quantization.utils import get_scalar_types
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/utils.py", line 13, in <module>
    from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/fp8_kernel.py", line 46, in <module>
    from sgl_kernel import sgl_per_tensor_quant_fp8, sgl_per_token_quant_fp8
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/__init__.py", line 9, in <module>
    _preload_cuda_library()
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/load_utils.py", line 220, in _preload_cuda_library
    raise RuntimeError("Could not find CUDA lib directory.")
RuntimeError: Could not find CUDA lib directory.
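
Having nvcc on PATH does not by itself mean the CUDA runtime libraries can be found. The sketch below is a hedged diagnostic that lists directories containing `libcudart`. It does not reproduce what `sgl_kernel`'s `_preload_cuda_library` actually does; the `CUDA_HOME` and `CUDA_PATH` variables and the subdirectory names are conventional guesses, and `find_cuda_lib_dirs` is a made-up helper name.

```python
import os
from pathlib import Path


def find_cuda_lib_dirs(env=None, default_root="/usr/local/cuda"):
    """Return directories that contain a libcudart shared library.

    Roots searched: $CUDA_HOME, $CUDA_PATH, then `default_root`.
    These are assumptions, not sgl_kernel's actual search order.
    """
    env = os.environ if env is None else env
    roots = [env.get("CUDA_HOME"), env.get("CUDA_PATH"), default_root]
    found = []
    for root in roots:
        if not root:
            continue
        for sub in ("lib64", "lib", "targets/x86_64-linux/lib"):
            candidate = Path(root) / sub
            if candidate.is_dir() and any(candidate.glob("libcudart.so*")):
                # Resolve symlinks so that lib64 -> targets/... is listed once.
                resolved = candidate.resolve()
                if resolved not in found:
                    found.append(resolved)
    return found
```

An empty result from `find_cuda_lib_dirs()` would be consistent with the error above; a non-empty result suggests the libraries exist but are not where the loader looks, in which case checking how `CUDA_HOME` is set in the environment may be a reasonable next step.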

Reinstalled with:

# LLAMAFILE backend on an AVX2-only AMD CPU (no AMX)
export CPUINFER_CPU_INSTRUCT=AVX2  # Options: NATIVE, AVX512, AVX2
export CPUINFER_ENABLE_AMX=OFF       # Options: ON, OFF
export CMAKE_ARGS="-D CMAKE_CUDA_COMPILER=$(which nvcc)"

./install.sh --manual

Checking and installing system dependencies...
Installing cmake via conda...
2 channel Terms of Service accepted
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Detected Debian-based system. Installing libhwloc-dev and pkg-config...
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local  InRelease [1,572 B]
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local  InRelease [1,572 B]
Hit:2 http://mirrors.aliyun.com/ubuntu jammy InRelease
Hit:3 http://mirrors.aliyun.com/ubuntu jammy-updates InRelease
Hit:4 http://mirrors.aliyun.com/ubuntu jammy-backports InRelease
Hit:5 https://mirrors.aliyun.com/docker-ce/linux/ubuntu jammy InRelease
Hit:6 https://deb.nodesource.com/node_23.x nodistro InRelease
Hit:7 https://apt.llvm.org/jammy llvm-toolchain-jammy-20 InRelease
Hit:8 https://apt.llvm.org/jammy llvm-toolchain-jammy-18 InRelease
Hit:9 http://security.ubuntu.com/ubuntu jammy-security InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
8 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
pkg-config is already the newest version (0.29.2-1ubuntu3).
libhwloc-dev is already the newest version (2.7.0-2ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Building kt-kernel with configuration:
  CPUINFER_CPU_INSTRUCT=AVX2
  CPUINFER_ENABLE_AMX=OFF
  CPUINFER_BUILD_TYPE=Release
  CPUINFER_PARALLEL=8
  CPUINFER_VERBOSE=1

Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11)
Processing /home/k1/ktransformers/kt-kernel
  Running command pip subprocess to install build dependencies
  Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11)
  Collecting setuptools>=61
    Obtaining dependency information for setuptools>=61 from https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl.metadata
    Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB)
  Collecting wheel
    Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata
    Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
  Collecting cmake>=3.16
    Obtaining dependency information for cmake>=3.16 from https://files.pythonhosted.org/packages/f3/56/0fc4d83f212cef10b7bbf6c5043e4582af80ad2aef6905e0dc33fbf68b11/cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata
    Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (6.5 kB)
  Collecting pybind11
    Obtaining dependency information for pybind11 from https://files.pythonhosted.org/packages/cd/8a/37362fc2b949d5f733a8b0f2ff51ba423914cabefe69f1d1b6aab710f5fe/pybind11-3.0.1-py3-none-any.whl.metadata
    Using cached pybind11-3.0.1-py3-none-any.whl.metadata (10.0 kB)
  Using cached setuptools-80.9.0-py3-none-any.whl (1.2 MB)
  Using cached wheel-0.45.1-py3-none-any.whl (72 kB)
  Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.7 MB)
  Using cached pybind11-3.0.1-py3-none-any.whl (293 kB)
  Installing collected packages: wheel, setuptools, pybind11, cmake
    Creating /tmp/pip-build-env-y8t0fu_x/overlay/bin
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/wheel to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/pybind11-config to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ccmake to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cmake to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cpack to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ctest to 775

  Successfully installed cmake-4.1.2 pybind11-3.0.1 setuptools-80.9.0 wheel-0.45.1
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
    corresp(dist, value, root_dir)
  running egg_info
  creating kt_kernel.egg-info
  writing kt_kernel.egg-info/PKG-INFO
  writing dependency_links to kt_kernel.egg-info/dependency_links.txt
  writing requirements to kt_kernel.egg-info/requires.txt
  writing top-level names to kt_kernel.egg-info/top_level.txt
  writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
  reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
  writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
  Getting requirements to build wheel ... done
  Running command Preparing metadata (pyproject.toml)
  /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
    corresp(dist, value, root_dir)
  running dist_info
  creating /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info
  writing /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/requires.txt
  writing top-level names to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/top_level.txt
  writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
  writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
  creating '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel-0.1.0.dist-info'
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.8.0)
Requirement already satisfied: safetensors>=0.4.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.6.2)
Requirement already satisfied: compressed-tensors>=0.7.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.12.2)
Requirement already satisfied: numpy>=1.24.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.3.4)
Requirement already satisfied: triton>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (3.4.0)
Requirement already satisfied: gguf>=0.17.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.17.1)
Requirement already satisfied: black>=25.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (25.11.0)
Requirement already satisfied: click>=8.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (8.3.0)
Requirement already satisfied: mypy-extensions>=0.4.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (1.1.0)
Requirement already satisfied: packaging>=22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (25.0)
Requirement already satisfied: pathspec>=0.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.12.1)
Requirement already satisfied: platformdirs>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (4.5.0)
Requirement already satisfied: pytokens>=0.3.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.3.0)
Requirement already satisfied: transformers in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.57.1)
Requirement already satisfied: pydantic>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.12.4)
Requirement already satisfied: loguru in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.3)
Requirement already satisfied: pyyaml>=5.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (6.0.3)
Requirement already satisfied: tqdm>=4.27 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (4.67.1)
Requirement already satisfied: annotated-types>=0.6.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.41.5)
Requirement already satisfied: typing-extensions>=4.14.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.15.0)
Requirement already satisfied: typing-inspection>=0.4.2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.4.2)
Requirement already satisfied: filelock in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.20.0)
Requirement already satisfied: sympy>=1.13.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.14.0)
Requirement already satisfied: networkx in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.5)
Requirement already satisfied: jinja2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.1.6)
Requirement already satisfied: fsspec in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2025.10.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2.27.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.13.1.3)
Requirement already satisfied: setuptools>=40.8.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from triton>=2.0.0->kt-kernel==0.1.0) (80.9.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from sympy>=1.13.3->torch>=2.0.0->kt-kernel==0.1.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from jinja2->torch>=2.0.0->kt-kernel==0.1.0) (3.0.3)
Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.36.0)
Requirement already satisfied: regex!=2019.12.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.3)
Requirement already satisfied: requests in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.32.5)
Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.22.1)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.34.0->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (1.2.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.11)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.12)
Building wheels for collected packages: kt-kernel
  Running command Building wheel for kt-kernel (pyproject.toml)
  /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
    corresp(dist, value, root_dir)
  running bdist_wheel
  running build
  running build_py
  creating build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/experts_base.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/experts.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  creating build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/amx.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/llamafile.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/loader.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  running egg_info
  writing kt_kernel.egg-info/PKG-INFO
  writing dependency_links to kt_kernel.egg-info/dependency_links.txt
  writing requirements to kt_kernel.egg-info/requires.txt
  writing top-level names to kt_kernel.egg-info/top_level.txt
  reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
  writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
  running build_ext
  -- The C compiler identification is GNU 11.4.0
  -- The CXX compiler identification is GNU 11.4.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- No .git directory found; skipping git hooks installation
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- CMAKE_CXX_FLAGS:  -O3 -ffast-math
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  CMake Warning at CMakeLists.txt:252 (message):
    pure AVX is not supported at least avx2


  -- ARCH_FLAGS: -mf16c;-mfma;-mavx;-mfma;-msse3;-mf16c;-mavx2;-mfma;-msse3;-mf16c
  CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:13 (cmake_minimum_required):
    Compatibility with CMake < 3.10 will be removed from a future version of
    CMake.

    Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
    to tell CMake that the project requires at least <min> but has been updated
    to work with policies introduced by <max> or earlier.


  -- pybind11 v2.14.0 dev1
  -- Found PythonInterp: /home/k1/miniconda3/envs/kt/bin/python3.11 (found suitable version "3.11.14", minimum required is "3.7")
  -- Found PythonLibs: /home/k1/miniconda3/envs/kt/lib/libpython3.11.so
  -- Performing Test HAS_FLTO
  -- Performing Test HAS_FLTO - Success
  -- Found Git: /usr/bin/git (found version "2.34.1")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- OpenMP found
  -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  -- CUDA detected
  -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.61")
  -- enabling CUDA
  -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- SOURCE_DIR7:
  CMake Warning at CMakeLists.txt:485 (message):
    clang-format not found.  Please install clang-format (>=18) or pass
    -DCLANG_FORMAT_BIN=/full/path and reconfigure.


  -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")
  -- Checking for one of the modules 'hwloc'
  -- LTO: disabled
  -- NUMA library found: /usr/lib/x86_64-linux-gnu/libnuma.so - enabling NUMA support
  -- Configuring done (17.1s)
  -- Generating done (0.0s)
  -- Build files have been written to: /home/k1/ktransformers/kt-kernel/build/temp.linux-x86_64-cpython-311/kt_kernel_ext_Release
  [  1%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/flags.cpp.o
  [  2%] Building CXX object third_party/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o
  [  3%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
  [  5%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
  [  7%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o
  [  9%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o
  [  9%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
  [ 10%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o
  [ 11%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
  [ 13%] Built target build_info
  [ 14%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/sgemm.cpp.o
  [ 15%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
  [ 17%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
  [ 18%] Building CXX object third_party/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o
  [ 19%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
  [ 21%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
  [ 22%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
  [ 23%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
  [ 25%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
  [ 26%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
  [ 27%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
  [ 28%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
  [ 30%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
  [ 31%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
  [ 32%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
  [ 34%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
  [ 35%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
  [ 36%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
  [ 38%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
  [ 38%] Built target ggml
  [ 39%] Linking CXX static library libggml_static.a
  [ 40%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
  [ 42%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o
  [ 43%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o
  [ 43%] Built target ggml_static
  [ 44%] Linking CXX static library libllamafile.a
  [ 44%] Built target llamafile
  [ 46%] Linking CXX static library libllama.a
  [ 46%] Built target llama
  [ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/ext_bindings.cpp.o
  [ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/shared_mem_buffer.cpp.o
  [ 51%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/task_queue.cpp.o
  [ 51%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o
  [ 52%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/linear.cpp.o
  [ 53%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o
  [ 55%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/worker_pool.cpp.o
  [ 56%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o
  [ 57%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o
  [ 59%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/mlp.cpp.o
  [ 60%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/flags.cpp.o
  [ 61%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
  [ 63%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
  [ 64%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
  [ 65%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o
  [ 67%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
  [ 68%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/sgemm.cpp.o
  [ 69%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
  [ 71%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
  [ 72%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o
  [ 73%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
  [ 75%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
  [ 76%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
  [ 77%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
  [ 78%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
  [ 80%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
  [ 81%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
  [ 82%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
  [ 84%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
  [ 85%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
  [ 86%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
  [ 88%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
  [ 89%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
  [ 90%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
  [ 92%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
  [ 93%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_attn.cpp.o
  [ 94%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o
  [ 96%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_read_write.cpp.o
  [ 97%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_utils.cpp.o
  [ 98%] Linking CXX static library libcommon.a
  [ 98%] Built target common
  [100%] Linking CXX shared module /home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so
  [100%] Built target kt_kernel_ext
  -- CPUINFER_USE_CUDA not set; auto-detected CUDA toolkit: YES
  Detected CPU info: {'vendor': 'amd', 'arch': 'x86_64', 'features': {'AVX2'}, 'raw': {'flags': {'decodeassists', 'mba', 'fsgsbase', 'wdt', 'f16c', 'rep_good', 'mce', 'arat', 'rdt_a', 'tsc_scale', 'avic', 'wbnoinvd', 'flushbyasid', 'sev', 'mmx', 'apic', 'ibrs', 'vgif', 'fxsr', 'mmxext', 'ht', 'cmov', 'ibs', 'bpext', 'cpb', 'mwaitx', 'avx', 'smca', 'pausefilter', 'skinit', 'fpu', 'perfctr_core', 'ssse3', 'avx2', 'cat_l3', 'xsaveerptr', 'de', 'clflush', 'cqm_mbm_total', 'sep', 'rdseed', 'sse4_2', 'aes', 'sse', 'succor', 'smep', 'popcnt', 'topoext', 'xsaves', '3dnowprefetch', 'cx8', 'movbe', 'syscall', 'lahf_lm', 'stibp', 'cpuid', 'cx16', 'vme', 'umip', 'pdpe1gb', 'perfctr_nb', 'rdpru', 'smap', 'bmi1', 'tsc', 'cr8_legacy', 'lm', 'aperfmperf', 'pae', 'clzero', 'pfthreshold', 'vmcb_clean', 'svm', 'ssbd', 'ibpb_exit_to_user', 'overflow_recov', 'cqm_llc', 'ibpb', 'nx', 'adx', 'svm_lock', 'nrip_save', 'cmp_legacy', 'pat', 'clflushopt', 'constant_tsc', 'sse4a', 'sha_ni', 'v_vmsave_vmload', 'cqm', 'sse2', 'cdp_l3', 'pse36', 'rdrand', 'monitor', 'hw_pstate', 'irperf', 'cqm_mbm_local', 'perfctr_llc', 'osvw', 'rdtscp', 'abm', 'clwb', 'rapl', 'extd_apicid', 'xgetbv1', 'misalignsse', 'cqm_occup_llc', 'mca', 'xsave', 'v_spec_ctrl', 'npt', 'xsaveopt', 'mtrr', 'fma', 'rdpid', 'pclmulqdq', 'msr', 'pse', 'nonstop_tsc', 'nopl', 'fxsr_opt', 'extapic', 'lbrv', 'vmmcall', 'bmi2', 'pni', 'sse4_1', 'tce', 'sme', 'sev_es', 'pge', 'xsavec'}}}
  -- CPU detection: vendor=amd arch=x86_64 features=['AVX2']
  -- Enabling CUDA backend (-DKTRANSFORMERS_USE_CUDA=ON)
  -- CMake configure args:
      -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/
      -DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11
      -DCMAKE_BUILD_TYPE=Release
      -DLLAMA_NATIVE=OFF
      -DLLAMA_FMA=ON
      -DLLAMA_F16C=ON
      -DLLAMA_AVX=ON
      -DLLAMA_AVX2=ON
      -DKTRANSFORMERS_CPU_USE_AMX=OFF
      -DKTRANSFORMERS_USE_CUDA=ON
      -D
      CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
  -- CMake build args: --build . --config Release --parallel 8
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64/wheel
  creating build/bdist.linux-x86_64/wheel/kt_kernel
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts_base.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
  creating build/bdist.linux-x86_64/wheel/kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/amx.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/llamafile.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/loader.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
  copying build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/.
  running install_egg_info
  Copying kt_kernel.egg-info to build/bdist.linux-x86_64/wheel/./kt_kernel-0.1.0-py3.11.egg-info
  running install_scripts
  creating build/bdist.linux-x86_64/wheel/kt_kernel-0.1.0.dist-info/WHEEL
  creating '/tmp/pip-wheel-ytzbvayt/.tmp-s9_0miv_/kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
  adding 'kt_kernel_ext.cpython-311-x86_64-linux-gnu.so'
  adding 'kt_kernel/__init__.py'
  adding 'kt_kernel/experts.py'
  adding 'kt_kernel/experts_base.py'
  adding 'kt_kernel/utils/__init__.py'
  adding 'kt_kernel/utils/amx.py'
  adding 'kt_kernel/utils/llamafile.py'
  adding 'kt_kernel/utils/loader.py'
  adding 'kt_kernel-0.1.0.dist-info/METADATA'
  adding 'kt_kernel-0.1.0.dist-info/WHEEL'
  adding 'kt_kernel-0.1.0.dist-info/top_level.txt'
  adding 'kt_kernel-0.1.0.dist-info/RECORD'
  removing build/bdist.linux-x86_64/wheel
  Building wheel for kt-kernel (pyproject.toml) ... done
  Created wheel for kt-kernel: filename=kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl size=1088779 sha256=12260abea7a2ba7b90c186715bc5512d23198bd1a1f2e0b8e4d799c85e39d323
  Stored in directory: /home/k1/.cache/pip/wheels/ac/0b/e5/74beab4a502dc518879a41bca5bc4af8470c8d1073a89aab1c
Successfully built kt-kernel
Installing collected packages: kt-kernel
  Attempting uninstall: kt-kernel
    Found existing installation: kt-kernel 0.1.0
    Uninstalling kt-kernel-0.1.0:
      Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel-0.1.0.dist-info/
      Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel/
      Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so
      Successfully uninstalled kt-kernel-0.1.0
Successfully installed kt-kernel-0.1.0
Successfully built and installed kt-kernel! with configuration:
  CPUINFER_CPU_INSTRUCT=AVX2
  CPUINFER_ENABLE_AMX=OFF
  CPUINFER_BUILD_TYPE=Release
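
The `CPUINFER_CPU_INSTRUCT` and `CPUINFER_ENABLE_AMX` values above can be cross-checked against the flags the kernel reports in `/proc/cpuinfo`. A rough sketch, assuming the usual Linux flag names (`avx2`, `avx512f`, `amx_tile`); the mapping and the `NATIVE` fallback are my own heuristic, not the logic the kt-kernel build uses:

```python
def read_cpu_flags(cpuinfo_text):
    """Collect the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def suggest_build_settings(flags):
    """Map CPU flags to CPUINFER_* values (heuristic, see assumptions above)."""
    if "avx512f" in flags:
        instruct = "AVX512"
    elif "avx2" in flags:
        instruct = "AVX2"
    else:
        # Assumed fallback: let the compiler decide.
        instruct = "NATIVE"
    amx = "ON" if "amx_tile" in flags else "OFF"
    return {"CPUINFER_CPU_INSTRUCT": instruct, "CPUINFER_ENABLE_AMX": amx}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(suggest_build_settings(read_cpu_flags(f.read())))
```

For the flag set in the log above (`avx2` present, no `avx512f`, no `amx_tile`), this gives the same AVX2 / AMX OFF combination that was used for the build.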

The problem remains.

jli113 • Nov 12 '25 08:11

This is weird, because the kt-kernel build does find the CUDA toolkit. Can you run the official sglang without the kt backend? It looks like the problem is in your environment rather than in kt-kernel. Alternatively, could you try a lower CUDA version, such as 12.6?
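Before trying a different toolkit, it may help to confirm which CUDA release the `nvcc` on `PATH` actually belongs to. The following is a minimal sketch, assuming `nvcc --version` prints a line containing `release <major>.<minor>`; the function names are illustrative and not part of kt-kernel or sglang.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_nvcc_release() -> Optional[Tuple[int, int]]:
    """Run the nvcc found on PATH, if any, and parse its release number."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Prints None when nvcc is missing or its output cannot be parsed.
    print("nvcc release on PATH:", installed_nvcc_release())
```

Comparing this result with the CUDA version the installed PyTorch wheels were built for is one way to spot a mismatch in the environment.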

KMSorSMS avatar Nov 12 '25 09:11 KMSorSMS

I don't understand. I have checked sglang: https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/moe/kt_ep_wrapper.py It uses the latest wrapper. So basically, you just need to pull the latest kt together with sglang, and then it works? Or do you mean you want to use a specific version of KT with sglang?

Yes, I saw that. It is there if sglang is built from source, but the current sglang pip versions 0.5.5.post1 to 0.5.5.post2 do not include it.

lavdnone2 avatar Nov 12 '25 15:11 lavdnone2

So for now it can only be installed from source. Got it, we will update the docs to point this out.
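To tell whether an installed sglang already ships the wrapper, a check along these lines might be enough. It is only a sketch: the relative path is taken from the repository URL quoted above, the helper names are hypothetical, and the package is located without importing it, so a broken CUDA setup does not get in the way.

```python
import importlib.util
from importlib import metadata
from pathlib import Path
from typing import Iterable, List, Optional

# Location of the wrapper inside the sglang package (from the repository URL).
KT_WRAPPER_RELATIVE = Path("srt") / "layers" / "moe" / "kt_ep_wrapper.py"


def find_kt_wrapper(package_dirs: Iterable[str]) -> Optional[Path]:
    """Return the wrapper file if any of the package directories contains it."""
    for directory in package_dirs:
        candidate = Path(directory) / KT_WRAPPER_RELATIVE
        if candidate.is_file():
            return candidate
    return None


def installed_sglang_dirs() -> List[str]:
    """Locate the installed sglang package directories without importing it."""
    spec = importlib.util.find_spec("sglang")
    if spec is None or spec.submodule_search_locations is None:
        return []
    return list(spec.submodule_search_locations)


if __name__ == "__main__":
    try:
        print("sglang version:", metadata.version("sglang"))
    except metadata.PackageNotFoundError:
        print("sglang is not installed")
    print("kt_ep_wrapper.py:", find_kt_wrapper(installed_sglang_dirs()))
```

If the second line prints `None`, the installed build does not contain the wrapper and a source install would be needed.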

KMSorSMS avatar Nov 13 '25 07:11 KMSorSMS

After successfully installing kt-kernel and sglang, I got an error when launching the server. I am pretty sure nvcc is on the system PATH.

k1@k0:~/ktransformers$ python -m sglang.launch_server   --host 0.0.0.0   --port 60000   --model /home/k1/models/DeepSeek-R1-GGUF/DeepSeek-R1-UD-Q2_K_XL   --kt-cpuinfer 12   --kt-threadpool-count 2   --kt-num-gpu-experts 200   --attention-backend flashinfer   --trust-remote-code   --mem-fraction-static 0.98   --chunked-prefill-size 4096   --max-running-requests 37   --max-total-tokens 37000   --enable-mixed-chunk   --tensor-parallel-size 8   --enable-p2p-check   --disable-shared-experts-fusion
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/launch_server.py", line 24, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 4008, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 3616, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 275, in __init__
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 595, in __post_init__
    self._handle_model_specific_adjustments()
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 899, in _handle_model_specific_adjustments
    from sglang.srt.configs.model_config import is_deepseek_nsa
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/configs/model_config.py", line 26, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/__init__.py", line 19, in <module>
    from sglang.srt.layers.quantization.auto_round import AutoRoundConfig
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/auto_round.py", line 12, in <module>
    from sglang.srt.layers.quantization.utils import get_scalar_types
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/utils.py", line 13, in <module>
    from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/fp8_kernel.py", line 46, in <module>
    from sgl_kernel import sgl_per_tensor_quant_fp8, sgl_per_token_quant_fp8
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/__init__.py", line 9, in <module>
    _preload_cuda_library()
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/load_utils.py", line 220, in _preload_cuda_library
    raise RuntimeError("Could not find CUDA lib directory.")
RuntimeError: Could not find CUDA lib directory.
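The error comes from `sgl_kernel` failing to locate a CUDA library directory, which is a separate question from whether `nvcc` is on `PATH`. The sketch below lists the directories one might expect to hold `libcudart`. It is an assumption-based diagnostic, not the actual search logic of `sgl_kernel`: the environment variables (`CUDA_HOME`, `CUDA_PATH`), the default root `/usr/local/cuda`, and the subdirectory names are all guesses at common layouts.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping


def candidate_cuda_roots(env: Mapping[str, str]) -> List[Path]:
    """Collect CUDA roots from common environment variables plus /usr/local/cuda."""
    roots = [env.get(name) for name in ("CUDA_HOME", "CUDA_PATH")]
    roots.append("/usr/local/cuda")
    seen, ordered = set(), []
    for root in roots:
        # Skip unset variables and duplicates while keeping the original order.
        if root and root not in seen:
            seen.add(root)
            ordered.append(Path(root))
    return ordered


def cuda_lib_dirs(roots: Iterable[Path]) -> List[Path]:
    """Return lib directories under the roots that contain a libcudart shared object."""
    found = []
    for root in roots:
        for sub in ("lib64", "lib", "targets/x86_64-linux/lib"):
            lib_dir = root / sub
            if lib_dir.is_dir() and any(lib_dir.glob("libcudart.so*")):
                found.append(lib_dir)
    return found


if __name__ == "__main__":
    roots = candidate_cuda_roots(os.environ)
    print("CUDA roots checked:", [str(r) for r in roots])
    print("lib directories with libcudart:", [str(d) for d in cuda_lib_dirs(roots)])
```

An empty second list would suggest that the runtime libraries are missing or installed somewhere unusual, even though the compiler itself is reachable.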

I reinstalled with the following settings:

# LLAMAFILE backend on an AVX2 CPU without AMX
export CPUINFER_CPU_INSTRUCT=AVX2  # Options: NATIVE, AVX512, AVX2
export CPUINFER_ENABLE_AMX=OFF       # Options: ON, OFF
export CMAKE_ARGS="-D CMAKE_CUDA_COMPILER=$(which nvcc)"

./install.sh --manual

Checking and installing system dependencies...
Installing cmake via conda...
2 channel Terms of Service accepted
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.

Detected Debian-based system. Installing libhwloc-dev and pkg-config...
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local  InRelease [1,572 B]
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local  InRelease [1,572 B]
Hit:2 http://mirrors.aliyun.com/ubuntu jammy InRelease
Hit:3 http://mirrors.aliyun.com/ubuntu jammy-updates InRelease
Hit:4 http://mirrors.aliyun.com/ubuntu jammy-backports InRelease
Hit:5 https://mirrors.aliyun.com/docker-ce/linux/ubuntu jammy InRelease
Hit:6 https://deb.nodesource.com/node_23.x nodistro InRelease
Hit:7 https://apt.llvm.org/jammy llvm-toolchain-jammy-20 InRelease
Hit:8 https://apt.llvm.org/jammy llvm-toolchain-jammy-18 InRelease
Hit:9 http://security.ubuntu.com/ubuntu jammy-security InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
8 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
pkg-config is already the newest version (0.29.2-1ubuntu3).
libhwloc-dev is already the newest version (2.7.0-2ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Building kt-kernel with configuration:
  CPUINFER_CPU_INSTRUCT=AVX2
  CPUINFER_ENABLE_AMX=OFF
  CPUINFER_BUILD_TYPE=Release
  CPUINFER_PARALLEL=8
  CPUINFER_VERBOSE=1

Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11)
Processing /home/k1/ktransformers/kt-kernel
  Running command pip subprocess to install build dependencies
  Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11)
  Collecting setuptools>=61
    Obtaining dependency information for setuptools>=61 from https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl.metadata
    Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB)
  Collecting wheel
    Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata
    Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
  Collecting cmake>=3.16
    Obtaining dependency information for cmake>=3.16 from https://files.pythonhosted.org/packages/f3/56/0fc4d83f212cef10b7bbf6c5043e4582af80ad2aef6905e0dc33fbf68b11/cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata
    Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (6.5 kB)
  Collecting pybind11
    Obtaining dependency information for pybind11 from https://files.pythonhosted.org/packages/cd/8a/37362fc2b949d5f733a8b0f2ff51ba423914cabefe69f1d1b6aab710f5fe/pybind11-3.0.1-py3-none-any.whl.metadata
    Using cached pybind11-3.0.1-py3-none-any.whl.metadata (10.0 kB)
  Using cached setuptools-80.9.0-py3-none-any.whl (1.2 MB)
  Using cached wheel-0.45.1-py3-none-any.whl (72 kB)
  Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.7 MB)
  Using cached pybind11-3.0.1-py3-none-any.whl (293 kB)
  Installing collected packages: wheel, setuptools, pybind11, cmake
    Creating /tmp/pip-build-env-y8t0fu_x/overlay/bin
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/wheel to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/pybind11-config to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ccmake to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cmake to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cpack to 775
    changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ctest to 775

  Successfully installed cmake-4.1.2 pybind11-3.0.1 setuptools-80.9.0 wheel-0.45.1
  Installing build dependencies ... done
  Running command Getting requirements to build wheel
  /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
    corresp(dist, value, root_dir)
  running egg_info
  creating kt_kernel.egg-info
  writing kt_kernel.egg-info/PKG-INFO
  writing dependency_links to kt_kernel.egg-info/dependency_links.txt
  writing requirements to kt_kernel.egg-info/requires.txt
  writing top-level names to kt_kernel.egg-info/top_level.txt
  writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
  reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
  writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
  Getting requirements to build wheel ... done
  Running command Preparing metadata (pyproject.toml)
  /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
    corresp(dist, value, root_dir)
  running dist_info
  creating /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info
  writing /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/requires.txt
  writing top-level names to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/top_level.txt
  writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
  writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
  creating '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel-0.1.0.dist-info'
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.8.0)
Requirement already satisfied: safetensors>=0.4.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.6.2)
Requirement already satisfied: compressed-tensors>=0.7.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.12.2)
Requirement already satisfied: numpy>=1.24.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.3.4)
Requirement already satisfied: triton>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (3.4.0)
Requirement already satisfied: gguf>=0.17.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.17.1)
Requirement already satisfied: black>=25.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (25.11.0)
Requirement already satisfied: click>=8.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (8.3.0)
Requirement already satisfied: mypy-extensions>=0.4.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (1.1.0)
Requirement already satisfied: packaging>=22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (25.0)
Requirement already satisfied: pathspec>=0.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.12.1)
Requirement already satisfied: platformdirs>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (4.5.0)
Requirement already satisfied: pytokens>=0.3.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.3.0)
Requirement already satisfied: transformers in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.57.1)
Requirement already satisfied: pydantic>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.12.4)
Requirement already satisfied: loguru in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.3)
Requirement already satisfied: pyyaml>=5.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (6.0.3)
Requirement already satisfied: tqdm>=4.27 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (4.67.1)
Requirement already satisfied: annotated-types>=0.6.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.41.5)
Requirement already satisfied: typing-extensions>=4.14.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.15.0)
Requirement already satisfied: typing-inspection>=0.4.2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.4.2)
Requirement already satisfied: filelock in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.20.0)
Requirement already satisfied: sympy>=1.13.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.14.0)
Requirement already satisfied: networkx in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.5)
Requirement already satisfied: jinja2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.1.6)
Requirement already satisfied: fsspec in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2025.10.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2.27.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.13.1.3)
Requirement already satisfied: setuptools>=40.8.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from triton>=2.0.0->kt-kernel==0.1.0) (80.9.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from sympy>=1.13.3->torch>=2.0.0->kt-kernel==0.1.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from jinja2->torch>=2.0.0->kt-kernel==0.1.0) (3.0.3)
Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.36.0)
Requirement already satisfied: regex!=2019.12.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.3)
Requirement already satisfied: requests in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.32.5)
Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.22.1)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.34.0->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (1.2.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.11)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.12)
Building wheels for collected packages: kt-kernel
  Running command Building wheel for kt-kernel (pyproject.toml)
  /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
    corresp(dist, value, root_dir)
  running bdist_wheel
  running build
  running build_py
  creating build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/experts_base.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/experts.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  copying python/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
  creating build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/amx.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/llamafile.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/loader.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  copying python/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
  running egg_info
  writing kt_kernel.egg-info/PKG-INFO
  writing dependency_links to kt_kernel.egg-info/dependency_links.txt
  writing requirements to kt_kernel.egg-info/requires.txt
  writing top-level names to kt_kernel.egg-info/top_level.txt
  reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
  writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
  running build_ext
  -- The C compiler identification is GNU 11.4.0
  -- The CXX compiler identification is GNU 11.4.0
  -- Detecting C compiler ABI info
  -- Detecting C compiler ABI info - done
  -- Check for working C compiler: /usr/bin/cc - skipped
  -- Detecting C compile features
  -- Detecting C compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- No .git directory found; skipping git hooks installation
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- Found OpenMP: TRUE (found version "4.5")
  -- CMAKE_CXX_FLAGS:  -O3 -ffast-math
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  CMake Warning at CMakeLists.txt:252 (message):
    pure AVX is not supported at least avx2


  -- ARCH_FLAGS: -mf16c;-mfma;-mavx;-mfma;-msse3;-mf16c;-mavx2;-mfma;-msse3;-mf16c
  CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:13 (cmake_minimum_required):
    Compatibility with CMake < 3.10 will be removed from a future version of
    CMake.

    Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
    to tell CMake that the project requires at least <min> but has been updated
    to work with policies introduced by <max> or earlier.


  -- pybind11 v2.14.0 dev1
  -- Found PythonInterp: /home/k1/miniconda3/envs/kt/bin/python3.11 (found suitable version "3.11.14", minimum required is "3.7")
  -- Found PythonLibs: /home/k1/miniconda3/envs/kt/lib/libpython3.11.so
  -- Performing Test HAS_FLTO
  -- Performing Test HAS_FLTO - Success
  -- Found Git: /usr/bin/git (found version "2.34.1")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  -- Found OpenMP_C: -fopenmp (found version "4.5")
  -- Found OpenMP_CXX: -fopenmp (found version "4.5")
  -- OpenMP found
  -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
  -- CMAKE_SYSTEM_PROCESSOR: x86_64
  -- x86 detected
  -- CUDA detected
  -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.61")
  -- enabling CUDA
  -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- SOURCE_DIR7:
  CMake Warning at CMakeLists.txt:485 (message):
    clang-format not found.  Please install clang-format (>=18) or pass
    -DCLANG_FORMAT_BIN=/full/path and reconfigure.


  -- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")
  -- Checking for one of the modules 'hwloc'
  -- LTO: disabled
  -- NUMA library found: /usr/lib/x86_64-linux-gnu/libnuma.so - enabling NUMA support
  -- Configuring done (17.1s)
  -- Generating done (0.0s)
  -- Build files have been written to: /home/k1/ktransformers/kt-kernel/build/temp.linux-x86_64-cpython-311/kt_kernel_ext_Release
  [  1%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/flags.cpp.o
  [  2%] Building CXX object third_party/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o
  [  3%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
  [  5%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
  [  7%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o
  [  9%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o
  [  9%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
  [ 10%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o
  [ 11%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
  [ 13%] Built target build_info
  [ 14%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/sgemm.cpp.o
  [ 15%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
  [ 17%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
  [ 18%] Building CXX object third_party/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o
  [ 19%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
  [ 21%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
  [ 22%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
  [ 23%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
  [ 25%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
  [ 26%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
  [ 27%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
  [ 28%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
  [ 30%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
  [ 31%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
  [ 32%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
  [ 34%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
  [ 35%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
  [ 36%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
  [ 38%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
  [ 38%] Built target ggml
  [ 39%] Linking CXX static library libggml_static.a
  [ 40%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
  [ 42%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o
  [ 43%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o
  [ 43%] Built target ggml_static
  [ 44%] Linking CXX static library libllamafile.a
  [ 44%] Built target llamafile
  [ 46%] Linking CXX static library libllama.a
  [ 46%] Built target llama
  [ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/ext_bindings.cpp.o
  [ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/shared_mem_buffer.cpp.o
  [ 51%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/task_queue.cpp.o
  [ 51%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o
  [ 52%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/linear.cpp.o
  [ 53%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o
  [ 55%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/worker_pool.cpp.o
  [ 56%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o
  [ 57%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o
  [ 59%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/mlp.cpp.o
  [ 60%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/flags.cpp.o
  [ 61%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
  [ 63%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
  [ 64%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
  [ 65%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o
  [ 67%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
  [ 68%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/sgemm.cpp.o
  [ 69%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
  [ 71%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
  [ 72%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o
  [ 73%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
  [ 75%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
  [ 76%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
  [ 77%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
  [ 78%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
  [ 80%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
  [ 81%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
  [ 82%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
  [ 84%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
  [ 85%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
  [ 86%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
  [ 88%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
  [ 89%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
  [ 90%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
  [ 92%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
  [ 93%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_attn.cpp.o
  [ 94%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o
  [ 96%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_read_write.cpp.o
  [ 97%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_utils.cpp.o
  [ 98%] Linking CXX static library libcommon.a
  [ 98%] Built target common
  [100%] Linking CXX shared module /home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so
  [100%] Built target kt_kernel_ext
  -- CPUINFER_USE_CUDA not set; auto-detected CUDA toolkit: YES
  Detected CPU info: {'vendor': 'amd', 'arch': 'x86_64', 'features': {'AVX2'}, 'raw': {'flags': {'decodeassists', 'mba', 'fsgsbase', 'wdt', 'f16c', 'rep_good', 'mce', 'arat', 'rdt_a', 'tsc_scale', 'avic', 'wbnoinvd', 'flushbyasid', 'sev', 'mmx', 'apic', 'ibrs', 'vgif', 'fxsr', 'mmxext', 'ht', 'cmov', 'ibs', 'bpext', 'cpb', 'mwaitx', 'avx', 'smca', 'pausefilter', 'skinit', 'fpu', 'perfctr_core', 'ssse3', 'avx2', 'cat_l3', 'xsaveerptr', 'de', 'clflush', 'cqm_mbm_total', 'sep', 'rdseed', 'sse4_2', 'aes', 'sse', 'succor', 'smep', 'popcnt', 'topoext', 'xsaves', '3dnowprefetch', 'cx8', 'movbe', 'syscall', 'lahf_lm', 'stibp', 'cpuid', 'cx16', 'vme', 'umip', 'pdpe1gb', 'perfctr_nb', 'rdpru', 'smap', 'bmi1', 'tsc', 'cr8_legacy', 'lm', 'aperfmperf', 'pae', 'clzero', 'pfthreshold', 'vmcb_clean', 'svm', 'ssbd', 'ibpb_exit_to_user', 'overflow_recov', 'cqm_llc', 'ibpb', 'nx', 'adx', 'svm_lock', 'nrip_save', 'cmp_legacy', 'pat', 'clflushopt', 'constant_tsc', 'sse4a', 'sha_ni', 'v_vmsave_vmload', 'cqm', 'sse2', 'cdp_l3', 'pse36', 'rdrand', 'monitor', 'hw_pstate', 'irperf', 'cqm_mbm_local', 'perfctr_llc', 'osvw', 'rdtscp', 'abm', 'clwb', 'rapl', 'extd_apicid', 'xgetbv1', 'misalignsse', 'cqm_occup_llc', 'mca', 'xsave', 'v_spec_ctrl', 'npt', 'xsaveopt', 'mtrr', 'fma', 'rdpid', 'pclmulqdq', 'msr', 'pse', 'nonstop_tsc', 'nopl', 'fxsr_opt', 'extapic', 'lbrv', 'vmmcall', 'bmi2', 'pni', 'sse4_1', 'tce', 'sme', 'sev_es', 'pge', 'xsavec'}}}
  -- CPU detection: vendor=amd arch=x86_64 features=['AVX2']
  -- Enabling CUDA backend (-DKTRANSFORMERS_USE_CUDA=ON)
  -- CMake configure args:
      -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/
      -DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11
      -DCMAKE_BUILD_TYPE=Release
      -DLLAMA_NATIVE=OFF
      -DLLAMA_FMA=ON
      -DLLAMA_F16C=ON
      -DLLAMA_AVX=ON
      -DLLAMA_AVX2=ON
      -DKTRANSFORMERS_CPU_USE_AMX=OFF
      -DKTRANSFORMERS_USE_CUDA=ON
      -D
      CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
  -- CMake build args: --build . --config Release --parallel 8
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64/wheel
  creating build/bdist.linux-x86_64/wheel/kt_kernel
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts_base.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
  creating build/bdist.linux-x86_64/wheel/kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/amx.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/llamafile.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/loader.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
  copying build/lib.linux-x86_64-cpython-311/kt_kernel/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
  copying build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/.
  running install_egg_info
  Copying kt_kernel.egg-info to build/bdist.linux-x86_64/wheel/./kt_kernel-0.1.0-py3.11.egg-info
  running install_scripts
  creating build/bdist.linux-x86_64/wheel/kt_kernel-0.1.0.dist-info/WHEEL
  creating '/tmp/pip-wheel-ytzbvayt/.tmp-s9_0miv_/kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
  adding 'kt_kernel_ext.cpython-311-x86_64-linux-gnu.so'
  adding 'kt_kernel/__init__.py'
  adding 'kt_kernel/experts.py'
  adding 'kt_kernel/experts_base.py'
  adding 'kt_kernel/utils/__init__.py'
  adding 'kt_kernel/utils/amx.py'
  adding 'kt_kernel/utils/llamafile.py'
  adding 'kt_kernel/utils/loader.py'
  adding 'kt_kernel-0.1.0.dist-info/METADATA'
  adding 'kt_kernel-0.1.0.dist-info/WHEEL'
  adding 'kt_kernel-0.1.0.dist-info/top_level.txt'
  adding 'kt_kernel-0.1.0.dist-info/RECORD'
  removing build/bdist.linux-x86_64/wheel
  Building wheel for kt-kernel (pyproject.toml) ... done
  Created wheel for kt-kernel: filename=kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl size=1088779 sha256=12260abea7a2ba7b90c186715bc5512d23198bd1a1f2e0b8e4d799c85e39d323
  Stored in directory: /home/k1/.cache/pip/wheels/ac/0b/e5/74beab4a502dc518879a41bca5bc4af8470c8d1073a89aab1c
Successfully built kt-kernel
Installing collected packages: kt-kernel
  Attempting uninstall: kt-kernel
    Found existing installation: kt-kernel 0.1.0
    Uninstalling kt-kernel-0.1.0:
      Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel-0.1.0.dist-info/
      Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel/
      Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so
      Successfully uninstalled kt-kernel-0.1.0
Successfully installed kt-kernel-0.1.0
Successfully built and installed kt-kernel! with configuration:
  CPUINFER_CPU_INSTRUCT=AVX2
  CPUINFER_ENABLE_AMX=OFF
  CPUINFER_BUILD_TYPE=Release

The problem remains.

A less elegant temporary workaround for _preload_cuda_library: set CUDA_HOME explicitly when launching the server, e.g. CUDA_HOME=/usr/local/cuda-12.8 python -m sglang.launch_server ...

slin000111 avatar Nov 13 '25 14:11 slin000111

I also fix this in PR #1600 by scanning for the CUDA toolkit (see the changes to setup.py).
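
For readers who want to see what scanning for the CUDA toolkit could look like, here is a minimal sketch. It is not the code from the PR: the helper name find_cuda_home, its parameters, and the search order (CUDA_HOME / CUDA_PATH, then nvcc on PATH, then cuda* directories under /usr/local) are all assumptions made for illustration.

```python
import os
import shutil
from pathlib import Path


def find_cuda_home(env=None, search_root="/usr/local"):
    """Return a likely CUDA toolkit root, or None if nothing is found.

    Hypothetical helper for illustration only; not taken from setup.py.
    """
    env = os.environ if env is None else env

    # 1. Respect an explicit setting, but only if it really contains nvcc.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(var)
        if value and (Path(value) / "bin" / "nvcc").is_file():
            return value

    # 2. Fall back to nvcc on PATH: <root>/bin/nvcc -> <root>.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        return str(Path(nvcc).resolve().parent.parent)

    # 3. Finally, scan for <search_root>/cuda and <search_root>/cuda-X.Y.
    #    The sort is plain lexicographic, so it is only a rough
    #    "prefer the newest-looking directory" heuristic.
    for candidate in sorted(Path(search_root).glob("cuda*"), reverse=True):
        if (candidate / "bin" / "nvcc").is_file():
            return str(candidate)

    return None


if __name__ == "__main__":
    print(find_cuda_home())
```

If the helper returns a path, exporting it as CUDA_HOME before launching has the same effect as the manual workaround mentioned above, without hard-coding a toolkit version.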

KMSorSMS avatar Nov 14 '25 02:11 KMSorSMS