[Bug] error with kt-kernel installation
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- [x] 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise, it will be closed.
- [x] 5. To help the community, I will use Chinese/English or attach a Chinese/English translation if using another language. Non-Chinese/English content without translation may be closed.
Describe the bug
Processing /home/k1/ktransformers/kt-kernel Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Collecting torch>=2.0.0 (from kt-kernel==0.1.0) Using cached torch-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (30 kB) Collecting safetensors>=0.4.0 (from kt-kernel==0.1.0) Using cached safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB) Collecting compressed-tensors>=0.7.0 (from kt-kernel==0.1.0) Using cached compressed_tensors-0.12.2-py3-none-any.whl.metadata (7.0 kB) Collecting numpy>=1.24.0 (from kt-kernel==0.1.0) Using cached numpy-2.3.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (62 kB) Collecting triton>=2.0.0 (from kt-kernel==0.1.0) Using cached triton-3.5.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.7 kB) Collecting black>=25.9.0 (from kt-kernel==0.1.0) Using cached black-25.11.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (85 kB) Collecting click>=8.0.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached click-8.3.0-py3-none-any.whl.metadata (2.6 kB) Collecting mypy-extensions>=0.4.3 (from black>=25.9.0->kt-kernel==0.1.0) Using cached mypy_extensions-1.1.0-py3-none-any.whl.metadata (1.1 kB) Collecting packaging>=22.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached packaging-25.0-py3-none-any.whl.metadata (3.3 kB) Collecting pathspec>=0.9.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached pathspec-0.12.1-py3-none-any.whl.metadata (21 kB) Collecting platformdirs>=2 (from black>=25.9.0->kt-kernel==0.1.0) Using cached platformdirs-4.5.0-py3-none-any.whl.metadata (12 kB) Collecting pytokens>=0.3.0 (from black>=25.9.0->kt-kernel==0.1.0) Using cached pytokens-0.3.0-py3-none-any.whl.metadata (2.0 kB) Collecting transformers (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached transformers-4.57.1-py3-none-any.whl.metadata (43 
kB) Collecting pydantic>=2.0 (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached pydantic-2.12.4-py3-none-any.whl.metadata (89 kB) Collecting loguru (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached loguru-0.7.3-py3-none-any.whl.metadata (22 kB) Collecting annotated-types>=0.6.0 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB) Collecting pydantic-core==2.41.5 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB) Collecting typing-extensions>=4.14.1 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached typing_extensions-4.15.0-py3-none-any.whl.metadata (3.3 kB) Collecting typing-inspection>=0.4.2 (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached typing_inspection-0.4.2-py3-none-any.whl.metadata (2.6 kB) Collecting filelock (from torch>=2.0.0->kt-kernel==0.1.0) Using cached filelock-3.20.0-py3-none-any.whl.metadata (2.1 kB) Collecting sympy>=1.13.3 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB) Collecting networkx>=2.5.1 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached networkx-3.5-py3-none-any.whl.metadata (6.3 kB) Collecting jinja2 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB) Collecting fsspec>=0.8.5 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached fsspec-2025.10.0-py3-none-any.whl.metadata (10 kB) Collecting nvidia-cuda-nvrtc-cu12==12.8.93 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cuda-runtime-cu12==12.8.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached 
nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cuda-cupti-cu12==12.8.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cudnn-cu12==9.10.2.21 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) Collecting nvidia-cublas-cu12==12.8.4.1 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cufft-cu12==11.3.3.83 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting nvidia-curand-cu12==10.3.9.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cusolver-cu12==11.7.3.90 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl.metadata (1.8 kB) Collecting nvidia-cusparse-cu12==12.5.8.93 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) Collecting nvidia-cusparselt-cu12==0.7.1 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl.metadata (7.0 kB) Collecting nvidia-nccl-cu12==2.27.5 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) Collecting nvidia-nvshmem-cu12==3.3.20 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.1 kB) Collecting nvidia-nvtx-cu12==12.8.90 (from 
torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.8 kB) Collecting nvidia-nvjitlink-cu12==12.8.93 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.7 kB) Collecting nvidia-cufile-cu12==1.13.1.3 (from torch>=2.0.0->kt-kernel==0.1.0) Using cached nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.7 kB) Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch>=2.0.0->kt-kernel==0.1.0) Using cached mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB) Collecting MarkupSafe>=2.0 (from jinja2->torch>=2.0.0->kt-kernel==0.1.0) Using cached markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.7 kB) Collecting huggingface-hub<1.0,>=0.34.0 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached huggingface_hub-0.36.0-py3-none-any.whl.metadata (14 kB) Collecting pyyaml>=5.1 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (2.4 kB) Collecting regex!=2019.12.17 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached regex-2025.11.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (40 kB) Collecting requests (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached requests-2.32.5-py3-none-any.whl.metadata (4.9 kB) Collecting tokenizers<=0.23.0,>=0.22.0 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB) Collecting tqdm>=4.27 (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB) 
Collecting hf-xet<2.0.0,>=1.1.3 (from huggingface-hub<1.0,>=0.34.0->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB) Collecting charset_normalizer<4,>=2 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (37 kB) Collecting idna<4,>=2.5 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached idna-3.11-py3-none-any.whl.metadata (8.4 kB) Collecting urllib3<3,>=1.21.1 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached urllib3-2.5.0-py3-none-any.whl.metadata (6.5 kB) Collecting certifi>=2017.4.17 (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) Using cached certifi-2025.10.5-py3-none-any.whl.metadata (2.5 kB) Using cached black-25.11.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (1.6 MB) Using cached click-8.3.0-py3-none-any.whl (107 kB) Using cached compressed_tensors-0.12.2-py3-none-any.whl (183 kB) Using cached mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB) Using cached numpy-2.3.4-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (16.9 MB) Using cached packaging-25.0-py3-none-any.whl (66 kB) Using cached pathspec-0.12.1-py3-none-any.whl (31 kB) Using cached platformdirs-4.5.0-py3-none-any.whl (18 kB) Using cached pydantic-2.12.4-py3-none-any.whl (463 kB) Using cached pydantic_core-2.41.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB) Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB) Using cached pytokens-0.3.0-py3-none-any.whl (12 kB) Using cached safetensors-0.6.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (485 kB) Using cached torch-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl (899.8 MB) Using cached 
nvidia_cublas_cu12-12.8.4.1-py3-none-manylinux_2_27_x86_64.whl (594.3 MB) Using cached nvidia_cuda_cupti_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (10.2 MB) Using cached nvidia_cuda_nvrtc_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (88.0 MB) Using cached nvidia_cuda_runtime_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (954 kB) Using cached nvidia_cudnn_cu12-9.10.2.21-py3-none-manylinux_2_27_x86_64.whl (706.8 MB) Using cached nvidia_cufft_cu12-11.3.3.83-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (193.1 MB) Using cached nvidia_cufile_cu12-1.13.1.3-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.2 MB) Using cached nvidia_curand_cu12-10.3.9.90-py3-none-manylinux_2_27_x86_64.whl (63.6 MB) Using cached nvidia_cusolver_cu12-11.7.3.90-py3-none-manylinux_2_27_x86_64.whl (267.5 MB) Using cached nvidia_cusparse_cu12-12.5.8.93-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (288.2 MB) Using cached nvidia_cusparselt_cu12-0.7.1-py3-none-manylinux2014_x86_64.whl (287.2 MB) Using cached nvidia_nccl_cu12-2.27.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (322.3 MB) Using cached nvidia_nvjitlink_cu12-12.8.93-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (39.3 MB) Using cached nvidia_nvshmem_cu12-3.3.20-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (124.7 MB) Using cached nvidia_nvtx_cu12-12.8.90-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) Using cached triton-3.5.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (170.4 MB) Using cached fsspec-2025.10.0-py3-none-any.whl (200 kB) Using cached networkx-3.5-py3-none-any.whl (2.0 MB) Using cached sympy-1.14.0-py3-none-any.whl (6.3 MB) Using cached mpmath-1.3.0-py3-none-any.whl (536 kB) Using cached typing_extensions-4.15.0-py3-none-any.whl (44 kB) Using cached typing_inspection-0.4.2-py3-none-any.whl (14 kB) Using cached filelock-3.20.0-py3-none-any.whl (16 kB) Using 
cached jinja2-3.1.6-py3-none-any.whl (134 kB) Using cached markupsafe-3.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB) Using cached loguru-0.7.3-py3-none-any.whl (61 kB) Using cached transformers-4.57.1-py3-none-any.whl (12.0 MB) Using cached huggingface_hub-0.36.0-py3-none-any.whl (566 kB) Using cached hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB) Using cached tokenizers-0.22.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB) Using cached pyyaml-6.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (806 kB) Using cached regex-2025.11.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (800 kB) Using cached tqdm-4.67.1-py3-none-any.whl (78 kB) Using cached requests-2.32.5-py3-none-any.whl (64 kB) Using cached charset_normalizer-3.4.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (151 kB) Using cached idna-3.11-py3-none-any.whl (71 kB) Using cached urllib3-2.5.0-py3-none-any.whl (129 kB) Using cached certifi-2025.10.5-py3-none-any.whl (163 kB)
Building wheels for collected packages: kt-kernel
Building wheel for kt-kernel (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for kt-kernel (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [162 lines of output]
/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: license overwritten by pyproject.toml
corresp(dist, value, root_dir)
running bdist_wheel
running build
running build_py
creating build/lib.linux-x86_64-cpython-311/kt_kernel
copying python/experts_base.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
copying python/experts.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
copying python/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
running egg_info
writing kt_kernel.egg-info/PKG-INFO
writing dependency_links to kt_kernel.egg-info/dependency_links.txt
writing requirements to kt_kernel.egg-info/requires.txt
writing top-level names to kt_kernel.egg-info/top_level.txt
reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
running build_ext
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- No .git directory found; skipping git hooks installation
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- CMAKE_CXX_FLAGS: -O3 -ffast-math
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Found BLIS include at /usr/include/x86_64-linux-gnu
-- Found BLIS library /usr/lib/x86_64-linux-gnu/libblis.so
-- ARCH_FLAGS: -mfma;-mavx;-mavx2;-march=native
CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:13 (cmake_minimum_required):
Compatibility with CMake < 3.10 will be removed from a future version of
CMake.
Update the VERSION argument <min> value. Or, use the <min>...<max> syntax
to tell CMake that the project requires at least <min> but has been updated
to work with policies introduced by <max> or earlier.
-- pybind11 v2.14.0 dev1
-- Found PythonInterp: /home/k1/miniconda3/envs/kt/bin/python3.11 (found suitable version "3.11.14", minimum required is "3.7")
-- Found PythonLibs: /home/k1/miniconda3/envs/kt/lib/libpython3.11.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- OpenMP found
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- CUDA detected
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.61")
-- enabling CUDA
-- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- SOURCE_DIR7:
CMake Warning at CMakeLists.txt:485 (message):
clang-format not found. Please install clang-format (>=18) or pass
-DCLANG_FORMAT_BIN=/full/path and reconfigure.
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
CMake Error at CMakeLists.txt:531 (message):
FindHWLOC needs pkg-config program and PKG_CONFIG_PATH must contain the
path to hwloc.pc file.
-- Configuring incomplete, errors occurred!
-- CPUINFER_USE_CUDA not set; auto-detected CUDA toolkit: YES
Detected CPU info: {'vendor': 'amd', 'arch': 'x86_64', 'features': {'AVX2'}, 'raw': {'flags': {'rdtscp', 'nonstop_tsc', 'bmi2', '3dnowprefetch', 'rdseed', 'wdt', 'pclmulqdq', 'rdpru', 'sha_ni', 'succor', 'cqm_mbm_total', 'xsavec', 'fpu', 'avx', 'perfctr_core', 'aperfmperf', 'avx2', 'xgetbv1', 'rep_good', 'lahf_lm', 'cmp_legacy', 'ssbd', 'v_vmsave_vmload', 'clzero', 'cmov', 'mca', 'monitor', 'mba', 'bpext', 'nopl', 'stibp', 'nrip_save', 'vmmcall', 'sme', 'cx8', 'sep', 'misalignsse', 'topoext', 'clwb', 'clflush', 'cat_l3', 'adx', 'pge', 'mwaitx', 'ibrs', 'npt', 'xsaves', 'cpuid', 'sse4_1', 'lm', 'pni', 'aes', 'perfctr_nb', 'smep', 'lbrv', 'pae', 'sev_es', 'apic', 'svm', 'ibpb', 'syscall', 'mmxext', 'constant_tsc', 'cqm', 'smca', 'msr', 'fxsr', 'tsc', 'pat', 'abm', 'umip', 'vgif', 'fxsr_opt', 'overflow_recov', 'vme', 'avic', 'extd_apicid', 'decodeassists', 'cqm_mbm_local', 'rapl', 'mce', 'pfthreshold', 'tsc_scale', 'pse', 'tce', 'rdrand', 'xsaveerptr', 'sev', 'extapic', 'perfctr_llc', 'smap', 'cqm_occup_llc', 'fma', 'sse', 'popcnt', 'ht', 'cx16', 'ibs', 'flushbyasid', 'wbnoinvd', 'xsaveopt', 'hw_pstate', 'bmi1', 'movbe', 'rdpid', 'svm_lock', 'pausefilter', 'sse4a', 'vmcb_clean', 'osvw', 'v_spec_ctrl', 'arat', 'ibpb_exit_to_user', 'rdt_a', 'mmx', 'cqm_llc', 'mtrr', 'sse4_2', 'nx', 'cpb', 'ssse3', 'cr8_legacy', 'cdp_l3', 'f16c', 'clflushopt', 'skinit', 'xsave', 'irperf', 'sse2', 'fsgsbase', 'pdpe1gb', 'de', 'pse36'}}}
-- Detected AMD CPU; enabling AMD MoE kernel (-DKTRANSFORMERS_CPU_MOE_AMD=ON)
-- CPU detection: vendor=amd arch=x86_64 features=['AVX2']
-- Enabling CUDA backend (-DKTRANSFORMERS_USE_CUDA=ON)
-- CMake configure args:
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/
-DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11
-DCMAKE_BUILD_TYPE=Release
-DLLAMA_NATIVE=ON
-DKTRANSFORMERS_CPU_MOE_AMD=ON
-DKTRANSFORMERS_USE_CUDA=ON
Traceback (most recent call last):
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 389, in <module>
main()
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 373, in main
json_out["return_val"] = hook(**hook_input["kwargs"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 280, in build_wheel
return _build_backend().build_wheel(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 435, in build_wheel
return _build(['bdist_wheel', '--dist-info-dir', str(metadata_directory)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 423, in _build
return self._build_with_temp_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 404, in _build_with_temp_dir
self.run_setup()
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
exec(code, locals())
File "<string>", line 330, in <module>
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/__init__.py", line 115, in setup
return distutils.core.setup(**attrs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 186, in setup
return run_commands(dist)
^^^^^^^^^^^^^^^^^^
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 202, in run_commands
dist.run_commands()
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
self.run_command(cmd)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/command/bdist_wheel.py", line 370, in run
self.run_command("build")
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 357, in run_command
self.distribution.run_command(command)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/dist.py", line 1102, in run_command
super().run_command(command)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 1021, in run_command
cmd_obj.run()
File "<string>", line 106, in run
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 96, in run
_build_ext.run(self)
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 368, in run
self.build_extensions()
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 484, in build_extensions
self._build_extensions_serial()
File "/tmp/pip-build-env-p1ahpgx8/overlay/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 510, in _build_extensions_serial
self.build_extension(ext)
File "<string>", line 298, in build_extension
File "/home/k1/miniconda3/envs/kt/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cmake', '/home/k1/ktransformers/kt-kernel', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/', '-DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DLLAMA_NATIVE=ON', '-DKTRANSFORMERS_CPU_MOE_AMD=ON', '-DKTRANSFORMERS_USE_CUDA=ON']' returned non-zero exit status 1.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for kt-kernel
Failed to build kt-kernel
error: failed-wheel-build-for-install
× Failed to build installable wheels for some pyproject.toml based projects
╰─> kt-kernel
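The real failure in the output above is the CMake Error at CMakeLists.txt:531 (pkg-config missing), not the Python traceback at the end. A minimal Python sketch for pulling such blocks out of a saved build log follows. The function name `extract_cmake_errors` and the heuristics for where a message ends are assumptions made for illustration; they are not part of pip or CMake.

```python
def extract_cmake_errors(log_text):
    """Return each 'CMake Error' header together with the message lines after it."""
    blocks, current = [], None
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("CMake Error"):
            # A new error header closes any open block and starts a new one.
            if current:
                blocks.append("\n".join(current))
            current = [stripped]
        elif current is not None:
            # Heuristic: a status line ("-- ..."), another CMake diagnostic,
            # or a Python traceback marks the end of the error message.
            if stripped.startswith(("--", "CMake ", "Traceback")):
                blocks.append("\n".join(current))
                current = None
            elif stripped:
                current.append(stripped)
    if current:
        blocks.append("\n".join(current))
    return blocks


# Excerpt copied from the log in this report.
sample = """-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
CMake Error at CMakeLists.txt:531 (message):
  FindHWLOC needs pkg-config program and PKG_CONFIG_PATH must contain the
  path to hwloc.pc file.
-- Configuring incomplete, errors occurred!
"""
for block in extract_cmake_errors(sample):
    print(block)
```

Run against the full pip output, this should surface only the FindHWLOC message, which is the part worth quoting when reporting the failure.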
Reproduction
repo installation:
conda create -n ktransformers python=3.11
git clone https://github.com/kvcache-ai/ktransformers.git
cd ktransformers
git submodule update --init --recursive
cd kt-kernel
pip install .
clang installation:
wget https://apt.llvm.org/llvm.sh
chmod u+x llvm.sh
sudo ./llvm.sh 18
Environment
- OS: Ubuntu 22.04.5 LTS
- GPU: Eight RTX 4000 Ada
- CPU: Single AMD EPYC 7402P
See this note: https://github.com/kvcache-ai/ktransformers/tree/main/kt-kernel#hwloc-not-found
sudo apt update
sudo apt install pkg-config libhwloc-dev
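To confirm the prerequisites before rebuilding, a small check along these lines may help. It is only a sketch: `check_hwloc_prereqs` is a hypothetical helper name, and it assumes the hwloc development package registers itself with pkg-config under the module name `hwloc` (matching the hwloc.pc file named in the CMake error).

```python
import shutil
import subprocess


def check_hwloc_prereqs(which=shutil.which, run=subprocess.run):
    """Report whether pkg-config is on PATH and whether it can locate hwloc."""
    result = {"pkg_config": which("pkg-config"), "hwloc_found": False}
    if result["pkg_config"]:
        # `pkg-config --exists hwloc` exits with 0 only if hwloc.pc is discoverable.
        proc = run([result["pkg_config"], "--exists", "hwloc"])
        result["hwloc_found"] = proc.returncode == 0
    return result


if __name__ == "__main__":
    print(check_hwloc_prereqs())
```

If `pkg_config` is `None`, the pkg-config program itself is missing. If it is found but `hwloc_found` is `False`, hwloc.pc is probably not installed or not on PKG_CONFIG_PATH.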
(kt) k1@k0:~/ktransformers/kt-kernel$ python -c "from kt_kernel import KTMoEWrapper; print('✓ kt-kernel installed successfully')"
Traceback (most recent call last):
File "
sglang uses the pre-1588 wrapper, AMXMoEWrapper. Use git checkout 8729435 together with sglang==0.5.5.
The latest sglang is supported. You can check it right now.
Just did. 0.5.5.post1 doesn't work, and everything after 8729435 is breaking. Maybe add v0.4.2 at head 8729435, so that it works with the current sglang 0.5.5.
I don't understand. I have checked sglang:
https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/moe/kt_ep_wrapper.py
It uses the latest wrapper. So basically, you just need to pull the latest kt together with the latest sglang, and then it should work?
Do you mean you want to use some specific version of KT with sglang?
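Since the disagreement here is about which versions are actually installed, it may help to post them explicitly. A minimal sketch, assuming the distribution names are `sglang` and `kt-kernel` (the latter as shown in the pip log above); `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata


def installed_versions(names=("sglang", "kt-kernel")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```

Pasting this output together with the kt-kernel commit hash from `git rev-parse HEAD` would make it clear which combination is being tested.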
After successfully installing kt-kernel and sglang, I hit a problem when running the server. I'm pretty sure nvcc is on the system PATH.
k1@k0:~/ktransformers$ python -m sglang.launch_server --host 0.0.0.0 --port 60000 --model /home/k1/models/DeepSeek-R1-GGUF/DeepSeek-R1-UD-Q2_K_XL --kt-cpuinfer 12 --kt-threadpool-count 2 --kt-num-gpu-experts 200 --attention-backend flashinfer --trust-remote-code --mem-fraction-static 0.98 --chunked-prefill-size 4096 --max-running-requests 37 --max-total-tokens 37000 --enable-mixed-chunk --tensor-parallel-size 8 --enable-p2p-check --disable-shared-experts-fusion
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/launch_server.py", line 24, in <module>
server_args = prepare_server_args(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 4008, in prepare_server_args
return ServerArgs.from_cli_args(raw_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 3616, in from_cli_args
return cls(**{attr: getattr(args, attr) for attr in attrs})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 275, in __init__
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 595, in __post_init__
self._handle_model_specific_adjustments()
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 899, in _handle_model_specific_adjustments
from sglang.srt.configs.model_config import is_deepseek_nsa
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/configs/model_config.py", line 26, in <module>
from sglang.srt.layers.quantization import QUANTIZATION_METHODS
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/__init__.py", line 19, in <module>
from sglang.srt.layers.quantization.auto_round import AutoRoundConfig
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/auto_round.py", line 12, in <module>
from sglang.srt.layers.quantization.utils import get_scalar_types
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/utils.py", line 13, in <module>
from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/fp8_kernel.py", line 46, in <module>
from sgl_kernel import sgl_per_tensor_quant_fp8, sgl_per_token_quant_fp8
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/__init__.py", line 9, in <module>
_preload_cuda_library()
File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/load_utils.py", line 220, in _preload_cuda_library
raise RuntimeError("Could not find CUDA lib directory.")
RuntimeError: Could not find CUDA lib directory.
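Having nvcc on PATH does not by itself mean the CUDA runtime libraries are where a loader expects them. The sketch below lists which candidate directories contain a libcudart shared library. It does not reproduce the lookup in sgl_kernel's `_preload_cuda_library`; the candidate paths (CUDA_HOME, CUDA_PATH, and /usr/local/cuda, with a few common subdirectories) and the name `find_cudart_dirs` are assumptions for illustration only.

```python
import glob
import os


def find_cudart_dirs(candidates=None, environ=os.environ):
    """Return the candidate directories that contain a libcudart shared library."""
    if candidates is None:
        # Assumed common locations; sgl_kernel may look elsewhere.
        roots = [environ.get("CUDA_HOME"), environ.get("CUDA_PATH"), "/usr/local/cuda"]
        subdirs = ("lib64", "lib", os.path.join("targets", "x86_64-linux", "lib"))
        candidates = [os.path.join(root, sub) for root in roots if root for sub in subdirs]
    # Keep only directories where at least one libcudart.so* file exists.
    return [d for d in candidates if glob.glob(os.path.join(d, "libcudart.so*"))]


print(find_cudart_dirs())
```

An empty list would suggest the runtime libraries are missing from these locations or that CUDA_HOME points somewhere unexpected, which is worth reporting alongside the traceback.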
Reinstalled with:
# Example for LLAMAFILE backend on AMX CPU with AVX512
export CPUINFER_CPU_INSTRUCT=AVX2 # Options: NATIVE, AVX512, AVX2
export CPUINFER_ENABLE_AMX=OFF # Options: ON, OFF
export CMAKE_ARGS="-D CMAKE_CUDA_COMPILER=$(which nvcc)"
./install.sh --manual
Checking and installing system dependencies...
Installing cmake via conda...
2 channel Terms of Service accepted
Channels:
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
# All requested packages already installed.
Detected Debian-based system. Installing libhwloc-dev and pkg-config...
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local InRelease [1,572 B]
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local InRelease [1,572 B]
Hit:2 http://mirrors.aliyun.com/ubuntu jammy InRelease
Hit:3 http://mirrors.aliyun.com/ubuntu jammy-updates InRelease
Hit:4 http://mirrors.aliyun.com/ubuntu jammy-backports InRelease
Hit:5 https://mirrors.aliyun.com/docker-ce/linux/ubuntu jammy InRelease
Hit:6 https://deb.nodesource.com/node_23.x nodistro InRelease
Hit:7 https://apt.llvm.org/jammy llvm-toolchain-jammy-20 InRelease
Hit:8 https://apt.llvm.org/jammy llvm-toolchain-jammy-18 InRelease
Hit:9 http://security.ubuntu.com/ubuntu jammy-security InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
8 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
pkg-config is already the newest version (0.29.2-1ubuntu3).
libhwloc-dev is already the newest version (2.7.0-2ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Building kt-kernel with configuration:
CPUINFER_CPU_INSTRUCT=AVX2
CPUINFER_ENABLE_AMX=OFF
CPUINFER_BUILD_TYPE=Release
CPUINFER_PARALLEL=8
CPUINFER_VERBOSE=1
Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11)
Processing /home/k1/ktransformers/kt-kernel
Running command pip subprocess to install build dependencies
Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11)
Collecting setuptools>=61
Obtaining dependency information for setuptools>=61 from https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl.metadata
Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB)
Collecting wheel
Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata
Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
Collecting cmake>=3.16
Obtaining dependency information for cmake>=3.16 from https://files.pythonhosted.org/packages/f3/56/0fc4d83f212cef10b7bbf6c5043e4582af80ad2aef6905e0dc33fbf68b11/cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata
Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (6.5 kB)
Collecting pybind11
Obtaining dependency information for pybind11 from https://files.pythonhosted.org/packages/cd/8a/37362fc2b949d5f733a8b0f2ff51ba423914cabefe69f1d1b6aab710f5fe/pybind11-3.0.1-py3-none-any.whl.metadata
Using cached pybind11-3.0.1-py3-none-any.whl.metadata (10.0 kB)
Using cached setuptools-80.9.0-py3-none-any.whl (1.2 MB)
Using cached wheel-0.45.1-py3-none-any.whl (72 kB)
Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.7 MB)
Using cached pybind11-3.0.1-py3-none-any.whl (293 kB)
Installing collected packages: wheel, setuptools, pybind11, cmake
Creating /tmp/pip-build-env-y8t0fu_x/overlay/bin
changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/wheel to 775
changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/pybind11-config to 775
changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ccmake to 775
changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cmake to 775
changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cpack to 775
changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ctest to 775
Successfully installed cmake-4.1.2 pybind11-3.0.1 setuptools-80.9.0 wheel-0.45.1
Installing build dependencies ... done
Running command Getting requirements to build wheel
/tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
corresp(dist, value, root_dir)
running egg_info
creating kt_kernel.egg-info
writing kt_kernel.egg-info/PKG-INFO
writing dependency_links to kt_kernel.egg-info/dependency_links.txt
writing requirements to kt_kernel.egg-info/requires.txt
writing top-level names to kt_kernel.egg-info/top_level.txt
writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
Getting requirements to build wheel ... done
Running command Preparing metadata (pyproject.toml)
/tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
corresp(dist, value, root_dir)
running dist_info
creating /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info
writing /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel-0.1.0.dist-info'
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.8.0)
Requirement already satisfied: safetensors>=0.4.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.6.2)
Requirement already satisfied: compressed-tensors>=0.7.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.12.2)
Requirement already satisfied: numpy>=1.24.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.3.4)
Requirement already satisfied: triton>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (3.4.0)
Requirement already satisfied: gguf>=0.17.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.17.1)
Requirement already satisfied: black>=25.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (25.11.0)
Requirement already satisfied: click>=8.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (8.3.0)
Requirement already satisfied: mypy-extensions>=0.4.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (1.1.0)
Requirement already satisfied: packaging>=22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (25.0)
Requirement already satisfied: pathspec>=0.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.12.1)
Requirement already satisfied: platformdirs>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (4.5.0)
Requirement already satisfied: pytokens>=0.3.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.3.0)
Requirement already satisfied: transformers in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.57.1)
Requirement already satisfied: pydantic>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.12.4)
Requirement already satisfied: loguru in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.3)
Requirement already satisfied: pyyaml>=5.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (6.0.3)
Requirement already satisfied: tqdm>=4.27 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (4.67.1)
Requirement already satisfied: annotated-types>=0.6.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.0)
Requirement already satisfied: pydantic-core==2.41.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.41.5)
Requirement already satisfied: typing-extensions>=4.14.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.15.0)
Requirement already satisfied: typing-inspection>=0.4.2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.4.2)
Requirement already satisfied: filelock in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.20.0)
Requirement already satisfied: sympy>=1.13.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.14.0)
Requirement already satisfied: networkx in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.5)
Requirement already satisfied: jinja2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.1.6)
Requirement already satisfied: fsspec in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2025.10.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (9.10.2.21)
Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.3.3.83)
Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (10.3.9.90)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.7.3.90)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.5.8.93)
Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (0.7.1)
Requirement already satisfied: nvidia-nccl-cu12==2.27.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2.27.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93)
Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.13.1.3)
Requirement already satisfied: setuptools>=40.8.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from triton>=2.0.0->kt-kernel==0.1.0) (80.9.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from sympy>=1.13.3->torch>=2.0.0->kt-kernel==0.1.0) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from jinja2->torch>=2.0.0->kt-kernel==0.1.0) (3.0.3)
Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.36.0)
Requirement already satisfied: regex!=2019.12.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.3)
Requirement already satisfied: requests in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.32.5)
Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.22.1)
Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from huggingface-hub<1.0,>=0.34.0->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (1.2.0)
Requirement already satisfied: charset_normalizer<4,>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.4.4)
Requirement already satisfied: idna<4,>=2.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.11)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.5.0)
Requirement already satisfied: certifi>=2017.4.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.12)
Building wheels for collected packages: kt-kernel
Running command Building wheel for kt-kernel (pyproject.toml)
/tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml`
corresp(dist, value, root_dir)
running bdist_wheel
running build
running build_py
creating build/lib.linux-x86_64-cpython-311/kt_kernel
copying python/experts_base.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
copying python/experts.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
copying python/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel
creating build/lib.linux-x86_64-cpython-311/kt_kernel/utils
copying python/utils/amx.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
copying python/utils/llamafile.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
copying python/utils/loader.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
copying python/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils
running egg_info
writing kt_kernel.egg-info/PKG-INFO
writing dependency_links to kt_kernel.egg-info/dependency_links.txt
writing requirements to kt_kernel.egg-info/requires.txt
writing top-level names to kt_kernel.egg-info/top_level.txt
reading manifest file 'kt_kernel.egg-info/SOURCES.txt'
writing manifest file 'kt_kernel.egg-info/SOURCES.txt'
running build_ext
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- No .git directory found; skipping git hooks installation
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- CMAKE_CXX_FLAGS: -O3 -ffast-math
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Warning at CMakeLists.txt:252 (message):
pure AVX is not supported at least avx2
-- ARCH_FLAGS: -mf16c;-mfma;-mavx;-mfma;-msse3;-mf16c;-mavx2;-mfma;-msse3;-mf16c
CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:13 (cmake_minimum_required):
Compatibility with CMake < 3.10 will be removed from a future version of
CMake.
Update the VERSION argument <min> value. Or, use the <min>...<max> syntax
to tell CMake that the project requires at least <min> but has been updated
to work with policies introduced by <max> or earlier.
-- pybind11 v2.14.0 dev1
-- Found PythonInterp: /home/k1/miniconda3/envs/kt/bin/python3.11 (found suitable version "3.11.14", minimum required is "3.7")
-- Found PythonLibs: /home/k1/miniconda3/envs/kt/lib/libpython3.11.so
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- OpenMP found
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- CUDA detected
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.61")
-- enabling CUDA
-- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- SOURCE_DIR7:
CMake Warning at CMakeLists.txt:485 (message):
clang-format not found. Please install clang-format (>=18) or pass
-DCLANG_FORMAT_BIN=/full/path and reconfigure.
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")
-- Checking for one of the modules 'hwloc'
-- LTO: disabled
-- NUMA library found: /usr/lib/x86_64-linux-gnu/libnuma.so - enabling NUMA support
-- Configuring done (17.1s)
-- Generating done (0.0s)
-- Build files have been written to: /home/k1/ktransformers/kt-kernel/build/temp.linux-x86_64-cpython-311/kt_kernel_ext_Release
[ 1%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/flags.cpp.o
[ 2%] Building CXX object third_party/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 3%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o
[ 5%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
[ 7%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 9%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o
[ 9%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
[ 10%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o
[ 11%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
[ 13%] Built target build_info
[ 14%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/sgemm.cpp.o
[ 15%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
[ 17%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
[ 18%] Building CXX object third_party/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o
[ 19%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
[ 21%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
[ 22%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
[ 23%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
[ 25%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
[ 26%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
[ 27%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
[ 28%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
[ 30%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
[ 31%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
[ 32%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
[ 34%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
[ 35%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
[ 36%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
[ 38%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
[ 38%] Built target ggml
[ 39%] Linking CXX static library libggml_static.a
[ 40%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o
[ 42%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o
[ 43%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o
[ 43%] Built target ggml_static
[ 44%] Linking CXX static library libllamafile.a
[ 44%] Built target llamafile
[ 46%] Linking CXX static library libllama.a
[ 46%] Built target llama
[ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/ext_bindings.cpp.o
[ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/shared_mem_buffer.cpp.o
[ 51%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/task_queue.cpp.o
[ 51%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o
[ 52%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/linear.cpp.o
[ 53%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o
[ 55%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/worker_pool.cpp.o
[ 56%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o
[ 57%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 59%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/mlp.cpp.o
[ 60%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/flags.cpp.o
[ 61%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o
[ 63%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 64%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o
[ 65%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o
[ 67%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o
[ 68%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/sgemm.cpp.o
[ 69%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o
[ 71%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o
[ 72%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 73%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o
[ 75%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o
[ 76%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o
[ 77%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o
[ 78%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o
[ 80%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o
[ 81%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o
[ 82%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o
[ 84%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o
[ 85%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o
[ 86%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o
[ 88%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o
[ 89%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o
[ 90%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o
[ 92%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o
[ 93%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_attn.cpp.o
[ 94%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o
[ 96%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_read_write.cpp.o
[ 97%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_utils.cpp.o
[ 98%] Linking CXX static library libcommon.a
[ 98%] Built target common
[100%] Linking CXX shared module /home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so
[100%] Built target kt_kernel_ext
-- CPUINFER_USE_CUDA not set; auto-detected CUDA toolkit: YES
Detected CPU info: {'vendor': 'amd', 'arch': 'x86_64', 'features': {'AVX2'}, 'raw': {'flags': {'decodeassists', 'mba', 'fsgsbase', 'wdt', 'f16c', 'rep_good', 'mce', 'arat', 'rdt_a', 'tsc_scale', 'avic', 'wbnoinvd', 'flushbyasid', 'sev', 'mmx', 'apic', 'ibrs', 'vgif', 'fxsr', 'mmxext', 'ht', 'cmov', 'ibs', 'bpext', 'cpb', 'mwaitx', 'avx', 'smca', 'pausefilter', 'skinit', 'fpu', 'perfctr_core', 'ssse3', 'avx2', 'cat_l3', 'xsaveerptr', 'de', 'clflush', 'cqm_mbm_total', 'sep', 'rdseed', 'sse4_2', 'aes', 'sse', 'succor', 'smep', 'popcnt', 'topoext', 'xsaves', '3dnowprefetch', 'cx8', 'movbe', 'syscall', 'lahf_lm', 'stibp', 'cpuid', 'cx16', 'vme', 'umip', 'pdpe1gb', 'perfctr_nb', 'rdpru', 'smap', 'bmi1', 'tsc', 'cr8_legacy', 'lm', 'aperfmperf', 'pae', 'clzero', 'pfthreshold', 'vmcb_clean', 'svm', 'ssbd', 'ibpb_exit_to_user', 'overflow_recov', 'cqm_llc', 'ibpb', 'nx', 'adx', 'svm_lock', 'nrip_save', 'cmp_legacy', 'pat', 'clflushopt', 'constant_tsc', 'sse4a', 'sha_ni', 'v_vmsave_vmload', 'cqm', 'sse2', 'cdp_l3', 'pse36', 'rdrand', 'monitor', 'hw_pstate', 'irperf', 'cqm_mbm_local', 'perfctr_llc', 'osvw', 'rdtscp', 'abm', 'clwb', 'rapl', 'extd_apicid', 'xgetbv1', 'misalignsse', 'cqm_occup_llc', 'mca', 'xsave', 'v_spec_ctrl', 'npt', 'xsaveopt', 'mtrr', 'fma', 'rdpid', 'pclmulqdq', 'msr', 'pse', 'nonstop_tsc', 'nopl', 'fxsr_opt', 'extapic', 'lbrv', 'vmmcall', 'bmi2', 'pni', 'sse4_1', 'tce', 'sme', 'sev_es', 'pge', 'xsavec'}}}
-- CPU detection: vendor=amd arch=x86_64 features=['AVX2']
-- Enabling CUDA backend (-DKTRANSFORMERS_USE_CUDA=ON)
-- CMake configure args:
-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/
-DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11
-DCMAKE_BUILD_TYPE=Release
-DLLAMA_NATIVE=OFF
-DLLAMA_FMA=ON
-DLLAMA_F16C=ON
-DLLAMA_AVX=ON
-DLLAMA_AVX2=ON
-DKTRANSFORMERS_CPU_USE_AMX=OFF
-DKTRANSFORMERS_USE_CUDA=ON
-D
CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
-- CMake build args: --build . --config Release --parallel 8
installing to build/bdist.linux-x86_64/wheel
running install
running install_lib
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/kt_kernel
copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts_base.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
creating build/bdist.linux-x86_64/wheel/kt_kernel/utils
copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/amx.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/llamafile.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/loader.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils
copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
copying build/lib.linux-x86_64-cpython-311/kt_kernel/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel
copying build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/.
running install_egg_info
Copying kt_kernel.egg-info to build/bdist.linux-x86_64/wheel/./kt_kernel-0.1.0-py3.11.egg-info
running install_scripts
creating build/bdist.linux-x86_64/wheel/kt_kernel-0.1.0.dist-info/WHEEL
creating '/tmp/pip-wheel-ytzbvayt/.tmp-s9_0miv_/kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'kt_kernel_ext.cpython-311-x86_64-linux-gnu.so'
adding 'kt_kernel/__init__.py'
adding 'kt_kernel/experts.py'
adding 'kt_kernel/experts_base.py'
adding 'kt_kernel/utils/__init__.py'
adding 'kt_kernel/utils/amx.py'
adding 'kt_kernel/utils/llamafile.py'
adding 'kt_kernel/utils/loader.py'
adding 'kt_kernel-0.1.0.dist-info/METADATA'
adding 'kt_kernel-0.1.0.dist-info/WHEEL'
adding 'kt_kernel-0.1.0.dist-info/top_level.txt'
adding 'kt_kernel-0.1.0.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Building wheel for kt-kernel (pyproject.toml) ... done
Created wheel for kt-kernel: filename=kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl size=1088779 sha256=12260abea7a2ba7b90c186715bc5512d23198bd1a1f2e0b8e4d799c85e39d323
Stored in directory: /home/k1/.cache/pip/wheels/ac/0b/e5/74beab4a502dc518879a41bca5bc4af8470c8d1073a89aab1c
Successfully built kt-kernel
Installing collected packages: kt-kernel
Attempting uninstall: kt-kernel
Found existing installation: kt-kernel 0.1.0
Uninstalling kt-kernel-0.1.0:
Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel-0.1.0.dist-info/
Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel/
Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so
Successfully uninstalled kt-kernel-0.1.0
Successfully installed kt-kernel-0.1.0
Successfully built and installed kt-kernel! with configuration:
CPUINFER_CPU_INSTRUCT=AVX2
CPUINFER_ENABLE_AMX=OFF
CPUINFER_BUILD_TYPE=Release
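For anyone cross-checking the configuration above, the CPU flags can be read directly from `/proc/cpuinfo`. The following is a minimal sketch, not the installer's actual detection logic: the mapping from flags to `CPUINFER_CPU_INSTRUCT` / `CPUINFER_ENABLE_AMX` values is an assumption, and `UNSUPPORTED` is a hypothetical placeholder for CPUs without AVX2.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def suggest_build_options(flags: set[str]) -> dict[str, str]:
    """Map CPU flags to build variables (assumed mapping, for illustration only)."""
    if "avx512f" in flags:
        instruct = "AVX512"
    elif "avx2" in flags:
        instruct = "AVX2"
    else:
        instruct = "UNSUPPORTED"  # hypothetical placeholder, not an installer option
    # 'amx_tile' is the Linux cpuinfo flag for the AMX tile architecture.
    amx = "ON" if "amx_tile" in flags else "OFF"
    return {"CPUINFER_CPU_INSTRUCT": instruct, "CPUINFER_ENABLE_AMX": amx}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    print(suggest_build_options(parse_cpu_flags(text)))
```

On the machine in this log, which reports `avx2` but neither `avx512f` nor `amx_tile`, this would agree with the `AVX2` / `OFF` configuration shown above.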
The problem remains.
I don't quite understand. I checked sglang: https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/moe/kt_ep_wrapper.py
It uses the latest wrapper. So basically, you just need to pull the latest kt together with sglang, and then it works? Or do you mean you want to use a specific version of KT with sglang?
Yes, I saw that. It works if sglang is built from source, but the current sglang pip releases (0.5.5.post1 to 0.5.5.post2) do not include it.
So for now sglang has to be installed from source. Got it, we will update the docs to point this out.
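To tell whether an installed sglang already ships the wrapper, one option is to look for the file on disk without importing sglang (importing it pulls in CUDA kernels). This is a minimal sketch: `package_has_file` is a hypothetical helper, and the relative path is taken from the repository URL above on the assumption that the installed package mirrors that layout.

```python
import importlib.util
from pathlib import Path


def package_has_file(package: str, relative_path: str) -> bool:
    """True if the top-level package is installed and contains relative_path."""
    # find_spec on a top-level name locates the package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return False
    return any(
        (Path(root) / relative_path).is_file()
        for root in spec.submodule_search_locations
    )


if __name__ == "__main__":
    # Path assumed from the linked repository layout (python/sglang/...).
    rel = "srt/layers/moe/kt_ep_wrapper.py"
    print("kt_ep_wrapper present:", package_has_file("sglang", rel))
```

If this prints `False` for a pip-installed release, installing sglang from source is the route described above.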
After successfully installing kt-kernel and sglang, I got an error when running the server. I'm pretty sure nvcc is on the system PATH.
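The launch that follows stops in `sgl_kernel` with `RuntimeError: Could not find CUDA lib directory.`, which appears to concern the CUDA runtime libraries rather than `nvcc` itself. The sketch below only reports which common directories contain `libcudart`. The directories that `sgl_kernel` actually searches are not shown in this thread, so the candidate list (`CUDA_HOME`, `CUDA_PATH`, `/usr/local/cuda`, `LD_LIBRARY_PATH`) is an assumption.

```python
import os
from pathlib import Path


def candidate_cuda_lib_dirs(env: dict[str, str]) -> list[Path]:
    """List directories where CUDA runtime libraries are commonly installed."""
    roots = [env.get("CUDA_HOME"), env.get("CUDA_PATH"), "/usr/local/cuda"]
    dirs: list[Path] = []
    for root in roots:
        if root:
            for sub in ("lib64", "lib", "targets/x86_64-linux/lib"):
                dirs.append(Path(root) / sub)
    for entry in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if entry:
            dirs.append(Path(entry))
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(dirs))


def dirs_with_cudart(dirs: list[Path]) -> list[Path]:
    """Keep only the directories that actually contain a libcudart shared object."""
    return [d for d in dirs if d.is_dir() and any(d.glob("libcudart.so*"))]


if __name__ == "__main__":
    found = dirs_with_cudart(candidate_cuda_lib_dirs(dict(os.environ)))
    print("libcudart found in:", [str(d) for d in found] or "none of the candidates")
```

An empty result would suggest that the toolkit's library directory is not reachable from the current environment, even when `nvcc` is on the PATH.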
k1@k0:~/ktransformers$ python -m sglang.launch_server --host 0.0.0.0 --port 60000 --model /home/k1/models/DeepSeek-R1-GGUF/DeepSeek-R1-UD-Q2_K_XL --kt-cpuinfer 12 --kt-threadpool-count 2 --kt-num-gpu-experts 200 --attention-backend flashinfer --trust-remote-code --mem-fraction-static 0.98 --chunked-prefill-size 4096 --max-running-requests 37 --max-total-tokens 37000 --enable-mixed-chunk --tensor-parallel-size 8 --enable-p2p-check --disable-shared-experts-fusion
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/launch_server.py", line 24, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 4008, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 3616, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 275, in __init__
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 595, in __post_init__
    self._handle_model_specific_adjustments()
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/server_args.py", line 899, in _handle_model_specific_adjustments
    from sglang.srt.configs.model_config import is_deepseek_nsa
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/configs/model_config.py", line 26, in <module>
    from sglang.srt.layers.quantization import QUANTIZATION_METHODS
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/__init__.py", line 19, in <module>
    from sglang.srt.layers.quantization.auto_round import AutoRoundConfig
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/auto_round.py", line 12, in <module>
    from sglang.srt.layers.quantization.utils import get_scalar_types
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/utils.py", line 13, in <module>
    from sglang.srt.layers.quantization.fp8_kernel import scaled_fp8_quant
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sglang/srt/layers/quantization/fp8_kernel.py", line 46, in <module>
    from sgl_kernel import sgl_per_tensor_quant_fp8, sgl_per_token_quant_fp8
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/__init__.py", line 9, in <module>
    _preload_cuda_library()
  File "/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/sgl_kernel/load_utils.py", line 220, in _preload_cuda_library
    raise RuntimeError("Could not find CUDA lib directory.")
RuntimeError: Could not find CUDA lib directory.
Reinstalled with:
# Example for LLAMAFILE backend on AMX CPU with AVX512
export CPUINFER_CPU_INSTRUCT=AVX2 # Options: NATIVE, AVX512, AVX2
export CPUINFER_ENABLE_AMX=OFF # Options: ON, OFF
export CMAKE_ARGS="-D CMAKE_CUDA_COMPILER=$(which nvcc)"
./install.sh --manual
Checking and installing system dependencies...
Installing cmake via conda...
2 channel Terms of Service accepted
Channels:
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
# All requested packages already installed.
Detected Debian-based system. Installing libhwloc-dev and pkg-config...
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local InRelease [1,572 B]
Get:1 file:/var/cuda-repo-ubuntu2204-12-8-local InRelease [1,572 B]
Hit:2 http://mirrors.aliyun.com/ubuntu jammy InRelease
Hit:3 http://mirrors.aliyun.com/ubuntu jammy-updates InRelease
Hit:4 http://mirrors.aliyun.com/ubuntu jammy-backports InRelease
Hit:5 https://mirrors.aliyun.com/docker-ce/linux/ubuntu jammy InRelease
Hit:6 https://deb.nodesource.com/node_23.x nodistro InRelease
Hit:7 https://apt.llvm.org/jammy llvm-toolchain-jammy-20 InRelease
Hit:8 https://apt.llvm.org/jammy llvm-toolchain-jammy-18 InRelease
Hit:9 http://security.ubuntu.com/ubuntu jammy-security InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
8 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
pkg-config is already the newest version (0.29.2-1ubuntu3).
libhwloc-dev is already the newest version (2.7.0-2ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Building kt-kernel with configuration: CPUINFER_CPU_INSTRUCT=AVX2 CPUINFER_ENABLE_AMX=OFF CPUINFER_BUILD_TYPE=Release CPUINFER_PARALLEL=8 CPUINFER_VERBOSE=1 Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11) Processing /home/k1/ktransformers/kt-kernel Running command pip subprocess to install build dependencies Using pip 25.2 from /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/pip (python 3.11) Collecting setuptools>=61 Obtaining dependency information for setuptools>=61 from https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl.metadata Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB) Collecting wheel Obtaining dependency information for wheel from https://files.pythonhosted.org/packages/0b/2c/87f3254fd8ffd29e4c02732eee68a83a1d3c346ae39bc6822dcbcb697f2b/wheel-0.45.1-py3-none-any.whl.metadata Using cached wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB) Collecting cmake>=3.16 Obtaining dependency information for cmake>=3.16 from https://files.pythonhosted.org/packages/f3/56/0fc4d83f212cef10b7bbf6c5043e4582af80ad2aef6905e0dc33fbf68b11/cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (6.5 kB) Collecting pybind11 Obtaining dependency information for pybind11 from https://files.pythonhosted.org/packages/cd/8a/37362fc2b949d5f733a8b0f2ff51ba423914cabefe69f1d1b6aab710f5fe/pybind11-3.0.1-py3-none-any.whl.metadata Using cached pybind11-3.0.1-py3-none-any.whl.metadata (10.0 kB) Using cached setuptools-80.9.0-py3-none-any.whl (1.2 MB) Using cached wheel-0.45.1-py3-none-any.whl (72 kB) Using cached cmake-4.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (29.7 MB) Using cached pybind11-3.0.1-py3-none-any.whl (293 kB) Installing collected packages: wheel, setuptools, pybind11, cmake Creating 
/tmp/pip-build-env-y8t0fu_x/overlay/bin changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/wheel to 775 changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/pybind11-config to 775 changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ccmake to 775 changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cmake to 775 changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/cpack to 775 changing mode of /tmp/pip-build-env-y8t0fu_x/overlay/bin/ctest to 775 Successfully installed cmake-4.1.2 pybind11-3.0.1 setuptools-80.9.0 wheel-0.45.1 Installing build dependencies ... done Running command Getting requirements to build wheel /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml` corresp(dist, value, root_dir) running egg_info creating kt_kernel.egg-info writing kt_kernel.egg-info/PKG-INFO writing dependency_links to kt_kernel.egg-info/dependency_links.txt writing requirements to kt_kernel.egg-info/requires.txt writing top-level names to kt_kernel.egg-info/top_level.txt writing manifest file 'kt_kernel.egg-info/SOURCES.txt' reading manifest file 'kt_kernel.egg-info/SOURCES.txt' writing manifest file 'kt_kernel.egg-info/SOURCES.txt' Getting requirements to build wheel ... 
done Running command Preparing metadata (pyproject.toml) /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml` corresp(dist, value, root_dir) running dist_info creating /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info writing /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/PKG-INFO writing dependency_links to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/dependency_links.txt writing requirements to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/requires.txt writing top-level names to /tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/top_level.txt writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt' reading manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt' writing manifest file '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel.egg-info/SOURCES.txt' creating '/tmp/pip-modern-metadata-bqcwkl0y/kt_kernel-0.1.0.dist-info' Preparing metadata (pyproject.toml) ... 
done Requirement already satisfied: torch>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.8.0) Requirement already satisfied: safetensors>=0.4.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.6.2) Requirement already satisfied: compressed-tensors>=0.7.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.12.2) Requirement already satisfied: numpy>=1.24.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (2.3.4) Requirement already satisfied: triton>=2.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (3.4.0) Requirement already satisfied: gguf>=0.17.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (0.17.1) Requirement already satisfied: black>=25.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from kt-kernel==0.1.0) (25.11.0) Requirement already satisfied: click>=8.0.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (8.3.0) Requirement already satisfied: mypy-extensions>=0.4.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (1.1.0) Requirement already satisfied: packaging>=22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (25.0) Requirement already satisfied: pathspec>=0.9.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.12.1) Requirement already satisfied: platformdirs>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (4.5.0) Requirement already satisfied: pytokens>=0.3.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from black>=25.9.0->kt-kernel==0.1.0) (0.3.0) Requirement already satisfied: transformers in 
/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.57.1) Requirement already satisfied: pydantic>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.12.4) Requirement already satisfied: loguru in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.3) Requirement already satisfied: pyyaml>=5.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (6.0.3) Requirement already satisfied: tqdm>=4.27 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from gguf>=0.17.0->kt-kernel==0.1.0) (4.67.1) Requirement already satisfied: annotated-types>=0.6.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.7.0) Requirement already satisfied: pydantic-core==2.41.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.41.5) Requirement already satisfied: typing-extensions>=4.14.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (4.15.0) Requirement already satisfied: typing-inspection>=0.4.2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from pydantic>=2.0->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.4.2) Requirement already satisfied: filelock in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.20.0) Requirement already satisfied: sympy>=1.13.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.14.0) Requirement already satisfied: networkx in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.5) Requirement already satisfied: jinja2 in 
/home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (3.1.6) Requirement already satisfied: fsspec in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2025.10.0) Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93) Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90) Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90) Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (9.10.2.21) Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.4.1) Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.3.3.83) Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (10.3.9.90) Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (11.7.3.90) Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.5.8.93) Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (0.7.1) Requirement already satisfied: 
nvidia-nccl-cu12==2.27.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (2.27.3) Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.90) Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (12.8.93) Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from torch>=2.0.0->kt-kernel==0.1.0) (1.13.1.3) Requirement already satisfied: setuptools>=40.8.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from triton>=2.0.0->kt-kernel==0.1.0) (80.9.0) Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from sympy>=1.13.3->torch>=2.0.0->kt-kernel==0.1.0) (1.3.0) Requirement already satisfied: MarkupSafe>=2.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from jinja2->torch>=2.0.0->kt-kernel==0.1.0) (3.0.3) Requirement already satisfied: huggingface-hub<1.0,>=0.34.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.36.0) Requirement already satisfied: regex!=2019.12.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.3) Requirement already satisfied: requests in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.32.5) Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (0.22.1) Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from 
huggingface-hub<1.0,>=0.34.0->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (1.2.0) Requirement already satisfied: charset_normalizer<4,>=2 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.4.4) Requirement already satisfied: idna<4,>=2.5 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (3.11) Requirement already satisfied: urllib3<3,>=1.21.1 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2.5.0) Requirement already satisfied: certifi>=2017.4.17 in /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages (from requests->transformers->compressed-tensors>=0.7.0->kt-kernel==0.1.0) (2025.11.12) Building wheels for collected packages: kt-kernel Running command Building wheel for kt-kernel (pyproject.toml) /tmp/pip-build-env-y8t0fu_x/overlay/lib/python3.11/site-packages/setuptools/config/_apply_pyprojecttoml.py:82: SetuptoolsWarning: `license` overwritten by `pyproject.toml` corresp(dist, value, root_dir) running bdist_wheel running build running build_py creating build/lib.linux-x86_64-cpython-311/kt_kernel copying python/experts_base.py -> build/lib.linux-x86_64-cpython-311/kt_kernel copying python/experts.py -> build/lib.linux-x86_64-cpython-311/kt_kernel copying python/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel creating build/lib.linux-x86_64-cpython-311/kt_kernel/utils copying python/utils/amx.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils copying python/utils/llamafile.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils copying python/utils/loader.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils copying python/utils/__init__.py -> build/lib.linux-x86_64-cpython-311/kt_kernel/utils running egg_info writing kt_kernel.egg-info/PKG-INFO writing 
dependency_links to kt_kernel.egg-info/dependency_links.txt writing requirements to kt_kernel.egg-info/requires.txt writing top-level names to kt_kernel.egg-info/top_level.txt reading manifest file 'kt_kernel.egg-info/SOURCES.txt' writing manifest file 'kt_kernel.egg-info/SOURCES.txt' running build_ext -- The C compiler identification is GNU 11.4.0 -- The CXX compiler identification is GNU 11.4.0 -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/cc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/c++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- No .git directory found; skipping git hooks installation -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- CMAKE_CXX_FLAGS: -O3 -ffast-math -- CMAKE_SYSTEM_PROCESSOR: x86_64 -- x86 detected CMake Warning at CMakeLists.txt:252 (message): pure AVX is not supported at least avx2 -- ARCH_FLAGS: -mf16c;-mfma;-mavx;-mfma;-msse3;-mf16c;-mavx2;-mfma;-msse3;-mf16c CMake Deprecation Warning at third_party/pybind11/CMakeLists.txt:13 (cmake_minimum_required): Compatibility with CMake < 3.10 will be removed from a future version of CMake. Update the VERSION argument <min> value. Or, use the <min>...<max> syntax to tell CMake that the project requires at least <min> but has been updated to work with policies introduced by <max> or earlier. 
-- pybind11 v2.14.0 dev1 -- Found PythonInterp: /home/k1/miniconda3/envs/kt/bin/python3.11 (found suitable version "3.11.14", minimum required is "3.7") -- Found PythonLibs: /home/k1/miniconda3/envs/kt/lib/libpython3.11.so -- Performing Test HAS_FLTO -- Performing Test HAS_FLTO - Success -- Found Git: /usr/bin/git (found version "2.34.1") -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- OpenMP found -- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF -- CMAKE_SYSTEM_PROCESSOR: x86_64 -- x86 detected -- CUDA detected -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.61") -- enabling CUDA -- The CUDA compiler identification is NVIDIA 12.8.61 with host compiler GNU 11.4.0 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- SOURCE_DIR7: CMake Warning at CMakeLists.txt:485 (message): clang-format not found. Please install clang-format (>=18) or pass -DCLANG_FORMAT_BIN=/full/path and reconfigure. 
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2") -- Checking for one of the modules 'hwloc' -- LTO: disabled -- NUMA library found: /usr/lib/x86_64-linux-gnu/libnuma.so - enabling NUMA support -- Configuring done (17.1s) -- Generating done (0.0s) -- Build files have been written to: /home/k1/ktransformers/kt-kernel/build/temp.linux-x86_64-cpython-311/kt_kernel_ext_Release [ 1%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/flags.cpp.o [ 2%] Building CXX object third_party/llama.cpp/common/CMakeFiles/build_info.dir/build-info.cpp.o [ 3%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml.c.o [ 5%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o [ 7%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-alloc.c.o [ 9%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-quants.c.o [ 9%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o [ 10%] Building C object third_party/llama.cpp/CMakeFiles/ggml.dir/ggml-backend.c.o [ 11%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o [ 13%] Built target build_info [ 14%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/sgemm.cpp.o [ 15%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o [ 17%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o [ 18%] Building CXX object third_party/llama.cpp/CMakeFiles/ggml.dir/sgemm.cpp.o [ 19%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o [ 21%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o [ 22%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o [ 23%] Building CXX object 
CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o [ 25%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o [ 26%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o [ 27%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o [ 28%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o [ 30%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o [ 31%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o [ 32%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o [ 34%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o [ 35%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o [ 36%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o [ 38%] Building CXX object CMakeFiles/llamafile.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o [ 38%] Built target ggml [ 39%] Linking CXX static library libggml_static.a [ 40%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/llama.cpp.o [ 42%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode.cpp.o [ 43%] Building CXX object third_party/llama.cpp/CMakeFiles/llama.dir/unicode-data.cpp.o [ 43%] Built target ggml_static [ 44%] Linking CXX static library libllamafile.a [ 44%] Built target llamafile [ 46%] Linking CXX static library libllama.a [ 46%] Built target llama [ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/ext_bindings.cpp.o [ 48%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/shared_mem_buffer.cpp.o [ 51%] Building CXX object 
CMakeFiles/kt_kernel_ext.dir/cpu_backend/task_queue.cpp.o [ 51%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/console.cpp.o [ 52%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/linear.cpp.o [ 53%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/common.cpp.o [ 55%] Building CXX object CMakeFiles/kt_kernel_ext.dir/cpu_backend/worker_pool.cpp.o [ 56%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/sampling.cpp.o [ 57%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/grammar-parser.cpp.o [ 59%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/llamafile/mlp.cpp.o [ 60%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/flags.cpp.o [ 61%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_avx2.cpp.o [ 63%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o [ 64%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_amd_zen4.cpp.o [ 65%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/train.cpp.o [ 67%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/iqk_mul_mat_arm82.cpp.o [ 68%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/sgemm.cpp.o [ 69%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx.cpp.o [ 71%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx2.cpp.o [ 72%] Building CXX object third_party/llama.cpp/common/CMakeFiles/common.dir/ngram-cache.cpp.o [ 73%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avx512f.cpp.o [ 75%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_avxvnni.cpp.o [ 76%] Building CXX object 
CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_fma.cpp.o [ 77%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_amd_zen4.cpp.o [ 78%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm80.cpp.o [ 80%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_mixmul_arm82.cpp.o [ 81%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx.cpp.o [ 82%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx2.cpp.o [ 84%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avx512f.cpp.o [ 85%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_avxvnni.cpp.o [ 86%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_fma.cpp.o [ 88%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_amd_zen4.cpp.o [ 89%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm80.cpp.o [ 90%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_sgemm_arm82.cpp.o [ 92%] Building CXX object CMakeFiles/kt_kernel_ext.dir/third_party/llamafile/tinyblas_cpu_unsupported.cpp.o [ 93%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_attn.cpp.o [ 94%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_load_dump.cpp.o [ 96%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_read_write.cpp.o [ 97%] Building CXX object CMakeFiles/kt_kernel_ext.dir/operators/kvcache/kvcache_utils.cpp.o [ 98%] Linking CXX static library libcommon.a [ 98%] Built target common [100%] Linking CXX shared module 
/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so [100%] Built target kt_kernel_ext -- CPUINFER_USE_CUDA not set; auto-detected CUDA toolkit: YES Detected CPU info: {'vendor': 'amd', 'arch': 'x86_64', 'features': {'AVX2'}, 'raw': {'flags': {'decodeassists', 'mba', 'fsgsbase', 'wdt', 'f16c', 'rep_good', 'mce', 'arat', 'rdt_a', 'tsc_scale', 'avic', 'wbnoinvd', 'flushbyasid', 'sev', 'mmx', 'apic', 'ibrs', 'vgif', 'fxsr', 'mmxext', 'ht', 'cmov', 'ibs', 'bpext', 'cpb', 'mwaitx', 'avx', 'smca', 'pausefilter', 'skinit', 'fpu', 'perfctr_core', 'ssse3', 'avx2', 'cat_l3', 'xsaveerptr', 'de', 'clflush', 'cqm_mbm_total', 'sep', 'rdseed', 'sse4_2', 'aes', 'sse', 'succor', 'smep', 'popcnt', 'topoext', 'xsaves', '3dnowprefetch', 'cx8', 'movbe', 'syscall', 'lahf_lm', 'stibp', 'cpuid', 'cx16', 'vme', 'umip', 'pdpe1gb', 'perfctr_nb', 'rdpru', 'smap', 'bmi1', 'tsc', 'cr8_legacy', 'lm', 'aperfmperf', 'pae', 'clzero', 'pfthreshold', 'vmcb_clean', 'svm', 'ssbd', 'ibpb_exit_to_user', 'overflow_recov', 'cqm_llc', 'ibpb', 'nx', 'adx', 'svm_lock', 'nrip_save', 'cmp_legacy', 'pat', 'clflushopt', 'constant_tsc', 'sse4a', 'sha_ni', 'v_vmsave_vmload', 'cqm', 'sse2', 'cdp_l3', 'pse36', 'rdrand', 'monitor', 'hw_pstate', 'irperf', 'cqm_mbm_local', 'perfctr_llc', 'osvw', 'rdtscp', 'abm', 'clwb', 'rapl', 'extd_apicid', 'xgetbv1', 'misalignsse', 'cqm_occup_llc', 'mca', 'xsave', 'v_spec_ctrl', 'npt', 'xsaveopt', 'mtrr', 'fma', 'rdpid', 'pclmulqdq', 'msr', 'pse', 'nonstop_tsc', 'nopl', 'fxsr_opt', 'extapic', 'lbrv', 'vmmcall', 'bmi2', 'pni', 'sse4_1', 'tce', 'sme', 'sev_es', 'pge', 'xsavec'}}} -- CPU detection: vendor=amd arch=x86_64 features=['AVX2'] -- Enabling CUDA backend (-DKTRANSFORMERS_USE_CUDA=ON) -- CMake configure args: -DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/home/k1/ktransformers/kt-kernel/build/lib.linux-x86_64-cpython-311/ -DPYTHON_EXECUTABLE=/home/k1/miniconda3/envs/kt/bin/python3.11 -DCMAKE_BUILD_TYPE=Release -DLLAMA_NATIVE=OFF 
-DLLAMA_FMA=ON -DLLAMA_F16C=ON -DLLAMA_AVX=ON -DLLAMA_AVX2=ON -DKTRANSFORMERS_CPU_USE_AMX=OFF -DKTRANSFORMERS_USE_CUDA=ON -D CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc -- CMake build args: --build . --config Release --parallel 8 installing to build/bdist.linux-x86_64/wheel running install running install_lib creating build/bdist.linux-x86_64/wheel creating build/bdist.linux-x86_64/wheel/kt_kernel copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts_base.py -> build/bdist.linux-x86_64/wheel/./kt_kernel creating build/bdist.linux-x86_64/wheel/kt_kernel/utils copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/amx.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/llamafile.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/loader.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils copying build/lib.linux-x86_64-cpython-311/kt_kernel/utils/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel/utils copying build/lib.linux-x86_64-cpython-311/kt_kernel/experts.py -> build/bdist.linux-x86_64/wheel/./kt_kernel copying build/lib.linux-x86_64-cpython-311/kt_kernel/__init__.py -> build/bdist.linux-x86_64/wheel/./kt_kernel copying build/lib.linux-x86_64-cpython-311/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/. 
running install_egg_info
Copying kt_kernel.egg-info to build/bdist.linux-x86_64/wheel/./kt_kernel-0.1.0-py3.11.egg-info
running install_scripts
creating build/bdist.linux-x86_64/wheel/kt_kernel-0.1.0.dist-info/WHEEL
creating '/tmp/pip-wheel-ytzbvayt/.tmp-s9_0miv_/kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'kt_kernel_ext.cpython-311-x86_64-linux-gnu.so'
adding 'kt_kernel/__init__.py'
adding 'kt_kernel/experts.py'
adding 'kt_kernel/experts_base.py'
adding 'kt_kernel/utils/__init__.py'
adding 'kt_kernel/utils/amx.py'
adding 'kt_kernel/utils/llamafile.py'
adding 'kt_kernel/utils/loader.py'
adding 'kt_kernel-0.1.0.dist-info/METADATA'
adding 'kt_kernel-0.1.0.dist-info/WHEEL'
adding 'kt_kernel-0.1.0.dist-info/top_level.txt'
adding 'kt_kernel-0.1.0.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Building wheel for kt-kernel (pyproject.toml) ... done
Created wheel for kt-kernel: filename=kt_kernel-0.1.0-cp311-cp311-linux_x86_64.whl size=1088779 sha256=12260abea7a2ba7b90c186715bc5512d23198bd1a1f2e0b8e4d799c85e39d323
Stored in directory: /home/k1/.cache/pip/wheels/ac/0b/e5/74beab4a502dc518879a41bca5bc4af8470c8d1073a89aab1c
Successfully built kt-kernel
Installing collected packages: kt-kernel
Attempting uninstall: kt-kernel
Found existing installation: kt-kernel 0.1.0
Uninstalling kt-kernel-0.1.0:
Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel-0.1.0.dist-info/
Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel/
Removing file or directory /home/k1/miniconda3/envs/kt/lib/python3.11/site-packages/kt_kernel_ext.cpython-311-x86_64-linux-gnu.so
Successfully uninstalled kt-kernel-0.1.0
Successfully installed kt-kernel-0.1.0
Successfully built and installed kt-kernel! with configuration:
CPUINFER_CPU_INSTRUCT=AVX2
CPUINFER_ENABLE_AMX=OFF
CPUINFER_BUILD_TYPE=Release

The problem remains.
A less elegant, temporary workaround for the _preload_cuda_library error is to set CUDA_HOME when launching: CUDA_HOME=/usr/local/cuda-12.8 python -m sglang.launch_server ...
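If the server is started from a Python script rather than a shell, the same workaround can be sketched as below. This is only an illustration: `env_with_cuda_home` is a hypothetical helper name, and the check for a `lib64/` or `lib/` subdirectory is an assumption about what a usable toolkit root looks like, not a description of what sgl_kernel actually inspects.

```python
import os
from pathlib import Path


def env_with_cuda_home(cuda_home, base_env=None):
    """Return a copy of the environment with CUDA_HOME set to cuda_home.

    Hypothetical helper; it does not modify os.environ in place.
    """
    root = Path(cuda_home)
    # Assumption: a usable toolkit root contains a lib64/ or lib/ directory.
    if not any((root / sub).is_dir() for sub in ("lib64", "lib")):
        raise FileNotFoundError(f"no lib64/ or lib/ directory under {root}")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_HOME"] = str(root)
    return env
```

Passing the returned dictionary as `env=` to `subprocess.run` when invoking `python -m sglang.launch_server` should have the same effect as the `CUDA_HOME=...` prefix on the command line, and it fails early if the path is wrong.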
I also fixed this in PR #1600 by scanning for the CUDA toolkit (see the changes to setup.py).
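For readers who want to check their own machine, a rough sketch of what "scanning for the CUDA toolkit" can look like follows. It is not the code from the PR: the function names, the search order (`CUDA_HOME`, `CUDA_PATH`, the `nvcc` found on `PATH`, then `/usr/local/cuda*`), and the `lib64/` or `lib/` check are all assumptions made for illustration.

```python
import glob
import os
import shutil
from pathlib import Path


def candidate_cuda_roots(env=None):
    """Yield possible CUDA toolkit roots, most explicit first (assumed order)."""
    env = os.environ if env is None else env
    # 1. Explicit environment variables.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            yield Path(env[var])
    # 2. The toolkit that owns the nvcc found on PATH (<root>/bin/nvcc).
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        yield Path(nvcc).resolve().parent.parent
    # 3. Conventional install locations on Linux.
    yield Path("/usr/local/cuda")
    # Lexical ordering is only a rough "newest first" heuristic.
    for path in sorted(glob.glob("/usr/local/cuda-*"), reverse=True):
        yield Path(path)


def find_cuda_root(env=None):
    """Return the first candidate containing lib64/ or lib/, else None."""
    for root in candidate_cuda_roots(env):
        if any((root / sub).is_dir() for sub in ("lib64", "lib")):
            return root
    return None
```

Running `print(find_cuda_root())` in the affected conda environment shows which toolkit root, if any, such a scan would pick up. Here that would be expected to be somewhere under `/usr/local/cuda`, given the `nvcc` path in the build log.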
It uses the latest wrapper. So basically, you just need to pull the latest kt with sglang, then it works? Do you mean you want to use some specific version of KT with sglang?