Regarding multi-concurrency support: when running with --backend_type balance_serve, a segmentation fault (segfault) occurs at the final stage of model loading.
Current configuration:
OS: Ubuntu 22.04.1
ktransformers: 0.3.1+cu128torch27fancy
CPU: GENUINE INTEL(R) XEON(R), Core(s) per socket: 72, Socket(s): 2
GPU: A6000
Memory: 1.47T
Run cmd:
Build ktransformers cmd:
For those who have two CPUs and 1T RAM (dual NUMA):
USE_BALANCE_SERVE=1 USE_NUMA=1 bash ./install.sh
Run result:
Please provide the exact command you use to run the inference server, plus all the configs and logs (see ~/.ktransformers/logs/).
[EDIT]: Also provide information about your system (lshw etc.) and the driver versions (nvidia-smi).
Hi,
This is my run cmd:
This is the log:
This is the config:
nvidia-smi.txt lshw.txt lscpu.txt
Basically, everything was done according to the balance-serve.md file, but I'm not sure whether there is a configuration issue somewhere. Please take a look, thanks.
@Adadxz can you try with the --log_level DEBUG parameter?
Is there any reason you're using V2 instead of V3-0324?
I mean, I don't even have V2 to reproduce your bug.
No, I just thought the V2 model was relatively small and would load quickly to debug this issue. However, I'm experiencing the same problem with V3.
I just changed the parameter from “ktransformers” to “balance_serve”, and this issue occurs. If I use the “ktransformers” parameter, it works well.
You DO realize that for the balance_serve backend YOU HAVE TO use an optimize config with the '-serve' suffix, right?
[EDIT]: example: https://github.com/kvcache-ai/ktransformers/blob/main/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml
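The rule above can be turned into a quick pre-launch check. This is only a sketch and not part of ktransformers: `check_serve_config` is a hypothetical helper, and it assumes nothing beyond the file-name convention visible in the linked example (a name ending in `-serve` before the `.yaml` extension).

```python
from pathlib import Path


def check_serve_config(backend_type: str, optimize_config_path: str) -> bool:
    """Return True if the optimize config name looks right for the backend.

    Hypothetical helper: it checks only the file-name convention
    (a '-serve' suffix before the extension), not the file contents.
    """
    if backend_type != "balance_serve":
        # No naming requirement is assumed for other backends.
        return True
    return Path(optimize_config_path).stem.endswith("-serve")


print(check_serve_config(
    "balance_serve",
    "./ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml",
))
```

A failed check only means the file name does not follow the convention; it says nothing about whether the rules inside the file are correct.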
Yes
Cool. Can you provide the command that you used to run V3?
Sure
Yeah, it's pretty hard to tell, but the code that is crashing appears to be one of these:
492e1: 48 8b 05 a0 4b 06 00    mov 0x64ba0(%rip),%rax   # ade88 <_ZTVN9scheduler15QueryMaintainerE@@Base+0x1990>
492e8: 48 83 c0 10             add $0x10,%rax
492ec: 48 89 07                mov %rax,(%rdi)
492ef: 48 8b 87 f0 01 00 00    mov 0x1f0(%rdi),%rax
492f6: 48 8b b8 e0 01 00 00    mov 0x1e0(%rax),%rdi
492fd: 48 8b 07                mov (%rdi),%rax
49300: ff 50 18                call *0x18(%rax)

49852: 48 8b 05 2f 46 06 00    mov 0x6462f(%rip),%rax   # ade88 <_ZTVN9scheduler15QueryMaintainerE@@Base+0x1990>
49859: 48 83 c0 10             add $0x10,%rax
4985d: 48 89 03                mov %rax,(%rbx)
49860: 48 8b 83 f0 01 00 00    mov 0x1f0(%rbx),%rax
49867: 48 8b b8 e0 01 00 00    mov 0x1e0(%rax),%rdi
4986e: 48 8b 07                mov (%rdi),%rax
49871: ff 50 18                call *0x18(%rax)
49874: b8 01 00 00 00          mov $0x1,%eax
It seems like you are trying to call a method of the QueryMaintainer which is ... not initialized?
Try running something like:
ldd /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/libsched.so
linux-vdso.so.1 (0x00007f322943c000)
libc10.so => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/libc10.so (0x00007f3229272000)
libkvc2.so => /opt/ktransformers/ktransformers/build/lib.linux-x86_64-cpython-313/libkvc2.so (0x00007f3229194000)
libasync_store.so => /opt/ktransformers/ktransformers/build/lib.linux-x86_64-cpython-313/libasync_store.so (0x00007f3228c00000)
libsched_metrics.so => /opt/ktransformers/ktransformers/build/lib.linux-x86_64-cpython-313/libsched_metrics.so (0x00007f3229187000)
libtorch.so => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/libtorch.so (0x00007f3229138000)
libtorch_cpu.so => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/libtorch_cpu.so (0x00007f3214600000)
libtorch_cuda.so => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/libtorch_cuda.so (0x00007f31de800000)
libprometheus-cpp-core.so.1.3 => /opt/ktransformers/ktransformers/csrc/balance_serve/build/third_party/prometheus-cpp/lib/libprometheus-cpp-core.so.1.3 (0x00007f322910c000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f31de400000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f32290c6000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f31de20a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f322943e000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f32290bf000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f3228b10000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f32290ba000)
libcache_entry.so => /opt/ktransformers/ktransformers/build/lib.linux-x86_64-cpython-313/libcache_entry.so (0x00007f3229052000)
libpage_aligned_memory_pool.so => /opt/ktransformers/ktransformers/build/lib.linux-x86_64-cpython-313/libpage_aligned_memory_pool.so (0x00007f321459f000)
libgpu_cache.so => /opt/ktransformers/ktransformers/build/lib.linux-x86_64-cpython-313/libgpu_cache.so (0x00007f321452b000)
libcuda_stream_manager.so => /opt/ktransformers/ktransformers/build/lib.linux-x86_64-cpython-313/libcuda_stream_manager.so (0x00007f32144ca000)
libprometheus-cpp-pull.so.1.3 => /opt/ktransformers/ktransformers/csrc/balance_serve/build/third_party/prometheus-cpp/lib/libprometheus-cpp-pull.so.1.3 (0x00007f322901b000)
libaio.so.1t64 => /lib/x86_64-linux-gnu/libaio.so.1t64 (0x00007f3228b0b000)
libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f31de710000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f31de0fc000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f31dda00000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f3228b06000)
libgomp.so.1 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/libgomp.so.1 (0x00007f31dd600000)
libcupti.so.12 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x00007f31dce00000)
libcudart.so.12 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00007f31dca00000)
libc10_cuda.so => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/libc10_cuda.so (0x00007f31de66b000)
libcusparse.so.12 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12 (0x00007f31c5400000)
libcufft.so.11 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.11 (0x00007f31b4000000)
libcufile.so.0 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cufile/lib/libcufile.so.0 (0x00007f31b3c00000)
libcusparseLt.so.0 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cusparselt/lib/libcusparseLt.so.0 (0x00007f3198a00000)
libnccl.so.2 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 (0x00007f317fe00000)
libcurand.so.10 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 (0x00007f3177200000)
libcublas.so.12 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12 (0x00007f3170000000)
libcublasLt.so.12 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12 (0x00007f313dc00000)
libcudnn.so.9 => /opt/ktransformers/.ktransformers/lib/python3.13/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 (0x00007f313d800000)
libnghttp3.so.9 => /lib/x86_64-linux-gnu/libnghttp3.so.9 (0x00007f32144a0000)
libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007f321446e000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f31de0c9000)
librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f31de0ab000)
libssh2.so.1 => /lib/x86_64-linux-gnu/libssh2.so.1 (0x00007f31de063000)
libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007f31de04f000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f31dd9aa000)
libldap.so.2 => /lib/x86_64-linux-gnu/libldap.so.2 (0x00007f31dd946000)
liblber.so.2 => /lib/x86_64-linux-gnu/liblber.so.2 (0x00007f31de03e000)
libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f31dd87c000)
libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f31dd86e000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f31dd84e000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f3214469000)
libnvJitLink.so.12 => /usr/local/cuda/lib64/libnvJitLink.so.12 (0x00007f3137a00000)
libunistring.so.5 => /lib/x86_64-linux-gnu/libunistring.so.5 (0x00007f31dc818000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f3137600000)
libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x00007f31dd5b5000)
libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x00007f31dcdaa000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f31dcd20000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f31c5328000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f31dd587000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f31de038000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f31dd840000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f31dcd04000)
libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f31dcce1000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f313da5f000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f31dd571000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f31dccda000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f31dccc8000)
libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f31dccbb000)
You might have some unresolved dependencies.
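To check for unresolved dependencies without reading the whole list by eye, the ldd output can be filtered for entries the loader could not resolve. A minimal sketch, assuming the usual glibc ldd format in which such entries read `name => not found`; `missing_libs` and `libexample.so.1` are made up for illustration:

```python
def missing_libs(ldd_output: str) -> list[str]:
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


# Two resolved lines taken from the listing, plus one invented unresolved entry.
sample = """\
linux-vdso.so.1 (0x00007f322943c000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f31de20a000)
libexample.so.1 => not found
"""
print(missing_libs(sample))  # ['libexample.so.1']
```

In the listing pasted above, no line contains `not found`, so this filter would return an empty list for it.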
I met the same error.
python ./ktransformers/server/main.py --port 10002 --model_path deepseek-ai/DeepSeek-V3.1 --gguf_path /data/llm-models/unsloth/DeepSeek-V3.1-GGUF --optimize_config_path ./ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat-serve.yaml --cache_lens 32768 --chunk_size 256 --max_new_tokens 8000 --max_batch_size 4 --backend_type balance_serve --architectures DeepseekV3ForCausalLM
dmesg:
[2071735.012167] python3[1850295]: segfault at 1e0 ip 00007fd37d63aacf sp 00007ffec2218b20 error 4 in libsched.so[7fd37d5f9000+5d000]
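The dmesg line already contains enough to narrow down where the crash is. A small sketch (the helper name `parse_segfault` is made up) that pulls out the faulting address and the instruction pointer's offset from the start of the libsched.so mapping:

```python
import re

# Matches the x86-64 kernel message format seen in the dmesg line above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line: str) -> dict:
    """Extract the library, fault address and ip offset from a segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "library": m["lib"],
        "fault_address": int(m["addr"], 16),
        # Offset of the instruction pointer from the start of the mapping.
        "offset_in_mapping": ip - base,
    }


line = ("[2071735.012167] python3[1850295]: segfault at 1e0 ip 00007fd37d63aacf "
        "sp 00007ffec2218b20 error 4 in libsched.so[7fd37d5f9000+5d000]")
info = parse_segfault(line)
print(info["library"], hex(info["fault_address"]), hex(info["offset_in_mapping"]))
# libsched.so 0x1e0 0x41acf
```

The offset is relative to the mapping the kernel reports, so it may need adjusting by that segment's file offset before comparing it with objdump addresses. The fault address 0x1e0 equals the displacement in the `mov 0x1e0(%rax),%rdi` instruction from the disassembly earlier in the thread, which would be consistent with a null pointer in %rax, although that disassembly comes from a different build.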
ldd:
(llm) ➜ lib ldd ~/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libsched.so
linux-vdso.so.1 (0x00007fff7cf8b000)
libc10.so => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/libc10.so (0x00007f8a27f50000)
libkvc2.so => /home/mgi527a/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libkvc2.so (0x00007f8a27dfd000)
libasync_store.so => /home/mgi527a/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libasync_store.so (0x00007f8a27998000)
libsched_metrics.so => /home/mgi527a/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libsched_metrics.so (0x00007f8a27985000)
libtorch.so => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/libtorch.so (0x00007f8a27951000)
libtorch_cpu.so => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so (0x00007f8a13b28000)
libtorch_cuda.so => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so (0x00007f89d9bec000)
libprometheus-cpp-core.so.1.3 => /home/mgi527a/Softwares/ktransformers/csrc/balance_serve/build/third_party/prometheus-cpp/lib/libprometheus-cpp-core.so.1.3 (0x00007f89d9bb8000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f89d992a000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f89d9904000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f89d96f2000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8a28183000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f89d96eb000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f89d9602000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f89d95fd000)
libcache_entry.so => /home/mgi527a/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libcache_entry.so (0x00007f89d9572000)
libpage_aligned_memory_pool.so => /home/mgi527a/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libpage_aligned_memory_pool.so (0x00007f89d94f4000)
libgpu_cache.so => /home/mgi527a/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libgpu_cache.so (0x00007f89d9458000)
libcuda_stream_manager.so => /home/mgi527a/Softwares/ktransformers/build/lib.linux-x86_64-cpython-311/libcuda_stream_manager.so (0x00007f89d93d7000)
libprometheus-cpp-pull.so.1.3 => /home/mgi527a/Softwares/ktransformers/csrc/balance_serve/build/third_party/prometheus-cpp/lib/libprometheus-cpp-pull.so.1.3 (0x00007f89d939a000)
libaio.so.1 => /lib/x86_64-linux-gnu/libaio.so.1 (0x00007f89d9395000)
libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f89d92ee000)
libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f89d9248000)
libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f89d8e04000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f89d8dff000)
libgomp.so.1 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/libgomp.so.1 (0x00007f89d8a00000)
libcupti.so.12 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cuda_cupti/lib/libcupti.so.12 (0x00007f89d820c000)
libcudart.so.12 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cuda_runtime/lib/libcudart.so.12 (0x00007f89d7e00000)
libc10_cuda.so => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/libc10_cuda.so (0x00007f89d8d4c000)
libcusparse.so.12 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12 (0x00007f89c6200000)
libcufft.so.11 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cufft/lib/libcufft.so.11 (0x00007f89b5000000)
libcufile.so.0 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cufile/lib/libcufile.so.0 (0x00007f89b4d28000)
libcusparseLt.so.0 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../cusparselt/lib/libcusparseLt.so.0 (0x00007f89a6600000)
libcurand.so.10 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/curand/lib/libcurand.so.10 (0x00007f89a0000000)
libcublas.so.12 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.12 (0x00007f8999600000)
libcublasLt.so.12 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cublas/lib/libcublasLt.so.12 (0x00007f8977c00000)
libcudnn.so.9 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.9 (0x00007f8977800000)
libnccl.so.2 => /home/mgi527a/anaconda3/envs/ktransformers/lib/python3.11/site-packages/torch/lib/../../nvidia/nccl/lib/libnccl.so.2 (0x00007f8968000000)
libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007f89d8d1e000)
libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f89d8cfd000)
librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f89d8cde000)
libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x00007f89d8c6e000)
libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007f89d8c5a000)
libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f89d81b8000)
libldap-2.5.so.0 => /lib/x86_64-linux-gnu/libldap-2.5.so.0 (0x00007f89d8158000)
liblber-2.5.so.0 => /lib/x86_64-linux-gnu/liblber-2.5.so.0 (0x00007f89d8c49000)
libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f89d7d31000)
libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f89d814a000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f89d812e000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f89d8c42000)
libnvJitLink.so.12 => /home/mgi527a/anaconda3/envs/llm/lib/python3.11/site-packages/nvidia/cusparse/lib/libnvJitLink.so.12 (0x00007f8964a00000)
libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f89c6056000)
libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f8964815000)
libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x00007f89d80e6000)
libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x00007f89d7ceb000)
libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f89d7c69000)
libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f89b4c5d000)
libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f89d80b7000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f89d80b1000)
libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f89d7c5b000)
libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f89d7c40000)
libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f89d7c1b000)
libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f89a64c5000)
libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f89c603e000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f89d7c14000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f89c602b000)
libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f89d7c05000)
@sunnsi Oh well, it can only mean that there is a bug in ktransformers. Considering that there have been no commits for about a month, and the number of unresolved issues is over the top, it's safe to say that the project is dead. Please don't waste your time and switch to ik_llama.cpp. If you have several GPUs, expect a HUGE boost in prefill. Oh, BTW, ik_llama.cpp now supports tool calls and, interestingly, it also supports exporting/importing the KV cache to and from files. So there is no reason to use ktransformers anymore lol
@magikRUKKOLA Thanks for the information. I will have a look at ik_llama.cpp.