
[Bug] In the support-qwen3next branch, the transformers version is too low

Open PPXGS opened this issue 3 months ago • 5 comments

Checklist

  • [ ] 1. I have searched for related issues but did not get the help I expected
  • [ ] 2. The bug has not been fixed in the latest version
  • [ ] 3. Please note that if a bug report lacks the corresponding environment information and a minimal reproducible example, it will be difficult for us to reproduce and locate the problem, which reduces the likelihood of feedback
  • [ ] 4. If you are raising a question rather than a bug, please start a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise this issue will be closed
  • [ ] 5. To facilitate community communication, I will use Chinese/English or attach a Chinese/English translation (if using another language). Non-Chinese/English content without a translation may be closed

Problem Description

Building ktransformers on the support-qwen3next branch succeeds, but the installed transformers version does not match what the Qwen3-Next models require.

Steps to Reproduce

After building, the ktransformers version is 0.3.2+cu124torch26fancy and the transformers version is 4.51.3, but Qwen3-Next-80B-A3B-Instruct requires "transformers_version": "4.57.0.dev0". When I run:

python ktransformers/server/main.py \
  --port 10021 \
  --model_path /localnvme/application/common/models/Qwen/Qwen3-Next-80B-A3B-Instruct \
  --model_name Qwen3NextForCausalLM \
  --optimize_config_path /localnvme/application/zhangzn/ktransformers_v0.3.2/ktransformers/ktransformers/optimize/optimize_rules/Qwen3Next-serve.yaml \
  --max_new_tokens 1024 \
  --cache_lens 32768 \
  --chunk_size 256 \
  --max_batch_size 4 \
  --no-use_cuda_graph \
  --backend_type balance_serve

I get the error: ImportError: cannot import name 'layer_type_validation' from 'transformers.configuration_utils' (/localnvme/application/zhangzn/anaconda3/envs/ktransformers_support-qwen3next/lib/python3.11/site-packages/transformers/configuration_utils.py)
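
The import failure is consistent with the version gap: layer_type_validation does not exist in transformers 4.51.3 and only appears in later releases. A minimal sketch for comparing the installed transformers against the version recorded in the checkpoint's config.json (the helper name is illustrative, not part of ktransformers; requires the packaging package):

import json
from pathlib import Path

import transformers
from packaging import version  # pip install packaging

def check_transformers_version(model_path: str) -> None:
    # Hugging Face checkpoints record the library version they were saved with.
    config = json.loads((Path(model_path) / "config.json").read_text())
    required = config.get("transformers_version")
    installed = transformers.__version__
    print(f"installed={installed}, checkpoint expects={required}")
    if required and version.parse(installed) < version.parse(required):
        print("installed transformers is older than the checkpoint expects")

check_transformers_version("/localnvme/application/common/models/Qwen/Qwen3-Next-80B-A3B-Instruct")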

[screenshot of the error attached]

Environment

ktransformers: 0.3.2+cu124torch26fancy
transformers: 4.51.3
CUDA: 12.4
Python: 3.11
OS: Ubuntu 20.04
GPU: NVIDIA A800 × 8

PPXGS avatar Sep 12 '25 08:09 PPXGS

pip uninstall transformers -y
pip install git+https://github.com/huggingface/transformers.git
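
To confirm the upgrade took effect in the environment that runs ktransformers, a quick check (installing from git should print a .dev version rather than 4.51.3):

python -c "import transformers; print(transformers.__version__)"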

CYSTEV-chn avatar Sep 16 '25 07:09 CYSTEV-chn

python ktransformers/server/main.py --port 10021 --model_path /root/Qwen.Qwen3-Next-80B-A3B-Instruct-FP8-Dynamic --gguf_path /root/Qwen.Qwen3-Next-80B-A3B-Instruct-FP8-Dynamic --model_name Qwen3NextForCausalLM --backend_type balance_serve

W0922 09:30:59.383000 462160 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0922 09:30:59.383000 462160 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-09-22 09:30:59,385 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
found flashinfer
flash_attn not found, flashinfer unit test needed it. If you are using balance serve, ignore this.
set start method
Connected to server at tcp://localhost:37793
W0922 09:31:07.075000 462268 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0922 09:31:07.075000 462268 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-09-22 09:31:07,078 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
found flashinfer
flash_attn not found, flashinfer unit test needed it. If you are using balance serve, ignore this.
start method already set to spawn
Connected to server at tcp://localhost:37793
args.architectures: Qwen3NextForCausalLM
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
Injecting model as default
Injecting model.embed_tokens as default
......
Injecting model.layers.47 as default
Injecting model.layers.47.self_attn as ktransformers.operators.balance_serve_attention . KQwen3NextAttention
Injecting model.layers.47.self_attn.q_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.self_attn.k_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.self_attn.v_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.self_attn.o_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.self_attn.q_norm as ktransformers.operators.layernorm . KQwen3NextRMSNorm
Injecting model.layers.47.self_attn.k_norm as ktransformers.operators.layernorm . KQwen3NextRMSNorm
Injecting model.layers.47.mlp as ktransformers.operators.experts . KQwen3NextSparseMoeBlockV2
Injecting model.layers.47.mlp.gate as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.mlp.experts as ktransformers.operators.experts . KTransformersExpertsV2
Injecting model.layers.47.mlp.shared_expert as ktransformers.operators.mlp . KQwen2MoeMLP
Injecting model.layers.47.mlp.shared_expert.gate_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.mlp.shared_expert.up_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.mlp.shared_expert.down_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.47.mlp.shared_expert.act_fn as default
Injecting model.layers.47.mlp.shared_expert_gate as default
Injecting model.layers.47.input_layernorm as ktransformers.operators.layernorm . KQwen3NextRMSNorm
Injecting model.layers.47.post_attention_layernorm as ktransformers.operators.layernorm . KQwen3NextRMSNorm
Injecting model.norm as ktransformers.operators.layernorm . KQwen3NextRMSNorm
Injecting model.rotary_emb as ktransformers.operators.RoPE . KQwen3MoeRotaryEmbedding
Injecting cache as default
Injecting lm_head as ktransformers.operators.linear . KTransformersLinear
loading model.embed_tokens.weight to cpu
loading model.layers.0.linear_attn.dt_bias to cuda
loading model.layers.0.linear_attn.A_log to cuda
loading model.layers.0.linear_attn.conv1d.weight to cuda:0
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 308, in run_engine
    engine = Engine(args, token_queue, broadcast_endpoint, kvcache_event)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 212, in __init__
    optimize_and_load_gguf(self.model, optimize_config_path, gguf_path, config)
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/optimize/optimize.py", line 131, in optimize_and_load_gguf
    load_weights(module, weights_loader, device=default_device)
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  [Previous line repeated 1 more time]
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 176, in load_weights
    module.load()
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/base_operator.py", line 63, in load
    utils.load_weights(child, self.gguf_loader, self.key+".")
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 176, in load_weights
    module.load()
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/linear.py", line 944, in load
    self.generate_linear.load(w=w)
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/linear.py", line 653, in load
    marlin_q_w, marlin_s, g_idx, sort_indices, _ = marlin_quantize(
                                                   ^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/marlin_utils.py", line 93, in marlin_quantize
    q_w, s, g_idx, rand_perm = quantize_weights(w, num_bits, group_size,
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/ktransformers_ext/operators/custom_marlin/quantize/utils/quant_utils.py", line 61, in quantize_weights
    s = torch.max(torch.abs(w), 0, keepdim=True)[0]
        ^^^^^^^^^^^^
NotImplementedError: "abs_cuda" not implemented for 'Float8_e4m3fn'

mj520 avatar Sep 22 '25 01:09 mj520
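
For reference, the NotImplementedError above comes from the Marlin quantization path: quantize_weights calls torch.abs on the raw weight tensor, and PyTorch provides no CUDA abs kernel for float8_e4m3fn, so an FP8-Dynamic checkpoint cannot pass through this code as-is. A minimal repro plus a possible workaround sketch (upcasting before computing scales is an assumption, not the project's fix):

import torch

# Reproduce the error: abs has no CUDA kernel for float8_e4m3fn.
w = torch.randn(16, 16, device="cuda").to(torch.float8_e4m3fn)
try:
    torch.max(torch.abs(w), 0, keepdim=True)
except NotImplementedError as e:
    print(e)  # "abs_cuda" not implemented for 'Float8_e4m3fn'

# Workaround sketch: upcast to float32 before computing per-column scales,
# mirroring what quantize_weights does at quant_utils.py line 61.
s = torch.max(torch.abs(w.to(torch.float32)), 0, keepdim=True)[0]
print(s.shape)  # torch.Size([1, 16])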

Then I'm not sure. I only know it loads with the dev-version transformers dependency; I also tried other, lower transformers versions, and they could not even load normally.

CYSTEV-chn avatar Sep 22 '25 08:09 CYSTEV-chn

Are you able to run it now? I can run the original qwen3-next, but the https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 version fails with this error:

Injecting lm_head as ktransformers.operators.linear . KTransformersLinear
loading model.embed_tokens.weight to cpu
loading model.layers.0.linear_attn.dt_bias to cuda
loading model.layers.0.linear_attn.A_log to cuda
loading model.layers.0.linear_attn.conv1d.weight to cuda:0
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 308, in run_engine
    engine = Engine(args, token_queue, broadcast_endpoint, kvcache_event)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 212, in __init__
    optimize_and_load_gguf(self.model, optimize_config_path, gguf_path, config)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/optimize/optimize.py", line 131, in optimize_and_load_gguf
    load_weights(module, weights_loader, device=default_device)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  [Previous line repeated 1 more time]
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 176, in load_weights
    module.load()
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/base_operator.py", line 63, in load
    utils.load_weights(child, self.gguf_loader, self.key+".")
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 174, in load_weights
    load_weights(child, gguf_loader, prefix+name+".", device=device)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/util/utils.py", line 176, in load_weights
    module.load()
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/linear.py", line 944, in load
    self.generate_linear.load(w=w)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/operators/linear.py", line 638, in load
    self.bias = w[1].view(self.orin_out_features)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[12288]' is invalid for input of size 1536
^CTraceback (most recent call last):
  File "/home/ubuntu/ktransformers/ktransformers/server/main.py", line 122, in <module>
    main()
  File "/home/ubuntu/ktransformers/ktransformers/server/main.py", line 109, in main
    create_interface(config=cfg, default_args=cfg)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/utils/create_interface.py", line 30, in create_interface
    GlobalInterface.interface = BackendInterface(default_args)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 350, in __init__
    kvcache_event.wait()
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/multiprocessing/synchronize.py", line 356, in wait
    self._cond.wait(timeout)
  File "/home/ubuntu/miniconda3/envs/ktransformers/lib/python3.11/multiprocessing/synchronize.py", line 268, in wait
    return self._wait_semaphore.acquire(True, timeout)
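
A note on the error: view() requires the element counts to match exactly, and here the loader found 1536 elements where orin_out_features expects 12288. The factor of exactly 8 suggests the FP8 checkpoint stores this tensor in a different shape or packing than the loader assumes; that reading is an inference from the numbers, not a confirmed diagnosis. A minimal sketch of the failing call:

import torch

bias = torch.empty(1536)    # what the FP8 checkpoint actually provides
orin_out_features = 12288   # what the loader expects (12288 = 8 * 1536)
try:
    bias.view(orin_out_features)
except RuntimeError as e:
    print(e)  # shape '[12288]' is invalid for input of size 1536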

harveyff avatar Sep 29 '25 04:09 harveyff

(Quoting harveyff's comment and traceback above.)

@harveyff The original version runs extremely slowly for me, basically making no progress. Does it run normally on your side?

ligang0357-glitch avatar Sep 30 '25 13:09 ligang0357-glitch