ModuleNotFoundError: No module named 'transformers.masking_utils'
Newer transformers releases conflict with the cache interface, while older transformers apparently doesn't support DeepSeek-V2 yet (it seems support only landed in 4.54.0?). So I don't understand how the DeepSeek-V2-Lite example in the tutorial is supposed to run — on my side, both old and new transformers versions fail. How can I resolve this? Thanks.
- KTransformers: 0.3.2
- CUDA: 12.6
- PyTorch: 2.6.0+cu126
- OS: Ubuntu 20.04.6 LTS
python -m ktransformers.local_chat --model_path ./DeepSeek-V2-Lite --gguf_path ./DeepSeek-V2-Lite-Chat-GGUF
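Before comparing tracebacks, it helps to record exactly which versions the failing environment resolves. A minimal sketch (assuming the standard PyPI distribution names):

```python
# Quick diagnostic: print the installed versions of the packages involved.
# Distribution names below are the standard PyPI ones (an assumption if
# you installed from source under a different name).
import importlib.metadata as md

def installed_version(dist: str):
    """Return the installed version string for `dist`, or None if absent."""
    try:
        return md.version(dist)
    except md.PackageNotFoundError:
        return None

for dist in ("transformers", "torch", "ktransformers"):
    print(dist, installed_version(dist) or "not installed")
```

Including this output in a report makes it unambiguous which of the two failure modes below applies.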
With transformers 4.51.3:
Injecting model.layers.0.self_attn.o_proj as ktransformers.operators.linear . KTransformersLinear
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 197, in <module>
fire.Fire(local_chat)
File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 124, in local_chat
optimize_and_load_gguf(model, optimize_config_path, gguf_path, config, default_device=device)
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
inject(module, optimize_config, model_config, weights_loader)
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
[Previous line repeated 1 more time]
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 31, in inject
module_cls=getattr(__import__(import_module_name, fromlist=[""]), import_class_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/operators/RoPE.py", line 30, in <module>
from ktransformers.models.modeling_glm4_moe import Glm4MoeRotaryEmbedding
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/models/modeling_glm4_moe.py", line 32, in <module>
from transformers.masking_utils import create_causal_mask
ModuleNotFoundError: No module named 'transformers.masking_utils'
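The failing import comes from KTransformers' bundled GLM-4 MoE modeling file, and the traceback shows that `transformers.masking_utils` does not exist in 4.51.3 (it only ships in newer releases). A small sketch for probing what your installed transformers actually provides, without tracing through KTransformers:

```python
import importlib.util

def has_module(dotted: str) -> bool:
    """Return True if the dotted module path is importable.

    find_spec raises ModuleNotFoundError when a parent package is
    missing entirely, so treat that the same as "not found".
    """
    try:
        return importlib.util.find_spec(dotted) is not None
    except ModuleNotFoundError:
        return False

# False on transformers 4.51.3; True on releases that ship masking_utils.
print(has_module("transformers.masking_utils"))
```

If this prints False, upgrading transformers removes the ModuleNotFoundError — but, as the posts below show, a too-new transformers then trips over the changed Cache constructor instead.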
With transformers 4.56.1:
Chat: hello
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 197, in <module>
fire.Fire(local_chat)
File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 186, in local_chat
generated = prefill_and_generate(
^^^^^^^^^^^^^^^^^^^^^
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/util/utils.py", line 308, in prefill_and_generate
past_key_values = StaticCache(
^^^^^^^^^^^^
File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/models/custom_cache.py", line 38, in __init__
Cache.__init__(self)
File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/transformers/cache_utils.py", line 752, in __init__
raise ValueError(
ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache.
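The two failures pull in opposite directions: old transformers lacks `masking_utils`, while new transformers changed the `Cache` base-class constructor to require exactly one of `layers` or `layer_class_to_replicate`. If you are patching code like `custom_cache.py` yourself, one hedged approach is to branch on the installed version; the 4.54 cutoff below is an assumption for illustration, not a confirmed boundary:

```python
def version_tuple(v: str):
    """Parse a version string like "4.56.1" into (4, 56, 1).

    Stops at the first non-numeric component, so "4.56.1.dev0" -> (4, 56, 1)
    and "2.8.0+cu128" -> (2, 8).
    """
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

# Hypothetical gate inside a Cache subclass __init__ (the exact keyword
# arguments the new API expects must be checked against your transformers):
# if version_tuple(transformers.__version__) >= (4, 54):   # assumed cutoff
#     Cache.__init__(self, layers=[...])   # newer keyword-based API
# else:
#     Cache.__init__(self)                 # older no-argument API
print(version_tuple("4.56.1") >= (4, 54))
```

In practice, pinning matched versions (as suggested later in the thread) is simpler than gating at runtime.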
I'm getting the same 'transformers.masking_utils' error when I try to load.
no balance_serve
W0930 16:06:02.902000 115051 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0930 16:06:02.902000 115051 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-09-30 16:06:02,907 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
found flashinfer
using custom modeling_xxx.py.
DeepseekV2ForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
using default_optimize_rule for DeepseekV2ForCausalLM
Injecting model as ktransformers.operators.models . KDeepseekV2Model
Injecting model.embed_tokens as default
Injecting model.layers as default
Injecting model.layers.0 as default
Injecting model.layers.0.self_attn as ktransformers.operators.attention . KDeepseekV2Attention
Injecting model.layers.0.self_attn.q_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.0.self_attn.kv_a_proj_with_mqa as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.0.self_attn.kv_a_layernorm as default
Injecting model.layers.0.self_attn.kv_b_proj as default
Injecting model.layers.0.self_attn.o_proj as ktransformers.operators.linear . KTransformersLinear
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/kronos/ktransformers/ktransformers/local_chat.py", line 197, in <module>
fire.Fire(local_chat)
File "/home/kronos/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kronos/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/kronos/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/kronos/ktransformers/ktransformers/local_chat.py", line 124, in local_chat
optimize_and_load_gguf(model, optimize_config_path, gguf_path, config, default_device=device)
File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
inject(module, optimize_config, model_config, weights_loader)
File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
[Previous line repeated 1 more time]
File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 31, in inject
module_cls=getattr(__import__(import_module_name, fromlist=[""]), import_class_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kronos/ktransformers/ktransformers/operators/RoPE.py", line 30, in <module>
from ktransformers.models.modeling_glm4_moe import Glm4MoeRotaryEmbedding
File "/home/kronos/ktransformers/ktransformers/models/modeling_glm4_moe.py", line 32, in <module>
from transformers.masking_utils import create_causal_mask
ModuleNotFoundError: No module named 'transformers.masking_utils'
# packages in environment at /home/kronos/anaconda3/envs/ktransformers:
# Name  Version  Build  Channel
_libgcc_mutex  0.1  main
_openmp_mutex  5.1  1_gnu
accelerate  1.10.1  pypi_0  pypi
annotated-types  0.7.0  pypi_0  pypi
anyio  4.11.0  pypi_0  pypi
blessed  1.22.0  pypi_0  pypi
blobfile  3.1.0  pypi_0  pypi
build  1.3.0  pypi_0  pypi
bzip2  1.0.8  h5eee18b_6
ca-certificates  2025.9.9  h06a4308_0
certifi  2025.8.3  pypi_0  pypi
charset-normalizer  3.4.3  pypi_0  pypi
click  8.3.0  pypi_0  pypi
colorlog  6.9.0  pypi_0  pypi
cpufeature  0.2.1  pypi_0  pypi
distro  1.9.0  pypi_0  pypi
einops  0.8.1  pypi_0  pypi
expat  2.7.1  h6a678d5_0
fastapi  0.118.0  pypi_0  pypi
filelock  3.13.1  pypi_0  pypi
fire  0.7.1  pypi_0  pypi
flash-attn  2.8.3  pypi_0  pypi
flashinfer-python  0.2.3  pypi_0  pypi
fsspec  2024.6.1  pypi_0  pypi
greenlet  3.2.4  pypi_0  pypi
h11  0.16.0  pypi_0  pypi
hf-xet  1.1.10  pypi_0  pypi
httpcore  1.0.9  pypi_0  pypi
httpx  0.28.1  pypi_0  pypi
huggingface-hub  0.35.3  pypi_0  pypi
idna  3.10  pypi_0  pypi
jinja2  3.1.4  pypi_0  pypi
jiter  0.11.0  pypi_0  pypi
jsonpatch  1.33  pypi_0  pypi
jsonpointer  3.0.0  pypi_0  pypi
ktransformers  0.3.2+cu128torch28fancy  pypi_0  pypi
langchain  0.3.27  pypi_0  pypi
langchain-core  0.3.76  pypi_0  pypi
langchain-text-splitters  0.3.11  pypi_0  pypi
langsmith  0.4.31  pypi_0  pypi
ld_impl_linux-64  2.40  h12ee557_0
libffi  3.4.4  h6a678d5_1
libgcc-ng  11.2.0  h1234567_1
libgomp  11.2.0  h1234567_1
libstdcxx-ng  13.2.0  hc0a3c3a_7  conda-forge
libuuid  1.41.5  h5eee18b_0
libxcb  1.17.0  h9b100fa_0
libzlib  1.3.1  hb25bd0a_0
lxml  6.0.2  pypi_0  pypi
markupsafe  2.1.5  pypi_0  pypi
mpmath  1.3.0  pypi_0  pypi
ncurses  6.5  h7934f7d_0
networkx  3.3  pypi_0  pypi
ninja  1.13.0  pypi_0  pypi
numpy  2.1.2  pypi_0  pypi
nvidia-cublas-cu12  12.8.4.1  pypi_0  pypi
nvidia-cuda-cupti-cu12  12.8.90  pypi_0  pypi
nvidia-cuda-nvrtc-cu12  12.8.93  pypi_0  pypi
nvidia-cuda-runtime-cu12  12.8.90  pypi_0  pypi
nvidia-cudnn-cu12  9.10.2.21  pypi_0  pypi
nvidia-cufft-cu12  11.3.3.83  pypi_0  pypi
nvidia-cufile-cu12  1.13.1.3  pypi_0  pypi
nvidia-curand-cu12  10.3.9.90  pypi_0  pypi
nvidia-cusolver-cu12  11.7.3.90  pypi_0  pypi
nvidia-cusparse-cu12  12.5.8.93  pypi_0  pypi
nvidia-cusparselt-cu12  0.7.1  pypi_0  pypi
nvidia-nccl-cu12  2.27.3  pypi_0  pypi
nvidia-nvjitlink-cu12  12.8.93  pypi_0  pypi
nvidia-nvtx-cu12  12.8.90  pypi_0  pypi
openai  1.109.1  pypi_0  pypi
openssl  3.0.17  h5eee18b_0
orjson  3.11.3  pypi_0  pypi
packaging  25.0  pypi_0  pypi
pillow  11.0.0  pypi_0  pypi
pip  25.2  pyhc872135_0
protobuf  6.32.1  pypi_0  pypi
psutil  7.1.0  pypi_0  pypi
pthread-stubs  0.3  h0ce48e5_1
pycryptodomex  3.23.0  pypi_0  pypi
pydantic  2.11.9  pypi_0  pypi
pydantic-core  2.33.2  pypi_0  pypi
pyproject-hooks  1.2.0  pypi_0  pypi
python  3.11.13  h1a3bd86_0
pyyaml  6.0.3  pypi_0  pypi
pyzmq  27.1.0  pypi_0  pypi
readline  8.3  hc2a1206_0
regex  2025.9.18  pypi_0  pypi
requests  2.32.5  pypi_0  pypi
requests-toolbelt  1.0.0  pypi_0  pypi
safetensors  0.6.2  pypi_0  pypi
sentencepiece  0.2.1  pypi_0  pypi
setuptools  78.1.1  py311h06a4308_0
sniffio  1.3.1  pypi_0  pypi
sqlalchemy  2.0.43  pypi_0  pypi
sqlite  3.50.2  hb25bd0a_1
starlette  0.48.0  pypi_0  pypi
sympy  1.13.3  pypi_0  pypi
tenacity  9.1.2  pypi_0  pypi
termcolor  3.1.0  pypi_0  pypi
tiktoken  0.11.0  pypi_0  pypi
tk  8.6.15  h54e0aa7_0
tokenizers  0.21.4  pypi_0  pypi
torch  2.8.0+cu128  pypi_0  pypi
torchaudio  2.8.0+cu128  pypi_0  pypi
torchvision  0.23.0+cu128  pypi_0  pypi
tqdm  4.67.1  pypi_0  pypi
transformers  4.51.3  pypi_0  pypi
triton  3.4.0  pypi_0  pypi
typing-extensions  4.12.2  pypi_0  pypi
typing-inspection  0.4.1  pypi_0  pypi
tzdata  2025b  h04d1e81_0
urllib3  2.5.0  pypi_0  pypi
uvicorn  0.37.0  pypi_0  pypi
wcwidth  0.2.14  pypi_0  pypi
wheel  0.45.1  py311h06a4308_0
xorg-libx11  1.8.12  h9b100fa_1
xorg-libxau  1.0.12  h9b100fa_0
xorg-libxdmcp  1.1.5  h9b100fa_0
xorg-xorgproto  2024.1  h5eee18b_1
xz  5.6.4  h5eee18b_1
zlib  1.3.1  hb25bd0a_0
zmq  0.0.0  pypi_0  pypi
zstandard  0.25.0  pypi_0  pypi
Would anyone have any insight or point us in the right direction?
Running
pip install -U transformers
seems to fix the `ModuleNotFoundError: No module named 'transformers.masking_utils'` error,
but then I still hit:
ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache.
After digging into this for a few days, the fix turned out to be simple: when running git clone, clone the matching release tag instead of the default branch:
git clone --branch v0.3.2 https://github.com/kvcache-ai/ktransformers.git
Chat: hello my friend
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/opt/venv/lib/python3.12/site-packages/ktransformers/local_chat.py", line 197, in <module>
fire.Fire(local_chat)
File "/opt/venv/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/ktransformers/local_chat.py", line 186, in local_chat
generated = prefill_and_generate(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/ktransformers/util/utils.py", line 308, in prefill_and_generate
past_key_values = StaticCache(
^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/ktransformers/models/custom_cache.py", line 38, in __init__
Cache.__init__(self)
File "/opt/venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 700, in __init__
raise ValueError(
ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache.