
ModuleNotFoundError: No module named 'transformers.masking_utils'

[Open] Yuan-Allen opened this issue 3 months ago • 4 comments

The newer transformers releases conflict with the cache interface, while the older releases don't seem to support DeepSeek-V2 yet (apparently support only landed in 4.54.0?). So I don't understand how the DeepSeek-V2-Lite example in the tutorial is supposed to run — on my machine both old and new transformers versions fail. How can I fix this? Thanks.

  • KTransformers: 0.3.2
  • CUDA: 12.6
  • PyTorch: 2.6.0+cu126
  • OS: Ubuntu 20.04.6 LTS
python -m ktransformers.local_chat --model_path ./DeepSeek-V2-Lite --gguf_path ./DeepSeek-V2-Lite-Chat-GGUF

With transformers 4.51.3:

Injecting model.layers.0.self_attn.o_proj as ktransformers.operators.linear . KTransformersLinear
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 197, in <module>
    fire.Fire(local_chat)
  File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 124, in local_chat
    optimize_and_load_gguf(model, optimize_config_path, gguf_path, config, default_device=device)
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
    inject(module, optimize_config, model_config, weights_loader)
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
    inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
    inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
    inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
  [Previous line repeated 1 more time]
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/optimize/optimize.py", line 31, in inject
    module_cls=getattr(__import__(import_module_name, fromlist=[""]), import_class_name)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/operators/RoPE.py", line 30, in <module>
    from ktransformers.models.modeling_glm4_moe import Glm4MoeRotaryEmbedding
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/models/modeling_glm4_moe.py", line 32, in <module>
    from transformers.masking_utils import create_causal_mask
ModuleNotFoundError: No module named 'transformers.masking_utils'
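For anyone hitting the same wall, a quick way to tell which side of the break an environment is on is to probe for the module before launching. `transformers.masking_utils` only exists in newer transformers releases (reportedly added around v4.52 — treat that cutoff as an assumption, not a documented fact). A minimal sketch:

```python
import importlib.util

def check_masking_utils() -> str:
    """Report whether the installed transformers ships masking_utils."""
    if importlib.util.find_spec("transformers") is None:
        return "transformers is not installed"
    import transformers
    if importlib.util.find_spec("transformers.masking_utils") is None:
        # Older releases (e.g. 4.51.3 above) lack this module entirely.
        return f"transformers {transformers.__version__}: masking_utils MISSING"
    return f"transformers {transformers.__version__}: masking_utils present"

print(check_masking_utils())
```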

With transformers 4.56.1:

Chat: hello
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 197, in <module>
    fire.Fire(local_chat)
  File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/local_chat.py", line 186, in local_chat
    generated = prefill_and_generate(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/util/utils.py", line 308, in prefill_and_generate
    past_key_values = StaticCache(
                      ^^^^^^^^^^^^
  File "/home/hangyuyuan/Workspace/ktransformers/ktransformers/models/custom_cache.py", line 38, in __init__
    Cache.__init__(self)
  File "/home/hangyuyuan/anaconda3/envs/ktransformers/lib/python3.11/site-packages/transformers/cache_utils.py", line 752, in __init__
    raise ValueError(
ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache.
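The ValueError comes from a breaking change in the transformers Cache API: newer releases make `Cache.__init__` demand exactly one of `layers` or `layer_class_to_replicate`, while ktransformers 0.3.2's `custom_cache.py` still calls `Cache.__init__(self)` with no arguments. The stand-in class below only sketches the validation (the real implementation lives in transformers' `cache_utils.py`; the argument names are taken from the error message above, everything else is illustrative):

```python
class Cache:
    """Stand-in mirroring the validation newer transformers added to
    Cache.__init__ -- NOT the real transformers class."""

    def __init__(self, layers=None, layer_class_to_replicate=None):
        # Exactly one of the two must be provided (an XOR on "is given").
        if (layers is None) == (layer_class_to_replicate is None):
            raise ValueError(
                "You should provide exactly one of `layers` or "
                "`layer_class_to_replicate` to initialize a Cache."
            )
        self.layers = layers
        self.layer_class_to_replicate = layer_class_to_replicate

# ktransformers 0.3.2 does the equivalent of an argument-free call,
# which under the new validation raises:
try:
    Cache()
except ValueError as e:
    print(f"ValueError: {e}")

Cache(layers=[])  # passing exactly one of the two satisfies the check
```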

Yuan-Allen avatar Sep 28 '25 08:09 Yuan-Allen

I'm getting the same 'transformers.masking_utils' error when I try to load.

no balance_serve
W0930 16:06:02.902000 115051 site-packages/torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W0930 16:06:02.902000 115051 site-packages/torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
2025-09-30 16:06:02,907 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
found flashinfer
using custom modeling_xxx.py.
DeepseekV2ForCausalLM has generative capabilities, as prepare_inputs_for_generation is explicitly overwritten. However, it doesn't directly inherit from GenerationMixin. From 👉v4.50👈 onwards, PreTrainedModel will NOT inherit from GenerationMixin, and this model will lose the ability to call generate and other related functions.

  • If you're using trust_remote_code=True, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
  • If you are the owner of the model architecture code, please modify your model class such that it inherits from GenerationMixin (after PreTrainedModel, otherwise you'll get an exception).
  • If you are not the owner of the model architecture class, please contact the model code owner to update it.

using default_optimize_rule for DeepseekV2ForCausalLM
Injecting model as ktransformers.operators.models . KDeepseekV2Model
Injecting model.embed_tokens as default
Injecting model.layers as default
Injecting model.layers.0 as default
Injecting model.layers.0.self_attn as ktransformers.operators.attention . KDeepseekV2Attention
Injecting model.layers.0.self_attn.q_proj as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.0.self_attn.kv_a_proj_with_mqa as ktransformers.operators.linear . KTransformersLinear
Injecting model.layers.0.self_attn.kv_a_layernorm as default
Injecting model.layers.0.self_attn.kv_b_proj as default
Injecting model.layers.0.self_attn.o_proj as ktransformers.operators.linear . KTransformersLinear
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/kronos/ktransformers/ktransformers/local_chat.py", line 197, in <module>
    fire.Fire(local_chat)
  File "/home/kronos/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kronos/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/home/kronos/anaconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kronos/ktransformers/ktransformers/local_chat.py", line 124, in local_chat
    optimize_and_load_gguf(model, optimize_config_path, gguf_path, config, default_device=device)
  File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 128, in optimize_and_load_gguf
    inject(module, optimize_config, model_config, weights_loader)
  File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
    inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
  File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
    inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
  File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 42, in inject
    inject(child, child_optimization_dict, model_config, gguf_loader, child_prefix)
  [Previous line repeated 1 more time]
  File "/home/kronos/ktransformers/ktransformers/optimize/optimize.py", line 31, in inject
    module_cls=getattr(__import__(import_module_name, fromlist=[""]), import_class_name)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kronos/ktransformers/ktransformers/operators/RoPE.py", line 30, in <module>
    from ktransformers.models.modeling_glm4_moe import Glm4MoeRotaryEmbedding
  File "/home/kronos/ktransformers/ktransformers/models/modeling_glm4_moe.py", line 32, in <module>
    from transformers.masking_utils import create_causal_mask
ModuleNotFoundError: No module named 'transformers.masking_utils'

packages in environment at /home/kronos/anaconda3/envs/ktransformers:

Name  Version  Build  Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
accelerate 1.10.1 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.11.0 pypi_0 pypi
blessed 1.22.0 pypi_0 pypi
blobfile 3.1.0 pypi_0 pypi
build 1.3.0 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2025.9.9 h06a4308_0
certifi 2025.8.3 pypi_0 pypi
charset-normalizer 3.4.3 pypi_0 pypi
click 8.3.0 pypi_0 pypi
colorlog 6.9.0 pypi_0 pypi
cpufeature 0.2.1 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
einops 0.8.1 pypi_0 pypi
expat 2.7.1 h6a678d5_0
fastapi 0.118.0 pypi_0 pypi
filelock 3.13.1 pypi_0 pypi
fire 0.7.1 pypi_0 pypi
flash-attn 2.8.3 pypi_0 pypi
flashinfer-python 0.2.3 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
greenlet 3.2.4 pypi_0 pypi
h11 0.16.0 pypi_0 pypi
hf-xet 1.1.10 pypi_0 pypi
httpcore 1.0.9 pypi_0 pypi
httpx 0.28.1 pypi_0 pypi
huggingface-hub 0.35.3 pypi_0 pypi
idna 3.10 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
jiter 0.11.0 pypi_0 pypi
jsonpatch 1.33 pypi_0 pypi
jsonpointer 3.0.0 pypi_0 pypi
ktransformers 0.3.2+cu128torch28fancy pypi_0 pypi
langchain 0.3.27 pypi_0 pypi
langchain-core 0.3.76 pypi_0 pypi
langchain-text-splitters 0.3.11 pypi_0 pypi
langsmith 0.4.31 pypi_0 pypi
ld_impl_linux-64 2.40 h12ee557_0
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 13.2.0 hc0a3c3a_7 conda-forge
libuuid 1.41.5 h5eee18b_0
libxcb 1.17.0 h9b100fa_0
libzlib 1.3.1 hb25bd0a_0
lxml 6.0.2 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.5 h7934f7d_0
networkx 3.3 pypi_0 pypi
ninja 1.13.0 pypi_0 pypi
numpy 2.1.2 pypi_0 pypi
nvidia-cublas-cu12 12.8.4.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.8.90 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.8.93 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.8.90 pypi_0 pypi
nvidia-cudnn-cu12 9.10.2.21 pypi_0 pypi
nvidia-cufft-cu12 11.3.3.83 pypi_0 pypi
nvidia-cufile-cu12 1.13.1.3 pypi_0 pypi
nvidia-curand-cu12 10.3.9.90 pypi_0 pypi
nvidia-cusolver-cu12 11.7.3.90 pypi_0 pypi
nvidia-cusparse-cu12 12.5.8.93 pypi_0 pypi
nvidia-cusparselt-cu12 0.7.1 pypi_0 pypi
nvidia-nccl-cu12 2.27.3 pypi_0 pypi
nvidia-nvjitlink-cu12 12.8.93 pypi_0 pypi
nvidia-nvtx-cu12 12.8.90 pypi_0 pypi
openai 1.109.1 pypi_0 pypi
openssl 3.0.17 h5eee18b_0
orjson 3.11.3 pypi_0 pypi
packaging 25.0 pypi_0 pypi
pillow 11.0.0 pypi_0 pypi
pip 25.2 pyhc872135_0
protobuf 6.32.1 pypi_0 pypi
psutil 7.1.0 pypi_0 pypi
pthread-stubs 0.3 h0ce48e5_1
pycryptodomex 3.23.0 pypi_0 pypi
pydantic 2.11.9 pypi_0 pypi
pydantic-core 2.33.2 pypi_0 pypi
pyproject-hooks 1.2.0 pypi_0 pypi
python 3.11.13 h1a3bd86_0
pyyaml 6.0.3 pypi_0 pypi
pyzmq 27.1.0 pypi_0 pypi
readline 8.3 hc2a1206_0
regex 2025.9.18 pypi_0 pypi
requests 2.32.5 pypi_0 pypi
requests-toolbelt 1.0.0 pypi_0 pypi
safetensors 0.6.2 pypi_0 pypi
sentencepiece 0.2.1 pypi_0 pypi
setuptools 78.1.1 py311h06a4308_0
sniffio 1.3.1 pypi_0 pypi
sqlalchemy 2.0.43 pypi_0 pypi
sqlite 3.50.2 hb25bd0a_1
starlette 0.48.0 pypi_0 pypi
sympy 1.13.3 pypi_0 pypi
tenacity 9.1.2 pypi_0 pypi
termcolor 3.1.0 pypi_0 pypi
tiktoken 0.11.0 pypi_0 pypi
tk 8.6.15 h54e0aa7_0
tokenizers 0.21.4 pypi_0 pypi
torch 2.8.0+cu128 pypi_0 pypi
torchaudio 2.8.0+cu128 pypi_0 pypi
torchvision 0.23.0+cu128 pypi_0 pypi
tqdm 4.67.1 pypi_0 pypi
transformers 4.51.3 pypi_0 pypi
triton 3.4.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
typing-inspection 0.4.1 pypi_0 pypi
tzdata 2025b h04d1e81_0
urllib3 2.5.0 pypi_0 pypi
uvicorn 0.37.0 pypi_0 pypi
wcwidth 0.2.14 pypi_0 pypi
wheel 0.45.1 py311h06a4308_0
xorg-libx11 1.8.12 h9b100fa_1
xorg-libxau 1.0.12 h9b100fa_0
xorg-libxdmcp 1.1.5 h9b100fa_0
xorg-xorgproto 2024.1 h5eee18b_1
xz 5.6.4 h5eee18b_1
zlib 1.3.1 hb25bd0a_0
zmq 0.0.0 pypi_0 pypi
zstandard 0.25.0 pypi_0 pypi


Would anyone have any insight or point us in the right direction?

js-2024 avatar Sep 30 '25 20:09 js-2024

Running pip install -U transformers seems to fix the ModuleNotFoundError: No module named 'transformers.masking_utils' problem, but I still get ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache.
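Upgrading trades one break for the other: the newer release brings `masking_utils` but also the stricter `Cache.__init__`. Below is a hedged feature-detection sketch that reports which of the two incompatibilities from this thread an installed transformers would trigger; it deliberately avoids hard-coding version cutoffs and instead keys off the symptoms seen in the tracebacks above:

```python
import importlib.util
import inspect

def compat_problems() -> list[str]:
    """List which of the two failures in this thread the current
    transformers install would hit (empty list = neither probe fires)."""
    if importlib.util.find_spec("transformers") is None:
        return ["transformers is not installed"]
    problems = []
    if importlib.util.find_spec("transformers.masking_utils") is None:
        # Symptom of the first traceback (e.g. transformers 4.51.3).
        problems.append("too old: transformers.masking_utils missing")
    try:
        from transformers.cache_utils import Cache
    except ImportError:
        problems.append("too old: transformers.cache_utils missing")
        return problems
    params = inspect.signature(Cache.__init__).parameters
    if {"layers", "layer_class_to_replicate"} & params.keys():
        # Symptom of the second traceback (e.g. transformers 4.56.1).
        problems.append(
            "too new: Cache.__init__ wants `layers`/`layer_class_to_replicate`"
        )
    return problems

print(compat_problems() or "no known incompatibility detected")
```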

zhizi42 avatar Oct 14 '25 04:10 zhizi42

After a few days of digging, the fix turned out to be simple: when running git clone, clone the matching release tag instead of the default branch: git clone --branch v0.3.2 https://github.com/kvcache-ai/ktransformers.git

zhizi42 avatar Oct 18 '25 05:10 zhizi42

Chat: hello my friend
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/venv/lib/python3.12/site-packages/ktransformers/local_chat.py", line 197, in <module>
    fire.Fire(local_chat)
  File "/opt/venv/lib/python3.12/site-packages/fire/core.py", line 135, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/fire/core.py", line 468, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/ktransformers/local_chat.py", line 186, in local_chat
    generated = prefill_and_generate(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/ktransformers/util/utils.py", line 308, in prefill_and_generate
    past_key_values = StaticCache(
                      ^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/ktransformers/models/custom_cache.py", line 38, in __init__
    Cache.__init__(self)
  File "/opt/venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 700, in __init__
    raise ValueError(
ValueError: You should provide exactly one of `layers` or `layer_class_to_replicate` to initialize a Cache.

johnnynunez avatar Oct 24 '25 05:10 johnnynunez