
[Bug] SDAR-30B-A3B-Chat 推理报错

Open Auraithm opened this issue 2 months ago • 3 comments

Checklist

  • [ ] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Model path: /inspire/hdd/project/embodied-multimodality/liuxiaoran-240108120089/public/SDAR-30B-A3B-Chat
Applied chat template: ['<|im_start|>user\n1+1等于几<|im_end|>\n<|im_start|>assistant\n']
Tokenized input IDs: [151644, 872, 198, 16, 10, 16, 107106, 99195, 151645, 198, 151644, 77091, 198]
[before pipeline] allocated: 0.00 MB, reserved: 0.00 MB
Traceback (most recent call last):
  File "/inspire/hdd/project/embodied-multimodality/liuxiaoran-240108120089/projects_zhuying/SDAR_Trainer/reasoning/test.py", line 56, in <module>
    with pipeline(model_path, backend_config=backend_config) as pipe:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/api.py", line 78, in pipeline
    return pipeline_class(model_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/serve/async_engine.py", line 287, in __init__
    self.engine = self._build_pytorch(model_path=model_path, backend_config=backend_config, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/serve/async_engine.py", line 347, in _build_pytorch
    return Engine.from_pretrained(model_path, engine_config=backend_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/engine/engine.py", line 459, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/engine/engine.py", line 386, in __init__
    self.executor.init()
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/engine/executor/base.py", line 189, in init
    self.build_model()
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/engine/executor/uni_executor.py", line 53, in build_model
    self.model_agent.build_model()
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/engine/model_agent.py", line 935, in build_model
    self._build_model()
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/engine/model_agent.py", line 925, in _build_model
    load_model_weights(patched_model, model_path, device=device)
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 193, in load_model_weights
    loader = ModelWeightLoader(checkpoint_path, prefix=prefix)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 123, in __init__
    self._shard_paths = self._get_shard_paths(model_path, is_sharded, weight_type)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 129, in _get_shard_paths
    weight_map = _get_weight_map(model_path, weight_type)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/lmdeploy/pytorch/weight_loader/model_weight_loader.py", line 72, in _get_weight_map
    index = json.load(f)
            ^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/__init__.py", line 293, in load
    return loads(fp.read(),
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
               ^^^^^^^^^^^^^^^^^^^^^^

The other SDAR models work fine in my tests; only this MoE model fails at inference.

Reproduction

    # Enter the `with` block; resources are released automatically when the context exits
    with pipeline(model_path, backend_config=backend_config) as pipe:
        print_memory("inside pipeline - after load", mem_log)

        gen_config = GenerationConfig(
            top_p=1.0,
            top_k=50,
            temperature=1.0,
            do_sample=False,  # greedy
            max_new_tokens=512,
        )
        print("Generation config:", gen_config)

        outputs = pipe(prompts, gen_config=gen_config)
        for idx, output in enumerate(outputs):
            print("Output:", output)
            print("Decoded tokens:", [tokenizer.decode(x) for x in output.token_ids])

Environment

lmdeploy

Error traceback


Auraithm avatar Oct 21 '25 13:10 Auraithm

The error occurs while reading a JSON file. Please check the integrity of the model files.

grimoire avatar Oct 22 '25 04:10 grimoire

    Using cached metadata: SDAR-30B-A3B-Chat/.hfd/repo_metadata.json
    Resume from file list: SDAR-30B-A3B-Chat/.hfd/aria2c_urls.txt
    Starting download with aria2c to SDAR-30B-A3B-Chat...
    No files to download. Download completed successfully.
    Repo directory: ../tools/hfd/SDAR-30B-A3B-Chat

It looks complete, doesn't it?

Auraithm avatar Oct 22 '25 04:10 Auraithm

From the log, the error happens exactly while reading model.safetensors.index.json under the model path:

https://github.com/InternLM/lmdeploy/blob/7e3869ec375a124d8ad3bb73e9aac416cbce9dfb/lmdeploy/pytorch/weight_loader/model_weight_loader.py#L72

You could try reading that file on its own with json.load. I cannot reproduce this issue locally.

grimoire avatar Oct 22 '25 06:10 grimoire