TensorRT-LLM
[Error] ValueError: Unknown architecture for AutoModelForCausalLM: DeepseekV3ForCausalLM
When trying to use trtllm-serve to deploy a DeepSeek-V3 serving endpoint, like this:
trtllm-serve --host 0.0.0.0 --port 8100 --max_batch_size 32 --max_num_tokens 4096 --max_seq_len 4096 --tp_size 8 --trust_remote_code /models/deepseek-ai/deepseek-v3/
an error occurred. How can I solve this problem?
[TensorRT-LLM] TensorRT-LLM version: 0.17.0.post1
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 1250, in get_model_format
AutoConfig.from_hugging_face(model_dir)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/automodel.py", line 31, in from_hugging_face
raise NotImplementedError(
NotImplementedError: The given huggingface model architecture DeepseekV3ForCausalLM is not supported in TRT-LLM yet
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/trtllm-serve", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/serve.py", line 78, in main
llm_args = LlmArgs.from_kwargs(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 544, in from_kwargs
ret.setup()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 590, in setup
self.model_format = ModelLoader.get_model_format(self.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 1252, in get_model_format
raise ValueError(
ValueError: Inferred model format _ModelFormatKind.HF, but failed to load config.json: The given huggingface model architecture DeepseekV3ForCausalLM is not supported in TRT-LLM yet
You can try using the deepseek branch to deploy DeepSeekV3
@tingjun-cs things move very fast on deepseek-v3, please follow the instructions here, and let us know if you meet any issues. Thanks!
I got the same issue when using DeepSeek-R1 by following NVIDIA official instruction: https://huggingface.co/nvidia/DeepSeek-R1-FP4#deploy-with-tensorrt-llm
Should I manually build TensorRT-LLM from scratch? Why doesn't NVIDIA release an official wheel or Docker image?
I tried the latest wheel (0.19.0) from https://pypi.nvidia.com/tensorrt-llm/. It can load the DeepSeekV3 model, but it failed on the bindings:
Traceback (most recent call last):
File "/codes/mlops/scripts/trtllm/offline.py", line 2, in
Where is the deepseek branch? I didn't find it in this repo.
It seems this branch was deleted; you can use main now. https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/deepseek_v3
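If you are building from source, a rough sketch of the usual flow from main looks like this (assuming the standard scripts/build_wheel.py workflow; exact flags and the wheel output path may differ by commit):

# clone main and fetch submodules
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
# build the wheel and install it (typically run inside the dev container)
python3 ./scripts/build_wheel.py --clean
pip install build/tensorrt_llm-*.whl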
I tried the main branch (commit 992d51), built from source, but I still got the error The given huggingface model architecture DeepseekV3ForCausalLM is not supported. Did I miss something?
You should use the PyTorch backend like this, with no need to export a TensorRT engine. It seems the TensorRT backend for DeepSeek has been dropped on main:
trtllm-serve \
${model_dir} \
--backend 'pytorch' \
--max_batch_size 128 \
--max_num_tokens 16384 \
--tp_size 8 \
--ep_size 8 \
--trust_remote_code \
--extra_llm_api_options extra-llm-api-config.yml
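Once it is up, the server speaks the OpenAI-compatible API, so a quick smoke test could look like the sketch below (port 8000 is just the trtllm-serve default since the command above does not set --port, and the model field is a placeholder for whatever name the server reports for ${model_dir}):

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<served-model-name>", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'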