TensorRT-LLM
[Error] ValueError: Unknown architecture for AutoModelForCausalLM: DeepseekV3ForCausalLM
When trying to use trtllm-serve to deploy a DeepSeek-V3 serving endpoint, like this:
trtllm-serve --host 0.0.0.0 --port 8100 --max_batch_size 32 --max_num_tokens 4096 --max_seq_len 4096 --tp_size 8 --trust_remote_code /models/deepseek-ai/deepseek-v3/
an error occurred. How can I solve this problem?
[TensorRT-LLM] TensorRT-LLM version: 0.17.0.post1
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 1250, in get_model_format
AutoConfig.from_hugging_face(model_dir)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/models/automodel.py", line 31, in from_hugging_face
raise NotImplementedError(
NotImplementedError: The given huggingface model architecture DeepseekV3ForCausalLM is not supported in TRT-LLM yet
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/trtllm-serve", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/serve.py", line 78, in main
llm_args = LlmArgs.from_kwargs(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 544, in from_kwargs
ret.setup()
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 590, in setup
self.model_format = ModelLoader.get_model_format(self.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm_utils.py", line 1252, in get_model_format
raise ValueError(
ValueError: Inferred model format _ModelFormatKind.HF, but failed to load config.json: The given huggingface model architecture DeepseekV3ForCausalLM is not supported in TRT-LLM yet
You can try using the deepseek branch to deploy DeepSeekV3
@tingjun-cs things move very fast on deepseek-v3, please follow the instructions here, and let us know if you meet any issues. Thanks!
I got the same issue when using DeepSeek-R1 by following NVIDIA official instruction: https://huggingface.co/nvidia/DeepSeek-R1-FP4#deploy-with-tensorrt-llm
Should I manually build TensorRT-LLM from scratch? Why doesn't NVIDIA release an official wheel or Docker image?
I tried the latest wheel (0.19.0) from https://pypi.nvidia.com/tensorrt-llm/. It can load the DeepSeekV3 model, but it failed on the bindings:
Traceback (most recent call last):
File "/codes/mlops/scripts/trtllm/offline.py", line 2, in
Where is the deepseek branch? I didn't find it in this repo.
It seems this branch was deleted; you can use main now. https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/deepseek_v3
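If you are building from source, a rough sketch of the usual flow from main looks like this (assuming the standard scripts/build_wheel.py workflow; exact flags and the wheel output path may differ by commit):

# clone main and fetch submodules
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
# build the wheel and install it (typically run inside the dev container)
python3 ./scripts/build_wheel.py --clean
pip install build/tensorrt_llm-*.whl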
I tried the main branch (commit 992d51), built from source, but I still got the error The given huggingface model architecture DeepseekV3ForCausalLM is not supported. Did I miss something?
You should use the PyTorch backend like this, with no need to export a TensorRT engine. It seems the TensorRT backend for DeepSeek has been dropped on main:
trtllm-serve \
${model_dir} \
--backend 'pytorch' \
--max_batch_size 128 \
--max_num_tokens 16384 \
--tp_size 8 \
--ep_size 8 \
--trust_remote_code \
--extra_llm_api_options extra-llm-api-config.yml
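Once it is up, the server speaks the OpenAI-compatible API, so a quick smoke test could look like the sketch below (port 8000 is just the trtllm-serve default since the command above does not set --port, and the model field is a placeholder for whatever name the server reports for ${model_dir}):

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<served-model-name>", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'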