
[Bug] lmdeploy - ERROR - base.py:53 - RuntimeError: Internal Triton PTX codegen error: ptxas fatal : Value 'sm_120' is not defined for option 'gpu-name'

Open xieyabinfuwu opened this issue 8 months ago • 3 comments

Checklist

  • [x] 1. I have searched related issues but cannot get the expected help.
  • [ ] 2. The bug has not been fixed in the latest version.
  • [ ] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

On an RTX 5090, after installing lmdeploy directly, the model fails to load and the server won't start. Has anyone run into this problem and found a solution? Much appreciated!

Reproduction

CUDA_VISIBLE_DEVICES=1 lmdeploy serve api_server /home/ubuntu/Qwen2___5-VL-7B-Instruct-FP8-Dynamic/ --model-name Qwen2___5-VL-7B-Instruct-FP8-Dynamic --server-port 23333 --backend pytorch --cache-max-entry-count 0.2

Environment

(lmdeploy) root@ubun:~# lmdeploy check_env 
sys.platform: linux
Python: 3.10.0 (default, Mar  3 2022, 09:58:08) [GCC 7.5.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1: NVIDIA GeForce RTX 5090 D
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 12.8, V12.8.93
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.8.0.dev20250407+cu128
PyTorch compiling details: PyTorch built with:
  - GCC 11.2
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.8
  - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_100,code=sm_100;-gencode;arch=compute_120,code=sm_120;-gencode;arch=compute_120,code=compute_120
  - CuDNN 90.8
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=2b568a51f192d042b30f3f57171d0ee431e94be8, CUDA_VERSION=12.8, CUDNN_VERSION=9.8.0, CXX_COMPILER=/opt/rh/gcc-toolset-11/root/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.8.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.22.0.dev20250407+cu128
LMDeploy: 0.7.2.post1+
transformers: 4.51.1
gradio: Not Found
fastapi: 0.103.2
pydantic: 2.11.3
triton: 3.1.0
NVIDIA Topology: 
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    0-23    0               N/A
GPU1    NODE     X      0-23    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

Error traceback

CUDA_VISIBLE_DEVICES=1 lmdeploy serve api_server /home/ubuntu/Qwen2___5-VL-7B-Instruct-FP8-Dynamic/ --model-name Qwen2___5-VL-7B-Instruct-FP8-Dynamic  --server-port 23333 --backend pytorch --cache-max-entry-count 0.2 
Fast image processor class <class 'transformers.models.qwen2_vl.image_processing_qwen2_vl_fast.Qwen2VLImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
'sm_120' is not a recognized processor for this target (ignoring processor)
'sm_120' is not a recognized processor for this target (ignoring processor)
'sm_120' is not a recognized processor for this target (ignoring processor)
'sm_120' is not a recognized processor for this target (ignoring processor)
'sm_120' is not a recognized processor for this target (ignoring processor)
'sm_120' is not a recognized processor for this target (ignoring processor)
2025-04-09 09:41:02,222 - lmdeploy - ERROR - base.py:53 - RuntimeError: Internal Triton PTX codegen error: 
ptxas fatal   : Value 'sm_120' is not defined for option 'gpu-name'

2025-04-09 09:41:02,222 - lmdeploy - ERROR - base.py:54 - <Triton> check failed!
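
For context on the traceback: ptxas rejects `sm_120` because that target corresponds to the compute capability of the RTX 50 series (Blackwell), which the ptxas invoked by triton here predates. A minimal way to confirm what the driver reports, assuming a CUDA-enabled PyTorch build, is:

```python
import torch

# An RTX 5090 reports compute capability (12, 0), i.e. the sm_120
# target that ptxas rejects in the traceback above.
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor} -> sm_{major}{minor}")
```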

xieyabinfuwu avatar Apr 09 '25 09:04 xieyabinfuwu

LMDeploy doesn't support the RTX 50 series yet.

lvhan028 avatar Apr 09 '25 10:04 lvhan028

I could supply access to a machine with 4x 5090 if that would help.

JokerGT avatar Apr 09 '25 14:04 JokerGT

We run some environment checks before starting the engine. As the log indicates, the triton package check failed. Our triton check is a simple vector-add kernel: https://github.com/InternLM/lmdeploy/blob/05914be1e0ef3ceb9d0ce37ca4912bc3ec2e2864/lmdeploy/pytorch/check_env/triton_custom_add.py#L28-L33
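
For anyone wanting to reproduce the check outside lmdeploy, here is a minimal sketch in the spirit of that linked `triton_custom_add.py` (illustrative names, not the actual file contents). On a GPU whose architecture the installed triton cannot target, the first launch fails with the same ptxas error as in the report:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def custom_add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized slice of the vectors.
    offsets = tl.program_id(axis=0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

n = 1024
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)
# Compilation happens at first launch; this is where ptxas is invoked
# and where the sm_120 error surfaces on unsupported setups.
custom_add_kernel[(triton.cdiv(n, 256),)](x, y, out, n, BLOCK=256)
assert torch.allclose(out, x + y)
```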

Update your triton version, or ask at https://github.com/triton-lang/triton whether they can help.
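
One detail worth checking when upgrading: triton wheels normally invoke a ptxas bundled inside the wheel rather than the system one, which would explain why the system CUDA 12.8 toolkit shown in the environment above does not prevent the failure. A sketch for locating that binary (the path layout is an assumption and may vary across triton releases):

```python
import os
import triton

# Triton 3.x wheels typically ship their own ptxas under
# backends/nvidia/bin; the exact layout may differ by release.
bundled = os.path.join(os.path.dirname(triton.__file__),
                       "backends", "nvidia", "bin", "ptxas")
print(bundled, "exists:", os.path.exists(bundled))
# Running `<that path> --version` shows which CUDA release it ships from;
# a ptxas older than CUDA 12.8 does not know sm_120, regardless of the
# system-wide nvcc version.
```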

grimoire avatar Apr 10 '25 03:04 grimoire