
AttributeError: 'QWenConfig' object has no attribute 'seq_length'

shahizat opened this issue 9 months ago · 2 comments

System Info

Collecting environment information...
PyTorch version: 2.6.0a0+ecf3bae40a.nv25.01
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: 18.1.3 (1ubuntu1)
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.12.3 (main, Feb 4 2025, 14:48:35) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.8.0-52-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.8.61
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 5090
Nvidia driver version: 570.124.06
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.7.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper 3970X 32-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 60%
CPU max MHz: 4549.1211
CPU min MHz: 2200.0000
BogoMIPS: 7400.05
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization: AMD-V
L1d cache: 1 MiB (32 instances)
L1i cache: 1 MiB (32 instances)
L2 cache: 16 MiB (32 instances)
L3 cache: 128 MiB (8 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-63
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cuda-nvrtc-cu12==12.8.61
[pip3] nvidia-nccl-cu12==2.25.1
[pip3] onnx==1.17.0
[pip3] onnx_graphsurgeon==0.5.5
[pip3] pytorch-triton==3.1.0+cf34004b8.internal
[pip3] torch==2.6.0a0+ecf3bae40a.nv25.1
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.20.0a0
[pip3] tritonfrontend==2.55.0
[pip3] tritonserver==0.0.0
[conda] Could not collect

Who can help?

Hello Nvidia Team,

Could you please add support for the Qwen2.5 models? I am using the 25.02-trtllm-python-py3 Triton image.

Thanks in advance!

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

python3 /engines/TensorRT-LLM/examples/llama/convert_checkpoint.py \
--model_dir ./Qwen2.5-14B-Instruct \
--output_dir ./Qwen2.5-14B-Instruct-convert \
--dtype float16

trtllm-build \
--checkpoint_dir ./Qwen2.5-14B-Instruct-convert \
--gemm_plugin float16 \
--output_dir ./Qwen2.5-14B-Instruct-engine
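
A likely cause, for context (this is an assumption read off the traceback in the title, not a verified walk through TRT-LLM internals): a converter written for one model family reads Hugging Face config fields by name, and Qwen-family configs name some of those fields differently, so a lookup like seq_length fails. A minimal standalone Python sketch of the mismatch against the same checkpoint:

# Sketch only; not TRT-LLM's actual conversion code.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("./Qwen2.5-14B-Instruct")

# Qwen2-style configs expose max_position_embeddings, while older
# QWen-style code paths expect seq_length, so a bare config.seq_length
# raises AttributeError. A lookup that tolerates both names:
seq_len = getattr(config, "seq_length",
                  getattr(config, "max_position_embeddings", None))
print(type(config).__name__, seq_len)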

Expected behavior

N/A

Actual behavior

N/A

Additional notes

N/A

— shahizat, Mar 26 '25

@shahizat

Hi Shahizat,

I think Qwen 2.5 is generally supported in TRT-LLM; see the support matrix:

  • https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/qwen#support-matrix

That said, there may be small issues specific to the particular model you are using.

Since TRT-LLM is now developed GitHub-first, you are encouraged to investigate and submit a fix directly to resolve this issue.

Thanks, June

— juney-nvidia, Mar 26 '25

Hi @shahizat

Can you try this? (Use qwen/convert_checkpoint.py, not the llama one.)

python3 /engines/TensorRT-LLM/examples/qwen/convert_checkpoint.py \
--model_dir ./Qwen2.5-14B-Instruct \
--output_dir ./Qwen2.5-14B-Instruct-convert \
--dtype float16
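
If that conversion succeeds, the trtllm-build command from the Reproduction section can be pointed at the same output directory. As a quick sanity check before building, you can confirm the converter wrote what trtllm-build expects; this sketch assumes the usual TRT-LLM checkpoint layout of a config.json plus .safetensors weight shards:

# Sanity-check the converted checkpoint directory (assumed layout).
import json
import pathlib

ckpt = pathlib.Path("./Qwen2.5-14B-Instruct-convert")
cfg = json.loads((ckpt / "config.json").read_text())
print("architecture:", cfg.get("architecture"))
print("dtype:", cfg.get("dtype"))
print("weight shards:", sorted(p.name for p in ckpt.glob("*.safetensors")))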

— lkm2835, Mar 26 '25

Hi @lkm2835, thanks, my bad. Even so, it failed to run on the machine with the 5090 GPU...
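
In case it helps with triage: the RTX 5090 is a Blackwell-class GPU, so the installed TRT-LLM build must ship kernels compiled for its compute capability. A quick check of what PyTorch reports (a sketch; sm_120 is the expected value for this card):

# Print the GPU architecture PyTorch sees; a TRT-LLM build without
# kernels for this compute capability cannot target the card.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), f"-> sm_{major}{minor}")
else:
    print("CUDA not available")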

— shahizat, Mar 27 '25

Closing this since it is already solved.

Thanks @lkm2835 for supporting the community :)

— juney-nvidia, Mar 28 '25