TensorRT-LLM issues

convert qwen2.5-VL fail

1

### System Info

- x86_64, 128G
- RTX3090 24G
- TensorRT-LLM 0.19.0
- cuda 12.8.93
- host system ubuntu 20.04
- host GPU driver 550.144.03
- TensorRT 10.9.0.34
- cuBLAS 12.8.4.1
- CONTAINER ID ec1bbab4b4aa, IMAGE tensorrt_llm/release:latest

### Who...

dzy130120

not a bug

[Feat] add chunked-attention kernels on Hopper (for llama4)

28

# Add chunked-attention kernels on Hopper (for llama4)

fmha_v2 commit: 6552b99d4820fa3f5e8a48a392681a8c128bf623

## Description

Please explain the issue and the solution in short.

## Test Coverage

## GitHub Bot Help

`/bot...

PerkzZheng

[AutoDeploy] Refactor AutoDeploy torch custom op to attach `auto_deploy` prefix to the op namespace

Add `auto_deploy` namespace to uniquely identify all the custom ops defined in auto_deploy/custom_ops. This could avoid potential namespace conflicts for ops defined in the manual workflow.

suyoggupta

AutoDeploy

pass rotary_emb_base to gpt_attention

1. Pass `rotary_emb_base` to gpt attention (tensorrt_llm/models/qwen/model.py).
2. Rename the variable `rotary_base` -> `rotary_emb_base` (examples/qwen/build.py).

FightingMan

triaged

Community want to contribute

Generic Runtime

Include an option when no quantization mode is needed

Hi, I have added a NONE value to the QuantMode class for the following two reasons:

- 'none' is present in cpp/tensorrt_llm/common/quantization.h but not here.
- by adding it,...
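A minimal sketch of the idea, assuming the quantization mode is modeled as a bit flag. `QuantModeSketch`, its member names, and `describe` are hypothetical illustrations, not the actual TensorRT-LLM `QuantMode` definition; the point is only that an explicit zero-valued member lets callers say "no quantization" by name.

```python
from enum import IntFlag


class QuantModeSketch(IntFlag):
    """Hypothetical stand-in for a quantization-mode bit flag (not the real QuantMode)."""
    NONE = 0            # explicit "no quantization" value
    INT4_WEIGHTS = 1
    INT8_WEIGHTS = 2
    INT8_KV_CACHE = 4


def describe(mode: QuantModeSketch) -> str:
    """Return a readable name for a flag combination."""
    if mode == QuantModeSketch.NONE:
        return "none"
    flags = (
        QuantModeSketch.INT4_WEIGHTS,
        QuantModeSketch.INT8_WEIGHTS,
        QuantModeSketch.INT8_KV_CACHE,
    )
    # Keep only the single-bit members that are set in `mode`.
    return "|".join(f.name for f in flags if mode & f)


print(describe(QuantModeSketch.NONE))
print(describe(QuantModeSketch.INT8_WEIGHTS | QuantModeSketch.INT8_KV_CACHE))
```

With a named zero member, a caller can pass `QuantModeSketch.NONE` instead of a bare `0`.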

miguelusque

triaged

Community want to contribute

Low Precision

skip special token during inference

As the title says. Otherwise the outputs will contain many EOS tokens.
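To illustrate the effect being described, here is a small sketch that drops special token ids from a generated sequence before decoding. The function name `strip_special_tokens` and the EOS id `2` are assumptions made for the example; they are not taken from the PR or from the TensorRT-LLM runtime.

```python
def strip_special_tokens(token_ids, special_ids):
    """Return token_ids with every id found in special_ids removed."""
    return [t for t in token_ids if t not in special_ids]


# Hypothetical EOS id; real models define their own.
EOS_ID = 2

# A generated sequence padded with trailing EOS tokens.
generated = [5, 9, 17, EOS_ID, EOS_ID, EOS_ID]
print(strip_special_tokens(generated, {EOS_ID}))
```

Tokenizers that expose a `skip_special_tokens`-style option on decode achieve the same result without a manual filter.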

littletomatodonkey

triaged

Community want to contribute

Generic Runtime

add cudnn_root arguments for build_wheel.py for not build TensorRT-LL…

Add a `cudnn_root` argument to build_wheel.py so that TensorRT-LLM can be built outside the provided docker image. When someone already has a local environment in their docker, they don't want to create a new docker image....

ycsos

triaged

Community want to contribute

Generic Runtime

Fix baichuan smoothquant/INT8 KV cache build error

The baichuan convert script does not save the `scale_y_accum_quant` and `scale_w_quant_orig` values.

BasicCoder

triaged

Community want to contribute

Generic Runtime

fix: GPTBenchmark object has no attribute num_kv_heads

run command:
```python
python benchmark.py -m bloom_560m --batch_size "1" --input_output_len "1024,20" --engine_dir /some/dir
```
Errors:
```python
Traceback (most recent call last):
  File "/workspace/volume/wangchao2/TensorRT-LLM/benchmarks/python/benchmark.py", line 322, in <module>
    main(args)
  File "/workspace/volume/wangchao2/TensorRT-LLM/benchmarks/python/benchmark.py", line...
```

NaNAGISaSA

triaged

Community want to contribute

Generic Runtime

Activation Function Implementations

# Activation Functions

## Tracker

- [x] tanhshrink
- [x] logsoftmax
- [x] softmin
- [x] dim-wise tensor sum
- [x] selu
- [x] logsigmoid
- [x] relu6
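For reference, the standard mathematical definitions of the tracked activations can be sketched in plain Python. This is only a scalar/list illustration of the formulas, not the PR's implementation, which presumably builds these on TensorRT-LLM tensors; the function names below are chosen for the example.

```python
import math

# Standard SELU constants (Klambauer et al.).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def tanhshrink(x):
    return x - math.tanh(x)


def relu6(x):
    return min(max(0.0, x), 6.0)


def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def logsigmoid(x):
    # Numerically stable form of log(1 / (1 + exp(-x))).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def log_softmax(xs):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(xs)
    lse = m + math.log(sum(math.exp(v - m) for v in xs))
    return [v - lse for v in xs]


def softmin(xs):
    # softmin(x) is softmax(-x).
    return [math.exp(v) for v in log_softmax([-v for v in xs])]


print(relu6(7.0), tanhshrink(0.0))
print(softmin([1.0, 2.0, 3.0]))
```

The "dim-wise tensor sum" item is a reduction rather than an activation, so it is left out of this sketch.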

AndreSlavescu

triaged

Community want to contribute

Generic Runtime
