TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficientl...
### System Info

- x86_64, 128G
- RTX3090 24G
- TensorRT-LLM 0.19.0
- cuda 12.8.93
- host system ubuntu 20.04
- host GPU driver 550.144.03
- TensorRT 10.9.0.34
- cuBLAS 12.8.4.1

CONTAINER ID IMAGE
ec1bbab4b4aa tensorrt_llm/release:latest

### Who...
# Add chunked-attention kernels on Hopper (for llama4)

fmha_v2 commit: 6552b99d4820fa3f5e8a48a392681a8c128bf623

## Description

Please explain the issue and the solution in short.

## Test Coverage

## GitHub Bot Help

`/bot...
Add `auto_deploy` namespace to uniquely identify all the custom ops defined in auto_deploy/custom_ops. This could avoid potential namespace conflicts for ops defined in the manual workflow.
1. Pass `rotary_emb_base` to GPT attention in tensorrt_llm/models/qwen/model.py.
2. Rename the variable `rotary_base` to `rotary_emb_base` in examples/qwen/build.py.
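For context on why the base value matters, here is a minimal sketch of how a rotary embedding base typically enters the per-dimension rotation frequencies. The function name `rotary_inv_freq` and the default of 10000.0 are assumptions for illustration (10000 is the common RoPE default); this is not the TensorRT-LLM or Qwen implementation.

```python
def rotary_inv_freq(head_dim, rotary_emb_base=10000.0):
    """Return the inverse frequencies base^(-i/head_dim) for i = 0, 2, 4, ...

    Hypothetical helper: one frequency per pair of head dimensions,
    following the standard rotary position embedding formulation.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even for rotary embeddings")
    return [1.0 / (rotary_emb_base ** (i / head_dim))
            for i in range(0, head_dim, 2)]


# A larger base lowers every frequency except the first one.
default_freqs = rotary_inv_freq(8)
long_ctx_freqs = rotary_inv_freq(8, rotary_emb_base=1000000.0)
```

If the base is hard-coded in the attention layer rather than passed through, a checkpoint trained with a non-default base would be rotated with the wrong frequencies, which is presumably the motivation for threading the value through.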
Hi, I have added a NONE value to the QuantMode class because of the following two reasons: - 'none' is present in cpp/tensorrt_llm/common/quantization.h but not here. - by adding it,...
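As a rough illustration of what a zero-valued `NONE` member buys in a flag-style enum, consider the sketch below. `SketchQuantMode` and its member names are hypothetical stand-ins, not the actual `QuantMode` definition in TensorRT-LLM.

```python
from enum import IntFlag


class SketchQuantMode(IntFlag):
    """Hypothetical stand-in for a quantization flag set."""
    NONE = 0                # no quantization bits set
    INT4_WEIGHTS = 1 << 0
    INT8_WEIGHTS = 1 << 1
    ACTIVATIONS = 1 << 2


def is_quantized(mode):
    # With an explicit NONE member, "no quantization" has a readable name
    # instead of a bare 0.
    return mode != SketchQuantMode.NONE


# NONE is the identity element for combining flags.
combined = SketchQuantMode.NONE | SketchQuantMode.INT8_WEIGHTS
```

In this sketch, `SketchQuantMode(0)` resolves to the named `NONE` member, so code that round-trips an integer through the enum keeps a meaningful name for the unquantized case.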
As the title says. Otherwise the outputs will contain many `` (EOS token).
Add a `cudnn_root` argument to build_wheel.py so that TensorRT-LLM can be built without the Docker image. When someone already has a local environment in their Docker container, they don't want to create a new Docker image....
The Baichuan convert script does not save the `scale_y_accum_quant` and `scale_w_quant_orig` values.
Run command:

```bash
python benchmark.py -m bloom_560m --batch_size "1" --input_output_len "1024,20" --engine_dir /some/dir
```

Errors:

```text
Traceback (most recent call last):
  File "/workspace/volume/wangchao2/TensorRT-LLM/benchmarks/python/benchmark.py", line 322, in <module>
    main(args)
  File "/workspace/volume/wangchao2/TensorRT-LLM/benchmarks/python/benchmark.py", line...
```
# Activation Functions

## Tracker

- [x] tanhshrink
- [x] logsoftmax
- [x] softmin
- [x] dim-wise tensor sum
- [x] selu
- [x] logsigmoid
- [x] relu6
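
For reference, the tracked functions have the following standard mathematical definitions, sketched here in plain Python on scalars and lists. This is only a reference sketch, not the TensorRT-LLM implementation; all function names below are illustrative, and the SELU constants are the commonly published values.

```python
import math

SELU_ALPHA = 1.6732632423543772   # commonly published SELU constants
SELU_SCALE = 1.0507009873554805


def tanhshrink(x):
    return x - math.tanh(x)


def relu6(x):
    return min(max(x, 0.0), 6.0)


def selu(x):
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def logsigmoid(x):
    # log(sigmoid(x)) = -softplus(-x), written in a numerically stable form
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def logsoftmax(xs):
    # Subtract the max before exponentiating to avoid overflow
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def softmin(xs):
    # softmin(x) is softmax(-x)
    return [math.exp(v) for v in logsoftmax([-x for x in xs])]


def sum_along_dim(rows, dim):
    # Dim-wise sum of a 2-D nested list: dim=0 sums columns, dim=1 sums rows
    if dim == 0:
        return [sum(col) for col in zip(*rows)]
    if dim == 1:
        return [sum(row) for row in rows]
    raise ValueError("dim must be 0 or 1 for a 2-D input")
```

These scalar versions can serve as a quick oracle when checking the output of a tensor implementation element by element.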