
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently...

Results: 937 TensorRT-LLM issues

I want to test an example: the initial KV cache length is 2048 and the LLM iterates 2048 times, so output_tokens = 2048, but the initial KV cache length is 2048, and...

question
triaged
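The question above comes down to simple capacity arithmetic: with a fixed KV-cache size, the room left for generation is the capacity minus the tokens already occupied by the prompt. The helper below is an illustrative sketch of that bound, not a TensorRT-LLM API.

```python
# Illustrative KV-cache arithmetic: the number of new tokens that can be
# generated is bounded by cache capacity minus prompt length.
# max_new_tokens is a hypothetical helper, not part of TensorRT-LLM.

def max_new_tokens(kv_cache_capacity: int, prompt_len: int) -> int:
    """Tokens that fit in the KV cache after the prompt is stored."""
    return max(0, kv_cache_capacity - prompt_len)

# A 2048-slot cache already holding a 2048-token prompt leaves no room
# to generate 2048 more tokens without a larger cache or eviction:
assert max_new_tokens(2048, 2048) == 0
# Halving the prompt frees half the cache for generation:
assert max_new_tokens(2048, 1024) == 1024
```

In other words, prompt length 2048 plus 2048 generated tokens needs a cache sized for 4096 positions, which is why the example as described cannot work with a 2048-slot cache.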

Summary: I would like to propose adding constrained decoding support. This feature would allow the output sequence to be constrained by a Finite State Machine (FSM) or Context-Free...

triaged
feature request
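The core idea behind the FSM-constrained decoding requested above can be sketched in a few lines: at each step, mask out any token that has no valid transition from the current FSM state, then pick among the survivors. Everything below (the toy FSM, vocabulary, and `constrained_greedy` helper) is illustrative and not a TensorRT-LLM API.

```python
# Toy FSM over tokens {"a", "b"}: state -> {token: next_state}.
# It accepts any sequence that ends in "b" (accepting state 1).
fsm = {
    0: {"a": 0, "b": 1},
    1: {"a": 0, "b": 1},
}
accepting = {1}

def constrained_greedy(logits_per_step, vocab):
    """Greedy decoding where tokens without a valid FSM transition are masked."""
    state = 0
    out = []
    for logits in logits_per_step:
        # Keep only tokens the FSM allows from the current state.
        allowed = [(tok, score) for tok, score in zip(vocab, logits)
                   if tok in fsm[state]]
        tok, _ = max(allowed, key=lambda pair: pair[1])
        out.append(tok)
        state = fsm[state][tok]
    return out, state in accepting
```

A production implementation would apply the mask to the logits tensor inside the sampling kernel rather than filtering in Python, but the transition-table lookup per step is the same.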

I use the following code to generate the checkpoint:
```
set -e
export MODEL_DIR=/mnt/memory
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=int4_gptq_a16
export QUANTIZE=int4_gptq
export DTYPE=bfloat16
export PYTHONPATH=/app/tensorrt-llm:$PYTHONPATH
python ../llama/convert_checkpoint.py \
...
```

triaged

### System Info
- Ubuntu 20.04
- tensorrt 10.0.1
- tensorrt-cu12 10.0.1
- tensorrt-cu12-bindings 10.0.1
- tensorrt-cu12-libs 10.0.1
- tensorrt-llm 0.10.0.dev2024050700

### Who can help?
@Tracin

### Information
- [X] The official example scripts
- [...

triaged
feature request
quantization
not a bug

### System Info
- A100 40G
- tensorrt 10.0.1
- tensorrt-llm 0.10.0.dev2024050700

### Who can help?
@Tracin

### Information
- [X] The official example scripts
- [ ] My own...

bug
triaged

1. Build Mixtral for tp8
2. Run `mpirun -n 8 ./gptSessionBenchmark`
3. `nvidia-smi` shows:
```
Wed May 15 09:13:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15    Driver Version: 550.54.15    CUDA Version: 12.4...
```

triaged

## Environment
- RTX8000 GPU
- TensorRT-LLM v0.9.0

## Model
- LLaVA v1.5 7B (LLaMA2 7B)
- fp16 and int8/int4 weight quantization
- batch size = 16

## Script
- official...

question
triaged

### System Info
- tensorrt 10.0.1
- tensorrt-cu12 10.0.1
- tensorrt-cu12-bindings 10.0.1
- tensorrt-cu12-libs 10.0.1
- tensorrt-llm 0.10.0.dev2024050700

### Who can help?
@byshiue

### Information
- [X] The official example scripts
- [ ] My...

triaged

### System Info
- GPU: A800
- GPU memory: 80G
- TensorRT-LLM: 0.8.0
- CUDA: 12.1
- OS: Ubuntu

### Who can help?
@byshiue @kaiyux

### Information
- [ ] The official example scripts
- ...

bug

### System Info
- CPU architecture: x86_64
- GPU name: NVIDIA A40, 46GB
- TensorRT-LLM: v0.9.0
- OS: Ubuntu 20.04
- NVIDIA Driver: 535.54.03, CUDA: 12.2

### Who can help?...

bug
triaged