tensorrtllm_backend

The Triton TensorRT-LLM Backend

Results 251 tensorrtllm_backend issues

Hi, my app requires streaming since I want to stop the generation once a certain (complicated) condition is met. My decoding method is beam_search with beam_width=2, using greedy decoding or...

feature request
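
The early-stop pattern this request describes can be sketched without any server at all. The block below is a minimal illustration, not the backend's API: `consume_until`, `fake_stream`, and the stop condition are hypothetical names, and the generator stands in for a real streaming response. With a real streaming client the request would also need to be cancelled on the server side, which is not shown here.

```python
from typing import Callable, Iterable, List


def consume_until(stream: Iterable[str],
                  should_stop: Callable[[List[str]], bool]) -> List[str]:
    """Collect tokens from a stream and stop once the condition holds.

    `should_stop` receives every token seen so far, so it can express
    an arbitrarily complicated condition.
    """
    tokens: List[str] = []
    for token in stream:
        tokens.append(token)
        if should_stop(tokens):
            break  # stop reading; the rest of the stream is never consumed
    return tokens


def fake_stream():
    # Hypothetical stand-in for tokens arriving from a streaming endpoint.
    yield from ["The", "answer", "is", "42", ".", "Extra", "text"]


# Stop as soon as the most recent token ends the sentence.
result = consume_until(fake_stream(), lambda toks: toks[-1] == ".")
print(result)
```

Passing the whole token history to the predicate keeps the stop condition independent of the transport, so the same check can be reused whichever client delivers the tokens.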

Function Decomposition: The argument parsing logic was moved to a separate function parse_args() to improve readability and maintainability. This function encapsulates the logic related to parsing command-line arguments. Input Validation:...
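
A refactor of this kind might look like the sketch below. It is an assumption-laden illustration rather than the code from the change itself: the flags `--url` and `--max-tokens`, their defaults, and the validation rule are all hypothetical; only the idea of a separate `parse_args()` function comes from the description above.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse and validate command-line arguments in one place."""
    parser = argparse.ArgumentParser(description="Example inference client")
    # Hypothetical flags, chosen only for illustration.
    parser.add_argument("--url", default="localhost:8001",
                        help="server address as host:port")
    parser.add_argument("--max-tokens", type=int, default=64,
                        help="maximum number of tokens to generate")
    args = parser.parse_args(argv)

    # Input validation lives next to parsing, so main() stays small.
    if args.max_tokens <= 0:
        parser.error("--max-tokens must be a positive integer")
    return args


def main(argv: Optional[Sequence[str]] = None) -> str:
    args = parse_args(argv)
    return f"{args.url} {args.max_tokens}"


if __name__ == "__main__":
    print(main())
```

Accepting an optional `argv` makes the parser easy to exercise in tests, for example `parse_args(["--max-tokens", "8"])`, without touching `sys.argv`.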

### System Info
- 8*A800 80G
### Who can help?
@kaiyux
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X]...

bug

### System Info
nvidia-rtx-a100
### Who can help?
_No response_
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ]...

bug

I would like to send LoRA weights through to a compiled TensorRT-LLM model but am unsure how to load the .bin weights and pass them to Triton. An...

triaged

### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 1T
- GPU name: NVIDIA A100-40G
- TensorRT-LLM branch: main, v0.9.0, 118b3d7
- CUDA: 12.3
- NVIDIA driver: 545.23.08...

bug

https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.8.0 The README for this Triton server version has many references to the `23.10` version of Triton, which, based on the [support matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/), I believe does **not** support v0.8.0. v0.8.0...

documentation

Hi, thank you for the great work you're doing on TensorRT-LLM and the Triton backend. I have some questions on matching versions between the tensorrt-llm Python package, the backend, and...

help wanted

In TensorRT-LLM, it is possible to integrate a LogitsProcessor during model inference to control the behavior of the inference process. Is it feasible to add a similar interface in the...

triaged

### System Info I have searched the repo here and the main server repo but don't see any information on either a) support for Safetensors (many models are saved that...

question
triaged