tensorrtllm_backend
The Triton TensorRT-LLM Backend
Hi, my app requires streaming since I want to stop the generation once a certain (complicated) condition is met. My decoding method is beam_search with beam_width=2, using greedy decoding or...
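The use case above can be sketched generically: consume a token stream and break out once a custom predicate on the accumulated text fires. This is a minimal sketch only; the plain iterator stands in for Triton's decoupled streaming responses, and the names `token_stream` and `should_stop` are illustrative, not Triton or TensorRT-LLM API.

```python
def should_stop(text: str) -> bool:
    # Placeholder for the "complicated" condition from the question.
    return "END" in text

def consume_stream(token_stream):
    """Accumulate streamed tokens, stopping early when the condition is met."""
    pieces = []
    for token in token_stream:
        pieces.append(token)
        if should_stop("".join(pieces)):
            break  # with a real streaming client, cancel the request here
    return "".join(pieces)

print(consume_stream(iter(["Hello", " ", "world", " END", " ignored"])))
# → Hello world END
```

With a real decoupled client the `break` would be paired with a request-cancellation call so the server stops generating, rather than just discarding further tokens client-side.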
Function Decomposition: The argument parsing logic was moved to a separate function parse_args() to improve readability and maintainability. This function encapsulates the logic related to parsing command-line arguments. Input Validation:...
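The refactor described above can be illustrated with a small sketch: argument parsing pulled into its own `parse_args()` function with basic input validation. The flag names here are assumptions chosen for illustration, not the repository's actual options.

```python
import argparse

def parse_args(argv=None):
    """Encapsulates command-line parsing and validation in one place."""
    parser = argparse.ArgumentParser(description="Example launcher")
    parser.add_argument("--model-dir", required=True, help="Path to the model")
    parser.add_argument("--beam-width", type=int, default=1)
    args = parser.parse_args(argv)
    # Input validation: reject values the rest of the program cannot handle.
    if args.beam_width < 1:
        parser.error("--beam-width must be >= 1")
    return args

args = parse_args(["--model-dir", "/models/llm", "--beam-width", "2"])
print(args.beam_width)  # → 2
```

Keeping parsing in one function also makes it testable: the `argv` parameter lets unit tests pass argument lists directly instead of patching `sys.argv`.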
### System Info - 8*A800 80G ### Who can help? @kaiyux ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks - [X]...
### System Info nvidia-rtx-a100 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks - [ ]...
I would like to send Lora weights through to a compiled tensor rt llm model but am unsure how to load the .bin weights and pass them to Triton. An...
### System Info - CPU architecture: x86_64 - CPU/Host memory size: 1T - GPU name: NVIDIA A100-40G - TensorRT-LLM branch: main, v0.9.0, 118b3d7 - CUDA: 12.3 - NVIDIA driver: 545.23.08...
https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.8.0 The README for this Triton server version has many references to the `23.10` version of Triton, which, based on the [support matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/), I believe does **not** support v0.8.0. v0.8.0...
Hi, thank you for the great work you're doing on TensorRT-LLM and the Triton backend. I have some questions on matching versions between the tensorrt-llm Python package, the backend, and...
In TensorRT-LLM, it is possible to integrate a LogitsProcessor during model inference to control the behavior of the inference process. Is it feasible to add a similar interface in the...
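A framework-agnostic sketch of what a LogitsProcessor-style hook does: a callable that receives the current logits and returns a modified copy before the sampler runs. The class name and signature below are illustrative only and do not mirror TensorRT-LLM's or the backend's actual interface.

```python
class BanTokenProcessor:
    """Hypothetical logits hook that forbids one token id from being sampled."""

    def __init__(self, banned_token_id: int):
        self.banned_token_id = banned_token_id

    def __call__(self, logits):
        # Return a modified copy; -inf makes the token unselectable
        # under both greedy decoding and softmax sampling.
        out = list(logits)
        out[self.banned_token_id] = float("-inf")
        return out

proc = BanTokenProcessor(banned_token_id=1)
new_logits = proc([0.1, 9.9, 0.3])
print(new_logits.index(max(new_logits)))  # greedy pick avoids token 1 → 2
```

In a real runtime this hook would be invoked once per decoding step, per sequence, which is why such interfaces are usually passed in at request time rather than baked into the engine.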
### System Info I have searched the repo here and the main server repo but don't see any information on either a) support for Safetensors (many models are saved that...