tensorrtllm_backend
The Triton TensorRT-LLM Backend
### System Info TensorRT-LLM:v0.9.0 tensorrtllm_backend:v0.9.0 ### Who can help? @kaiyux ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [...
### System Info Triton + TRT-LLM 0.9.0, llama2 70b model, fp8 quantization, run on 2xH100 80GB, tp 2, pp 1 config.pbtxt for tensorrt_llm_bls (otherwise unchanged): ```txt parameters: { key: "accumulate_tokens"...
### System Info rtx4090 ### Who can help? @ ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [ ]...
**Description** Trying to deploy a Hugging Face model, which I successfully converted with TensorRT-LLM (i.e. inference with model engines works in the TRT-LLM container), in Triton Server with tensorrtllm_backend, I always...
Hi, **Problem:** This PR fixes a silent bug inside the `scripts\launch_triton_server.py` module. The issue only occurs if we try to automatically launch the Triton server inside a container using either...
### System Info 8*RTX4090, 24G tensorrt_llm version: 0.11.0.dev2024051400 ### Who can help? @T ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks...
### System Info I've converted Llama 3 using TensorRT-LLM's convert_checkpoint script, and am serving it with the inflight_batcher_llm template. I'm trying to get diverse samples for a fixed input, but...
### System Info A100 160GB(2*80) ### Who can help? @byshiue @kaiyux ### Information - [X] The official example scripts - [X] My own modified scripts ### Tasks - [ ]...
Hi, I'm wondering if it's possible to add an example (or general guideline) of how to serve a custom LLM model that's not based on Hugging Face. As an example, we could use...
### System Info GPU: NVIDIA A100 Driver Version: 545.23.08 CUDA: 12.3 versions: https://github.com/NVIDIA/TensorRT-LLM.git (https://github.com/NVIDIA/TensorRT-LLM/commit/bf0a5afc92f4b2b3191e9e55073953c1f600cf2d) https://github.com/triton-inference-server/tensorrtllm_backend.git (ae52bce3ed8ecea468a16483e0dacd3d156ae4fe) Model: zephyr-7b-beta ### Who can help? @kaiyux ### Information - [ ] The official...