
The Triton TensorRT-LLM Backend

Results: 251 tensorrtllm_backend issues, sorted by recently updated

### System Info TensorRT-LLM:v0.9.0 tensorrtllm_backend:v0.9.0 ### Who can help? @kaiyux ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [...

bug

### System Info Triton + TRT-LLM 0.9.0, Llama 2 70B model, FP8 quantization, running on 2x H100 80GB, TP 2, PP 1. config.pbtxt for tensorrt_llm_bls (otherwise unchanged): ```txt parameters: { key: "accumulate_tokens"...
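
For context, a minimal sketch of what the truncated `parameters` block above typically looks like in the `tensorrt_llm_bls` model's config.pbtxt; the template ships with a `${accumulate_tokens}` placeholder, and the `"true"` value here is only an illustrative assumption:

```txt
# Sketch of the accumulate_tokens parameter in tensorrt_llm_bls/config.pbtxt.
# The template uses "${accumulate_tokens}"; "true" is a hypothetical fill-in.
parameters: {
  key: "accumulate_tokens"
  value: {
    string_value: "true"
  }
}
```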

bug

### System Info rtx4090 ### Who can help? @ ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [ ]...

bug

**Description** While trying to deploy a Hugging Face model that I successfully converted with TensorRT-LLM (i.e., inference with the model engines works in the TRT-LLM container) to Triton Server with tensorrtllm_backend, I always...
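
For anyone hitting a similar wall, a minimal sketch of the usual wiring step between an already-built engine and the Triton model repo, assuming the repo's `tools/fill_template.py` helper; all paths and parameter values below are hypothetical placeholders, so check your template for its exact keys:

```python
# Minimal sketch: substituting values into the tensorrt_llm model's
# config.pbtxt template via tools/fill_template.py. Paths and values are
# placeholders for illustration only.
import subprocess

substitutions = ",".join([
    "triton_backend:tensorrtllm",
    "engine_dir:/engines/my_model",   # where trtllm-build wrote the engine
    "triton_max_batch_size:8",
    "decoupled_mode:False",
])
subprocess.run(
    ["python3", "tools/fill_template.py", "-i",
     "triton_model_repo/tensorrt_llm/config.pbtxt", substitutions],
    check=True,  # fail loudly if the substitution step errors out
)
```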

Hi, **Problem:** This PR fixes a silent bug in the `scripts/launch_triton_server.py` module; the issue only occurs when trying to automatically launch the Triton server inside a container using either...
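
For reproduction purposes, a minimal sketch of the kind of containerized launch the fix targets, assuming the script's `--world_size` and `--model_repo` flags; the path is a hypothetical placeholder:

```python
# Minimal sketch: launching Triton via the repo's helper script from inside
# the container. check=True makes a non-zero exit fail loudly rather than
# silently, which is the failure mode this PR is concerned with.
import subprocess

subprocess.run(
    [
        "python3", "scripts/launch_triton_server.py",
        "--world_size", "1",                   # MPI ranks; match your TP*PP
        "--model_repo", "/triton_model_repo",  # hypothetical path
    ],
    check=True,
)
```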

### System Info 8*RTX4090, 24G tensorrt_llm version: 0.11.0.dev2024051400 ### Who can help? @T ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks...

bug
triaged

### System Info I've converted Llama 3 using TensorRT-LLM's convert_checkpoint script, and am serving it with the inflight_batcher_llm template. I'm trying to get diverse samples for a fixed input, but...
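
As a point of comparison, a minimal sketch of requesting distinct samples through Triton's HTTP generate endpoint, assuming the standard `ensemble` model from the inflight_batcher_llm template; the input names follow that template's config.pbtxt, and the prompt and sampling values are placeholders:

```python
# Minimal sketch: vary random_seed (with temperature > 0) to get diverse
# samples for the same fixed prompt via Triton's generate endpoint.
import requests

URL = "http://localhost:8000/v2/models/ensemble/generate"
for seed in range(4):
    payload = {
        "text_input": "Write a haiku about GPUs.",  # fixed input
        "max_tokens": 48,
        "temperature": 0.8,   # > 0 enables stochastic sampling
        "top_k": 50,
        "top_p": 0.95,
        "random_seed": seed,  # different seed per request
    }
    resp = requests.post(URL, json=payload, timeout=60)
    resp.raise_for_status()
    print(seed, resp.json().get("text_output"))
```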

bug
triaged

### System Info A100 160GB (2x80GB) ### Who can help? @byshiue @kaiyux ### Information - [X] The official example scripts - [X] My own modified scripts ### Tasks - [ ]...

bug

Hi, I'm wondering if it's possible to add an example (or general guideline) for how to serve a custom LLM model that isn't based on Hugging Face. As an example, we could use...

### System Info GPU: NVIDIA A100 Driver Version: 545.23.08 CUDA: 12.3 versions: https://github.com/NVIDIA/TensorRT-LLM.git (https://github.com/NVIDIA/TensorRT-LLM/commit/bf0a5afc92f4b2b3191e9e55073953c1f600cf2d) https://github.com/triton-inference-server/tensorrtllm_backend.git (ae52bce3ed8ecea468a16483e0dacd3d156ae4fe) Model: zephyr-7b-beta ### Who can help? @kaiyux ### Information - [ ] The official...

bug