
The Triton TensorRT-LLM Backend

Results: 251 tensorrtllm_backend issues, sorted by recently updated

### System Info TensorRT-LLM:v0.9.0 tensorrtllm_backend:v0.9.0 ### Who can help? @kaiyux ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [...

bug

### System Info Triton + TRT-LLM 0.9.0, Llama 2 70B model, FP8 quantization, running on 2x H100 80GB, TP 2, PP 1. config.pbtxt for tensorrt_llm_bls (otherwise unchanged): ```txt parameters: { key: "accumulate_tokens"...
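
For context, a minimal sketch of what the truncated `parameters` block above typically looks like in the `tensorrt_llm_bls` model's config.pbtxt; the template ships with a `${accumulate_tokens}` placeholder, and the `"true"` value here is only an illustrative assumption:

```txt
# Sketch of the accumulate_tokens parameter in tensorrt_llm_bls/config.pbtxt.
# The template uses "${accumulate_tokens}"; "true" is a hypothetical fill-in.
parameters: {
  key: "accumulate_tokens"
  value: {
    string_value: "true"
  }
}
```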

bug

### System Info rtx4090 ### Who can help? @ ### Information - [ ] The official example scripts - [ ] My own modified scripts ### Tasks - [ ]...

bug

**Description** While trying to deploy a Hugging Face model that I successfully converted with TensorRT-LLM (i.e., inference with the model engines works in the TRT-LLM container) to Triton Server with tensorrtllm_backend, I always...
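
For anyone hitting a similar wall, a minimal sketch of the usual wiring step between an already-built engine and the Triton model repo, assuming the repo's `tools/fill_template.py` helper; all paths and parameter values below are hypothetical placeholders, so check your template for its exact keys:

```python
# Minimal sketch: substituting values into the tensorrt_llm model's
# config.pbtxt template via tools/fill_template.py. Paths and values are
# placeholders for illustration only.
import subprocess

substitutions = ",".join([
    "triton_backend:tensorrtllm",
    "engine_dir:/engines/my_model",   # where trtllm-build wrote the engine
    "triton_max_batch_size:8",
    "decoupled_mode:False",
])
subprocess.run(
    ["python3", "tools/fill_template.py", "-i",
     "triton_model_repo/tensorrt_llm/config.pbtxt", substitutions],
    check=True,  # fail loudly if the substitution step errors out
)
```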

Hi, **Problem:** This PR fixes a silent bug in the `scripts/launch_triton_server.py` module; the issue only occurs when trying to automatically launch the Triton server inside a container using either...
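
For reproduction purposes, a minimal sketch of the kind of containerized launch the fix targets, assuming the script's `--world_size` and `--model_repo` flags; the path is a hypothetical placeholder:

```python
# Minimal sketch: launching Triton via the repo's helper script from inside
# the container. check=True makes a non-zero exit fail loudly rather than
# silently, which is the failure mode this PR is concerned with.
import subprocess

subprocess.run(
    [
        "python3", "scripts/launch_triton_server.py",
        "--world_size", "1",                   # MPI ranks; match your TP*PP
        "--model_repo", "/triton_model_repo",  # hypothetical path
    ],
    check=True,
)
```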

### System Info 8*RTX4090, 24G tensorrt_llm version: 0.11.0.dev2024051400 ### Who can help? @T ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks...

bug
triaged

### System Info I've converted Llama 3 using TensorRT-LLM's convert_checkpoint script, and am serving it with the inflight_batcher_llm template. I'm trying to get diverse samples for a fixed input, but...
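
As a point of comparison, a minimal sketch of requesting distinct samples through Triton's HTTP generate endpoint, assuming the standard `ensemble` model from the inflight_batcher_llm template; the input names follow that template's config.pbtxt, and the prompt and sampling values are placeholders:

```python
# Minimal sketch: vary random_seed (with temperature > 0) to get diverse
# samples for the same fixed prompt via Triton's generate endpoint.
import requests

URL = "http://localhost:8000/v2/models/ensemble/generate"
for seed in range(4):
    payload = {
        "text_input": "Write a haiku about GPUs.",  # fixed input
        "max_tokens": 48,
        "temperature": 0.8,   # > 0 enables stochastic sampling
        "top_k": 50,
        "top_p": 0.95,
        "random_seed": seed,  # different seed per request
    }
    resp = requests.post(URL, json=payload, timeout=60)
    resp.raise_for_status()
    print(seed, resp.json().get("text_output"))
```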

bug
triaged

### System Info A100 160GB (2x80GB) ### Who can help? @byshiue @kaiyux ### Information - [X] The official example scripts - [X] My own modified scripts ### Tasks - [ ]...

bug

Hi, I'm wondering if it's possible to add an example (or general guideline) for how to serve a custom LLM model that isn't based on Hugging Face. As an example, we could use...

### System Info GPU: NVIDIA A100 Driver Version: 545.23.08 CUDA: 12.3 versions: https://github.com/NVIDIA/TensorRT-LLM.git (https://github.com/NVIDIA/TensorRT-LLM/commit/bf0a5afc92f4b2b3191e9e55073953c1f600cf2d) https://github.com/triton-inference-server/tensorrtllm_backend.git (ae52bce3ed8ecea468a16483e0dacd3d156ae4fe) Model: zephyr-7b-beta ### Who can help? @kaiyux ### Information - [ ] The official...

bug