
The Triton TensorRT-LLM Backend

Results: 251 tensorrtllm_backend issues

There are two definitions of `gen_random_start_ids` in tools/utils/utils.py, so the second silently shadows the first: https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/tools/utils/utils.py#L238-L248 https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/tools/utils/utils.py#L270-L280
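A minimal sketch (with a hypothetical module body, not the actual utils.py code) of why a duplicate `def` is a silent bug in Python: the second definition rebinds the name, so the first becomes unreachable with no warning.

```python
# Two functions with the same name in one module: Python simply rebinds
# the name on the second `def`, so only the later definition is callable.

def gen_random_start_ids(num_ids):
    # First definition: returns a list of zeros.
    return [0] * num_ids

def gen_random_start_ids(num_ids):
    # Second definition: silently replaces the one above.
    return list(range(num_ids))

# Callers only ever see the second definition.
print(gen_random_start_ids(3))
```

Linters such as flake8 flag this as an `F811` redefinition, which is one way to catch it in CI.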

triaged

### System Info

### Environment
- CPU architecture: x86_64
- CPU/Host memory size: 440 GiB

### GPU properties
- GPU name: A100
- GPU memory size: 160 GB

I am using the Azure offering...

bug
triaged

### Question

The code in [launch_triton_server.py](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/scripts/launch_triton_server.py):

```python
def get_cmd(world_size, tritonserver, grpc_port, http_port, metrics_port,
            model_repo, log, log_file, tensorrt_llm_model_name):
    cmd = ['mpirun', '--allow-run-as-root']
    for i in range(world_size):
        cmd += ['-n', '1', tritonserver, ...
```
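The pattern the excerpt relies on is mpirun's MPMD (multiple program, multiple data) syntax: one `-n 1` segment per rank, so `world_size` tritonserver ranks run inside a single MPI job. A hedged sketch, with illustrative names and arguments rather than the script's real ones (standard `mpirun` separates segments with `:`):

```python
# Sketch of building an mpirun MPMD command line: each `-n 1` segment
# launches one server rank; segments are separated by ':'.
# `server_bin` and `extra_args` are placeholders, not the real script's flags.

def build_mpirun_cmd(world_size, server_bin, extra_args):
    cmd = ['mpirun', '--allow-run-as-root']
    for rank in range(world_size):
        if rank > 0:
            cmd += [':']  # ':' starts the next MPMD program segment
        cmd += ['-n', '1', server_bin] + list(extra_args)
    return cmd

print(build_mpirun_cmd(2, 'tritonserver', ['--model-repository=/models']))
```

With `world_size=2` this yields two `-n 1 tritonserver ...` segments joined by `:`, i.e. two ranks of the same server binary under one MPI launch.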

question
triaged

### System Info - DGX-A100 - Triton Image : v0.7.2 ### Who can help? @kaiyux _No response_ ### Information - [X] The official example scripts - [ ] My own...

bug

The use of `Popen` creates a non-blocking subprocess. When launching Docker containers in detached mode (the `-d` flag), this causes the container to start and then immediately stop, since the main...
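A small demonstration of the non-blocking behavior described above: `subprocess.Popen` returns as soon as the child is spawned, so a parent script used as a container entrypoint exits immediately unless it explicitly waits. The child command here is just an illustrative sleep.

```python
import subprocess
import sys

# Popen returns immediately; the child (a short sleep) is still running.
proc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(0.2)'])
print('child still running right after Popen:', proc.poll() is None)

# An entrypoint script must block on the child, e.g. with wait(),
# otherwise PID 1 exits and the container stops.
proc.wait()
print('child exit code:', proc.returncode)
```

The usual fix is to call `proc.wait()` (or `subprocess.run`, which blocks by default) so the foreground process of the container stays alive as long as the server does.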

triaged

Is warmup supported for the `tensorrtllm_backend`? If so, it would be nice to have an example of how to upload LoRA adapters as a warmup step.
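For context, Triton core exposes a generic warmup mechanism via the `model_warmup` field in `config.pbtxt`; whether the TensorRT-LLM backend honors it (and how LoRA weights would be fed through it) is exactly the open question here. The fragment below is only a sketch of that generic mechanism, with illustrative tensor names and shapes, not the backend's actual input signature:

```protobuf
# Hypothetical config.pbtxt fragment using Triton's model_warmup stanza.
# "input_ids" and its dims are placeholders, not the backend's real inputs.
model_warmup [
  {
    name: "warmup_sample"
    batch_size: 1
    inputs {
      key: "input_ids"
      value: {
        data_type: TYPE_INT32
        dims: [ 8 ]
        zero_data: true
      }
    }
  }
]
```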

triaged

Update the TensorRT-LLM model input specification link to point to the correct reference address.

I would like to deploy Qwen-VL using Triton. Are there any example repositories that are compatible with Qwen-VL?

I have a BERT model that I am trying to deploy with Triton Inference Server using the TensorRT-LLM backend, but I am getting errors.
- Docker image: 24.03
- TensorRT-LLM: v0.8.0...

triaged

GptManager does not support expanding the input and output parameters, which makes it impossible to add TensorRT-LLM inference parameters. Will this be supported in the future? Like this:...

feature request