
The Triton TensorRT-LLM Backend

Results: 251 tensorrtllm_backend issues

There are two definitions of `gen_random_start_ids` in tools/utils/utils.py, so the second silently shadows the first: https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/tools/utils/utils.py#L238-L248 https://github.com/triton-inference-server/tensorrtllm_backend/blob/ae52bce3ed8ecea468a16483e0dacd3d156ae4fe/tools/utils/utils.py#L270-L280
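A minimal sketch (with a hypothetical module body, not the actual utils.py code) of why a duplicate `def` is a silent bug in Python: the second definition rebinds the name, so the first becomes unreachable with no warning.

```python
# Two functions with the same name in one module: Python simply rebinds
# the name on the second `def`, so only the later definition is callable.

def gen_random_start_ids(num_ids):
    # First definition: returns a list of zeros.
    return [0] * num_ids

def gen_random_start_ids(num_ids):
    # Second definition: silently replaces the one above.
    return list(range(num_ids))

# Callers only ever see the second definition.
print(gen_random_start_ids(3))
```

Linters such as flake8 flag this as an `F811` redefinition, which is one way to catch it in CI.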

triaged

### System Info

### Environment
- CPU architecture: x86_64
- CPU/Host memory size: 440 GiB

### GPU properties
- GPU name: A100
- GPU memory size: 160 GB

I am using the Azure offering...

bug
triaged

### Question

The code in [launch_triton_server.py](https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/scripts/launch_triton_server.py):

```python
def get_cmd(world_size, tritonserver, grpc_port, http_port, metrics_port,
            model_repo, log, log_file, tensorrt_llm_model_name):
    cmd = ['mpirun', '--allow-run-as-root']
    for i in range(world_size):
        cmd += ['-n', '1', tritonserver, ...
```
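The pattern the excerpt relies on is mpirun's MPMD (multiple program, multiple data) syntax: one `-n 1` segment per rank, so `world_size` tritonserver ranks run inside a single MPI job. A hedged sketch, with illustrative names and arguments rather than the script's real ones (standard `mpirun` separates segments with `:`):

```python
# Sketch of building an mpirun MPMD command line: each `-n 1` segment
# launches one server rank; segments are separated by ':'.
# `server_bin` and `extra_args` are placeholders, not the real script's flags.

def build_mpirun_cmd(world_size, server_bin, extra_args):
    cmd = ['mpirun', '--allow-run-as-root']
    for rank in range(world_size):
        if rank > 0:
            cmd += [':']  # ':' starts the next MPMD program segment
        cmd += ['-n', '1', server_bin] + list(extra_args)
    return cmd

print(build_mpirun_cmd(2, 'tritonserver', ['--model-repository=/models']))
```

With `world_size=2` this yields two `-n 1 tritonserver ...` segments joined by `:`, i.e. two ranks of the same server binary under one MPI launch.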

question
triaged

### System Info - DGX-A100 - Triton Image : v0.7.2 ### Who can help? @kaiyux _No response_ ### Information - [X] The official example scripts - [ ] My own...

bug

The use of `Popen` creates a non-blocking subprocess. When launching Docker containers in detached mode (the `-d` flag), this causes the container to start and then immediately stop, since the main...
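A small demonstration of the non-blocking behavior described above: `subprocess.Popen` returns as soon as the child is spawned, so a parent script used as a container entrypoint exits immediately unless it explicitly waits. The child command here is just an illustrative sleep.

```python
import subprocess
import sys

# Popen returns immediately; the child (a short sleep) is still running.
proc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(0.2)'])
print('child still running right after Popen:', proc.poll() is None)

# An entrypoint script must block on the child, e.g. with wait(),
# otherwise PID 1 exits and the container stops.
proc.wait()
print('child exit code:', proc.returncode)
```

The usual fix is to call `proc.wait()` (or `subprocess.run`, which blocks by default) so the foreground process of the container stays alive as long as the server does.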

triaged

Is warmup supported for the `tensorrtllm_backend`? If so, it would be nice to have an example of how to upload LoRA adapters as a warmup step.
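For context, Triton core exposes a generic warmup mechanism via the `model_warmup` field in `config.pbtxt`; whether the TensorRT-LLM backend honors it (and how LoRA weights would be fed through it) is exactly the open question here. The fragment below is only a sketch of that generic mechanism, with illustrative tensor names and shapes, not the backend's actual input signature:

```protobuf
# Hypothetical config.pbtxt fragment using Triton's model_warmup stanza.
# "input_ids" and its dims are placeholders, not the backend's real inputs.
model_warmup [
  {
    name: "warmup_sample"
    batch_size: 1
    inputs {
      key: "input_ids"
      value: {
        data_type: TYPE_INT32
        dims: [ 8 ]
        zero_data: true
      }
    }
  }
]
```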

triaged

Update the TensorRT-LLM model input specification link to point to the correct reference address.

I would like to deploy Qwen-VL using Triton. Are there any example repositories that are compatible with Qwen-VL?

I have a BERT model that I am trying to deploy with Triton Inference Server using the TensorRT-LLM backend, but I am getting errors.
- Docker image: 24.03
- TensorRT-LLM: v0.8.0...

triaged

GptManager does not support expanding the input and output parameters, which makes it impossible to add TensorRT-LLM inference parameters. Will this be supported in the future? Like this:...

feature request