tensorrtllm_backend

The Triton TensorRT-LLM Backend

Results: 251 tensorrtllm_backend issues

### System Info P4D (A100 40 GB x 8) ### Who can help? @juney-nvidia @byshiue ### Information - [X] The official example scripts - [ ] My own modified scripts...

question

### System Info - CPU architecture: x86_64 - GPU: 1 x Nvidia A100 - Docker image for LLM serialization: nvidia/cuda:12.1.0-devel-ubuntu22.04 - Docker image for triton server launch: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3 - TensorRT...

bug
triaged

Case 1: use tensorrtllm: python3 /tensorrtllm_backend/tensorrt_llm/examples/run.py --engine_dir "/data512/tensorrtllm_backend/triton_model_repo/tensorrt_llm/1/" \ --max_output_len 2048 \ --tokenizer_dir "/tensorrtllm_backend/tokenizer" \ --input_text "system\nYou are a helpful assistant.\nuser\nWhat is the intention of the following user questions? \Can you help...

triaged

This PR fixes the typo and wrong reference link in README.md.

The link from backend metrics to TRT-LLM batch manager stats is broken, so this fixes it on the public-facing side for user visibility.

As indicated by the title, on the main branch, I used 40 threads to simultaneously send inference requests to the in-flight Triton Server, resulting in the Triton Server getting stuck....

I used the Baichuan2 13B model with weight-only INT8 and launched a Triton server on a single GPU. Now I have a node that has 2 GPUs and want to multiple model...

triaged
feature request

**Description** priority not working properly with tensorrtllm_backend **Triton Information** Triton: 2.43.0 tensorrtllm_backend: 0.8.0 Are you using the Triton container or did you build it yourself? container **To Reproduce** vim `tensorrtllm_backend/inflight_batcher_llm/client/end_to_end_grpc_client.py`...

### System Info - CPU architecture x86_64 - Nvidia H100 GPU - docker image `nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3` - TensorRT-LLM tag v0.9.0 - tensorrtllm_backend tag v0.9.0 - Ubuntu 22.04 ### Who can help?...

bug

### System Info - CPU architecture x86_64 - GPU NVIDIA A100 - TensorRT-LLM branch main - TensorRT-LLM commit 71d8d4d3dc655671f32535d6d2b60cab87f36e87 - ### Who can help? @juney-nvidia @kaiyux ### Information - [x]...

bug