
The Triton TensorRT-LLM Backend

Results: 251 tensorrtllm_backend issues

### System Info - Ubuntu - GPU A100 / 3090 RTX - docker nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3 - Python tensorrt-llm package (version 0.9.0.dev2024030500) installed in the docker image (no other installation) ### Who...

bug
triaged

### System Info **Hardware:** - CPU architecture: x86_64 - CPU memory size: - L1d cache: 2 MiB - L1i cache: 2 MiB - L2 cache: 64 MiB - L3 cache:...

bug
triaged

### System Info While building TensorRT engines for the Mixtral model Mixtral-8x7B-Instruct-v0.1, I ran into this error: Loading checkpoint shards: 21%|██████████████████████████████████▌ | 4/19 [05:30

bug
triaged

### System Info V100*2 nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3 tensorrt-llm 0.7.0 ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My own modified scripts ### Tasks...

bug

Once I have correctly deployed my model on a Triton server, when I try to send a request: `curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 100,...
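For reference, the same request can be sent from Python. This is a minimal sketch: the payload fields and the `text_output` response field mirror the inflight_batcher_llm ensemble example in this repo's README, while the host, port, and timeout are assumptions.

```python
# Minimal sketch of a request to Triton's HTTP generate endpoint.
# Assumes the server runs locally on the default HTTP port 8000 and
# serves the "ensemble" model from the inflight_batcher_llm example.
import requests

payload = {
    "text_input": "What is machine learning?",
    "max_tokens": 100,
    "bad_words": "",
    "stop_words": "",
}
resp = requests.post(
    "http://localhost:8000/v2/models/ensemble/generate",
    json=payload,
    timeout=60,  # assumption; tune for your engine's latency
)
resp.raise_for_status()
# The README's ensemble example returns the completion in "text_output".
print(resp.json()["text_output"])
```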

### System Info - GPU: 2 x Nvidia A100 80GB ![image (4)](https://github.com/triton-inference-server/tensorrtllm_backend/assets/142644506/ef227a48-4094-4df5-8daf-2917b6bf6627) ### Who can help? _No response_ ### Information - [X] The official example scripts - [ ] My...

bug
triaged

### System Info - CPU architecture: `x86_64` - GPU: NVIDIA A10 24GB - TensorRT-LLM: `v0.8.0` (docker build via `make -C docker release_build CUDA_ARCHS="86-real"`) - Triton Inference Server: `r24.02` (docker from...

bug

### System Info ec2 instance - g5.12xlarge ami - ami-0d8667b0f72471655 ### Who can help? Hi, I'm writing to ask about a discrepancy I'm seeing when trying to run mistral-7b on...

bug
triaged

https://github.com/triton-inference-server/tensorrtllm_backend/blob/49def341ca37e0db3dc8c80c99da824107a7a938/README.md?plain=1#L231 Make the text consistent for the boolean variable in the README; it should likely be `true`, not `True`: ``` Optional (default=false). Controls streaming. Decoupled mode must be set to True if using the...

documentation
triaged
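Related to the `stream` flag above: in a JSON request body the boolean must be serialized as lowercase `true`, which is one reason the README's `True` is misleading. A minimal streaming sketch follows, assuming the ensemble exposes a boolean `stream` input, the tensorrt_llm model runs in decoupled mode, and the server's HTTP `generate_stream` endpoint is reachable on localhost:8000.

```python
# Minimal sketch of a streaming request via the generate_stream endpoint.
# The "stream" field and SSE response framing are assumptions based on
# the ensemble's boolean stream input; verify against your config.pbtxt.
import json
import requests

payload = {
    "text_input": "What is machine learning?",
    "max_tokens": 100,
    "bad_words": "",
    "stop_words": "",
    "stream": True,  # serialized as lowercase `true` on the wire
}
with requests.post(
    "http://localhost:8000/v2/models/ensemble/generate_stream",
    json=payload,
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    # The endpoint returns Server-Sent Events; each data line is one JSON chunk.
    for line in resp.iter_lines():
        if line.startswith(b"data:"):
            chunk = json.loads(line[len(b"data:"):])
            print(chunk.get("text_output", ""), end="", flush=True)
```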

https://github.com/triton-inference-server/tensorrtllm_backend/blob/49def341ca37e0db3dc8c80c99da824107a7a938/all_models/inflight_batcher_llm/preprocessing/config.pbtxt#L127 The tokenizer_type parameter is missing from the config.pbtxt yet is described in the README as a parameter to use. Please add tokenizer_type to the relevant config.pbtxt files by default.

documentation
triaged
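On the point above: in the Triton Python backend, a parameter only reaches the model if it is declared in config.pbtxt, which is why the missing entry matters. A minimal sketch of how the preprocessing model could read it in `initialize` — the `"auto"` fallback and the exact lookup are illustrative assumptions, not the repo's confirmed implementation:

```python
# Sketch of reading a `tokenizer_type` parameter in a Triton
# Python-backend model. Parameters declared in config.pbtxt arrive in
# args["model_config"] as {"parameters": {name: {"string_value": ...}}}.
import json

class TritonPythonModel:
    def initialize(self, args):
        model_config = json.loads(args["model_config"])
        params = model_config.get("parameters", {})
        # Without the config.pbtxt declaration, this lookup silently
        # falls back to the default ("auto" here is an assumption).
        self.tokenizer_type = params.get(
            "tokenizer_type", {}
        ).get("string_value", "auto")
```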