
The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Results: 779 server issues

**Description** The ensemble model could not retrieve the correct input from the output. ![Screenshot 2024-08-08 at 3 34 30 PM](https://github.com/user-attachments/assets/e6187d8a-8363-4ffd-9efc-425fb7087235) **Triton Information** NGC Docker image nvcr.io/nvidia/tritonserver:24.07-py3 **Ensemble Config**
```
#...
```
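Since the issue is about an ensemble failing to route one model's output into the next model's input, it may help to recall how that wiring is declared. In a Triton ensemble config, each step's `input_map` keys are the composing model's tensor names and the values are ensemble-internal tensor names; an upstream step's `output_map` must publish the same internal name a downstream step consumes. A minimal sketch, with hypothetical model and tensor names:

```
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"      # hypothetical upstream model
      model_version: -1
      input_map  { key: "RAW"    value: "INPUT" }   # ensemble input -> step input
      output_map { key: "TOKENS" value: "tokens" }  # step output -> internal tensor
    },
    {
      model_name: "bert"            # hypothetical downstream model
      model_version: -1
      input_map  { key: "INPUT_IDS" value: "tokens" }  # consumes the tensor above
      output_map { key: "LOGITS"    value: "OUTPUT" }  # -> ensemble output
    }
  ]
}
```

A mismatch between the internal name in one step's `output_map` and the next step's `input_map` is a common cause of the wrong (or missing) input being retrieved.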

I am using TRTIS:22.09 in Kubernetes. I noticed that the nv_cpu_utilization value reported by TRTIS differs from the actual CPU usage represented by container_cpu_usage_seconds_total. First, I need an accurate explanation...
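When comparing the two numbers, note that Triton exposes `nv_cpu_utilization` as a Prometheus gauge on its metrics endpoint (port 8002 by default), while `container_cpu_usage_seconds_total` is a cumulative counter from cAdvisor that must be rate-converted before comparison. A minimal sketch of pulling the gauge out of the metrics text; the sample payload below is made up for illustration:

```python
def parse_gauge(metrics_text, metric_name):
    """Return the first sample value for `metric_name` from Prometheus
    exposition-format text, or None if the metric is absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if name == metric_name or name.startswith(metric_name + "{"):
            return float(value)
    return None

# Illustrative sample of what GET http://<server>:8002/metrics might return.
sample = """\
# HELP nv_cpu_utilization CPU utilization rate [0.0 - 1.0]
# TYPE nv_cpu_utilization gauge
nv_cpu_utilization 0.04
"""
print(parse_gauge(sample, "nv_cpu_utilization"))  # 0.04
```

Because the gauge is a point-in-time ratio and the cAdvisor counter is cumulative seconds, the two will rarely match unless the counter is converted with something like `rate(container_cpu_usage_seconds_total[1m])` in PromQL.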

#### What does the PR do?
Adds a client-side input size check to ensure that the byte size implied by the input shape matches the byte size of the input data.
#### Checklist
- [x] PR title reflects...
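The idea behind such a check can be sketched in a few lines: compute the expected byte size from the shape and the dtype's element size, and reject data whose length differs. The function and dtype table below are illustrative, not Triton's actual client API:

```python
import math
import struct

# Element sizes for a few common Triton dtypes (illustrative subset).
DTYPE_SIZE = {"FP32": 4, "FP16": 2, "INT64": 8, "INT32": 4, "UINT8": 1}

def expected_byte_size(shape, dtype):
    """Byte size implied by a tensor shape and dtype."""
    return math.prod(shape) * DTYPE_SIZE[dtype]

def check_input_size(shape, dtype, data):
    """Raise if the supplied raw data does not match the declared shape."""
    expected = expected_byte_size(shape, dtype)
    if len(data) != expected:
        raise ValueError(
            f"input byte size {len(data)} does not match expected "
            f"{expected} for shape {shape} and dtype {dtype}"
        )

# A [2, 3] FP32 tensor must carry 2 * 3 * 4 = 24 bytes.
payload = struct.pack("<6f", *range(6))
check_input_size([2, 3], "FP32", payload)  # passes silently
```

Catching the mismatch on the client side gives a clearer error than a server-side shape/deserialization failure.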

PR: test

**Description** I used the latest image, version 24.06, because the corresponding latest version of TRT has support for BF16. But when I deploy the model with the TRT backend, I used perf_analyze...

module: clients

I want to deploy MaCBert but cannot find any helpful blog posts. Do you have instructions for deploying this model?

question

**Description** Bug when deploying Macbert. **Triton Information** I use the official image nvcr.io/nvidia/tritonserver:21.09-py3:
```
NVIDIA Release 21.09 (build 27443074)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various...
```

good first issue

I run the Triton server using the following commands: `S3_REPO="s3://.../models/repository/"` followed by `docker run --rm --net=host --gpus=all nvcr.io/nvidia/tritonserver:23.11-py3 tritonserver --model-repository="$S3_REPO"`. After the server is up, I do inferencing using the IP address....

**Description** I deployed Triton Inference Server on Kubernetes (GKE). To balance the load, I created a Load Balancer Service. As a client, I'm using the Python HTTP client. I was...

**Description**
- Our ensemble model running on the PyTorch backend in NVIDIA Triton Inference Server is experiencing performance issues with dynamic batching under high concurrency.
- About the ensemble model: The first...
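For dynamic-batching issues under high concurrency, the knobs that usually matter are the batcher's queue delay and preferred batch sizes in `config.pbtxt`. A minimal sketch with illustrative values only (the right numbers depend on the model's latency profile):

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]        # batch sizes the scheduler aims for
  max_queue_delay_microseconds: 100     # how long to wait to fill a batch
}
```

Too small a delay can leave batches underfilled at high concurrency, while too large a delay adds tail latency; note also that in an ensemble, dynamic batching is configured per composing model, not on the ensemble itself.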

I have a Python backend model with the following config.pbtxt, without dynamic batching:
```
name: "sample"
backend: "python"
max_batch_size: 0
input [
  {
    name: "text"  # Stringified JSON Array
    data_type:...
```
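The snippet cuts off mid-config; for orientation, a hypothetical completion of such a minimal Python-backend config might look like the following. With `max_batch_size: 0` Triton's batching is disabled and `dims` are full tensor shapes. The `data_type` and `dims` values and the output name below are assumptions for illustration, not taken from the issue:

```
name: "sample"
backend: "python"
max_batch_size: 0        # batching disabled; dims are full shapes
input [
  {
    name: "text"         # stringified JSON array
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "result"       # hypothetical output tensor
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
```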