server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
**Description** The ensemble model could not retrieve the correct input from the output  **Triton Information** NGC docker image nvcr.io/nvidia/tritonserver:24.07-py3 **Ensemble Config** ``` #...
I am using TRTIS:22.09 in Kubernetes. I noticed that the nv_cpu_utilization value provided by TRTIS and the actual CPU usage, represented by container_cpu_usage_seconds_total, differ. First, I need an accurate explanation...
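To compare the two metrics on the same footing, both can be queried side by side in Prometheus. A hedged sketch of the queries, assuming the pod's container carries the label `container="triton"` (label names depend on your cluster setup):

```
# Triton's own CPU gauge, scraped from the server's /metrics endpoint
nv_cpu_utilization

# cAdvisor's cumulative counter, converted to a utilization rate over 5 minutes
rate(container_cpu_usage_seconds_total{container="triton"}[5m])
```

One plausible source of divergence is scope: `nv_cpu_utilization` may reflect utilization of the whole host the server sees, while the cAdvisor counter is scoped to the container's cgroup, so the two need not agree.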
#### What does the PR do? Add client input size check to make sure input shape byte size matches input data byte size. #### Checklist - [x] PR title reflects...
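The check the PR describes can be illustrated with a small stand-alone sketch. This is not the actual client code; `expected_byte_size` and `check_input` are hypothetical names used only for illustration:

```python
import numpy as np

def expected_byte_size(shape, dtype):
    """Byte size implied by a tensor's shape and element type."""
    count = 1
    for dim in shape:
        count *= dim
    return count * np.dtype(dtype).itemsize

def check_input(shape, dtype, data_bytes):
    """Raise if the provided raw data does not match the declared shape."""
    expected = expected_byte_size(shape, dtype)
    if expected != len(data_bytes):
        raise ValueError(
            f"input byte size mismatch: shape {shape} with dtype {np.dtype(dtype)} "
            f"implies {expected} bytes, but {len(data_bytes)} bytes were provided")

data = np.zeros((2, 3), dtype=np.float32).tobytes()
check_input((2, 3), np.float32, data)  # passes: 2*3*4 = 24 bytes
```

Catching the mismatch on the client avoids sending a request the server would reject (or silently misinterpret).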
**Description** I used the latest image, version 24.06, because the corresponding latest version of TensorRT supports BF16. But when I deployed the model with the TensorRT backend, I used perf_analyze...
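For reference, a typical benchmarking invocation against a deployed model is a command-line fragment like the following (the model name and endpoint are placeholders; the flags shown are standard `perf_analyzer` options):

```
perf_analyzer -m <model_name> -u localhost:8001 -i grpc --concurrency-range 1:4
```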
I want to deploy MaCBert, but I cannot find any helpful blog posts. Do you have instructions for deploying this model?
**Description** bug when deploying Macbert **Triton Information** I use the official image: nvcr.io/nvidia/tritonserver:21.09-py3 ``` NVIDIA Release 21.09 (build 27443074) Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved. Various...
I run the triton server using the following commands ``` S3_REPO="s3://.../models/repository/" docker run --rm --net=host --gpus=all nvcr.io/nvidia/tritonserver:23.11-py3 tritonserver --model-repository="$S3_REPO" ``` After the server is up, I do inferencing using the IP address....
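Inference over HTTP against a remote Triton endpoint follows the KServe v2 protocol. A minimal sketch of constructing the JSON request body by hand (the input name, datatype, and shape below are made-up examples, not taken from the issue):

```python
import json

def build_infer_request(input_name, datatype, shape, data):
    """Build a KServe v2 request body for POST /v2/models/<model>/infer."""
    return {
        "inputs": [
            {
                "name": input_name,
                "datatype": datatype,
                "shape": list(shape),
                "data": data,
            }
        ]
    }

payload = build_infer_request("INPUT0", "FP32", (1, 4), [1.0, 2.0, 3.0, 4.0])
body = json.dumps(payload)
# POST `body` to http://<server-ip>:8000/v2/models/<model>/infer
```

In practice the `tritonclient` Python package builds and sends this payload for you; the sketch only shows the wire format the server expects on port 8000.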
**Description** I deployed Triton Inference Server on Kubernetes (GKE). To balance the load, I created a Load Balancer Service. As a client, I'm using the Python HTTP client. I was...
**Description** - Our ensemble model running on the PyTorch backend in NVIDIA Triton Inference Server is experiencing performance issues with dynamic batching under high concurrency. - About the ensemble model: The first...
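For context, dynamic batching is enabled per model in its `config.pbtxt`; a typical stanza looks like the following (the values are illustrative, not a tuning recommendation):

```
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

Under high concurrency, `max_queue_delay_microseconds` trades added per-request latency for larger batches, which is usually the first knob to examine in this kind of issue.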
I have a python backend model with the following config.pbtxt without Dynamic Batching. ``` name: "sample" backend: "python" max_batch_size: 0 input [ { name: "text" # Stringified JSON Array data_type:... ```
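With `max_batch_size: 0` and a "text" input carrying a stringified JSON array, the Python backend's `model.py` would typically decode the tensor along these lines. This is a stand-alone sketch: `decode_text_input` is a hypothetical helper, demonstrated on a plain NumPy object array rather than a real Triton tensor:

```python
import json
import numpy as np

def decode_text_input(tensor):
    """Decode a BYTES tensor whose elements are stringified JSON arrays."""
    decoded = []
    for item in tensor.reshape(-1):
        text = item.decode("utf-8") if isinstance(item, bytes) else str(item)
        decoded.append(json.loads(text))
    return decoded

batch = np.array([b'["hello", "world"]'], dtype=object)
decode_text_input(batch)  # → [["hello", "world"]]
```

Inside an actual backend, the NumPy array would come from `pb_utils.get_input_tensor_by_name(request, "text").as_numpy()`, with the same decoding applied per element.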