Ryan McCormick
Hi @Hassan313, sorry for the delay. A few questions come to mind: 1. Are the inference results correct on fp32/fp16 engines? If yes, then it is probably an int8 calibration issue. If...
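A minimal sketch of the fp16-vs-int8 sanity check described above, assuming both engine variants are already loaded in Triton; the model names (`resnet50_fp16`, `resnet50_int8`) and tensor names (`input`, `output`) are placeholders:

```python
# Rough sketch for sanity-checking fp16 vs. int8 engine outputs through Triton.
# Model and tensor names below are placeholders -- substitute your own.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

def run(model_name):
    inp = httpclient.InferInput("input", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    return client.infer(model_name, inputs=[inp]).as_numpy("output")

fp16_out = run("resnet50_fp16")
int8_out = run("resnet50_int8")

# A deviation much larger than normal int8 quantization error points at calibration.
print("max abs diff:", np.abs(fp16_out - int8_out).max())
print("close:", np.allclose(fp16_out, int8_out, atol=1e-1))
```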
CC @dyastremsky since @jbkyang-nvi is on vacation
CC @szalpal
Hi @lminer, Please share the full error output/log you're getting for this issue. Also, please share the version of Triton you're using, GPU type, and other [issue template](https://github.com/triton-inference-server/server/blob/main/.github/ISSUE_TEMPLATE/bug_report.md) information....
Ah, I misread that as CUDA shared memory. Can you try to isolate the error to the specific lines it is failing at and capture the traceback/exception being raised, if any?...
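One lightweight pattern for that, with the step labels and client calls as placeholders for whatever is actually failing:

```python
# Wrap each suspicious call so the failing step and its full traceback are captured.
import traceback

def run_step(label, fn, *args, **kwargs):
    """Run one step, printing which step failed and its full traceback."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        print(f"Failed at step: {label}")
        traceback.print_exc()  # paste this output into the issue
        raise

# Example usage (placeholders for the actual client calls):
# run_step("set_data_from_numpy", inp.set_data_from_numpy, data)
# run_step("infer", client.infer, "my_model", inputs=[inp])
```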
Hi @nrepesh, Re: the TensorFlow backend, it is a known limitation that TensorFlow does not release any memory it allocates until the backend is completely unloaded. There is a FAQ on...
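If the server is started with `--model-control-mode=explicit`, one workaround sketch is to unload the TensorFlow model when it is not needed; note that memory is only returned once the backend itself is fully unloaded, so this assumes no other TF models remain loaded (`my_tf_model` is a placeholder name):

```python
# Sketch: with explicit model control, unload the TF model so the backend can be
# torn down and its memory returned, then reload it later when needed.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
client.unload_model("my_tf_model")  # memory is only freed once the backend unloads
# ... later, when the model is needed again ...
client.load_model("my_tf_model")
```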
Hi @Leelaobai, Is this a memory leak over time as new requests come in? Or do you simply not have enough GPU memory to have both models loaded and...
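One way to tell the two cases apart is to watch Triton's Prometheus metrics while requests come in; this sketch assumes the default metrics port (8002) and the `nv_gpu_memory_used_bytes` gauge:

```python
# Poll the metrics endpoint while traffic flows: steadily climbing GPU memory
# suggests a leak; a flat, too-high value suggests a simple capacity problem.
import time
import urllib.request

def gpu_memory_lines():
    with urllib.request.urlopen("http://localhost:8002/metrics") as resp:
        text = resp.read().decode()
    return [l for l in text.splitlines() if l.startswith("nv_gpu_memory_used_bytes")]

for _ in range(10):
    print(gpu_memory_lines())
    time.sleep(30)
```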
CC @tanmayv25 @Tabrizian
Hi @jhm0104666, Regarding this point: > The MLPerf inference result (v2.0) from NVIDIA shows that a single A100 with Triton, TensorRT gets ~20k resnet50 performance in the "server scenario"....
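For a rough local comparison, throughput can be measured with `perf_analyzer`; in this sketch the model name and concurrency range are placeholders:

```python
# Sketch: sweep concurrency against a running Triton server with perf_analyzer
# and read the reported throughput/latency from its output.
import subprocess

subprocess.run([
    "perf_analyzer",
    "-m", "resnet50",               # placeholder model name
    "-u", "localhost:8001",
    "-i", "grpc",
    "--concurrency-range", "1:64:8",
])
```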
> It will take some time for me to run the model on Polygraphy because I didn't use that earlier. @Vinayaks117 Hopefully something like this will get you started (assuming...
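A minimal Polygraphy sketch along those lines, assuming an ONNX model at a placeholder path `model.onnx` (module paths follow the examples shipped with Polygraphy and may vary between versions):

```python
# Compare ONNX Runtime and TensorRT outputs for the same ONNX model.
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import EngineFromNetwork, NetworkFromOnnxPath, TrtRunner
from polygraphy.comparator import Comparator

build_onnxrt_session = SessionFromOnnx("model.onnx")
build_engine = EngineFromNetwork(NetworkFromOnnxPath("model.onnx"))

runners = [
    OnnxrtRunner(build_onnxrt_session),
    TrtRunner(build_engine),
]

run_results = Comparator.run(runners)
print("Outputs match:", bool(Comparator.compare_accuracy(run_results)))
```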