BERT model is returning NaN logit values in output
Description: I'm able to deploy a fine-tuned "bert-base-uncased" model on Triton Inference Server using TensorRT, but during inference I get NaN logit values.
Converted the ONNX model to TensorRT using the command below:
trtexec --onnx=model.onnx --saveEngine=model.plan --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
Output logs
logits: [[[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan] ............ [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]]]
Triton Information: Triton Server version 2.22.0 (NVIDIA Release 22.05)
Using Triton container: '007439368137.dkr.ecr.us-east-2.amazonaws.com/sagemaker-tritonserver:22.05-py3'
To Reproduce
- Deploy the TensorRT model on Triton Inference Server.
- Send an inference request.
text = "Published by HT Digital Content Services with permission from Construction Digital."
batch_size = 1
payload = { "inputs": [ { "name": "TEXT", "shape": (batch_size,), "datatype": "BYTES", "data": [text], } ] }
Preprocessed the input text, obtained the input_ids and attention_mask from the tokenizer, and then sent the input below to the model.
Model input:
{'input_ids': array([[ 101, 12414, 10151, 87651, 10764, 18491, 12238, 10171, 48822, 10195, 13154, 10764, 119, 102]], dtype=int32), 'token_type_ids': array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32), 'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)}
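For reference, a minimal sketch (not from the original report) of producing this input with the Hugging Face tokenizer and sending it to Triton over HTTP; the model name "bert_trt", the output tensor name "logits", and the tokenizer checkpoint are assumptions and need to match your config.pbtxt:

import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

# Assumed tokenizer checkpoint; use the one the model was fine-tuned with
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Published by HT Digital Content Services with permission from Construction Digital."
enc = tokenizer(text, return_tensors="np")

client = httpclient.InferenceServerClient(url="localhost:8000")
inputs = []
for name in ("input_ids", "token_type_ids", "attention_mask"):
    arr = enc[name].astype(np.int32)
    inp = httpclient.InferInput(name, list(arr.shape), "INT32")
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

# "bert_trt" and "logits" are placeholder names; match them to your model config
result = client.infer(model_name="bert_trt", inputs=inputs)
print(result.as_numpy("logits"))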
Then you will see that the model produces NaN logit values.
Please find all the deployment files on Google Drive - https://drive.google.com/file/d/1uteEOgnSLwtfTonJtgukKjnDwycezFg3/view?usp=sharing
Expected behavior: I expect valid logit values from the BERT model instead of NaN.
Please help me with this issue. Thanks
Hi @alxmamaev @dyastremsky @Tabrizian @rmccorm4
Hope you are doing well. Could anyone please address this issue?
Thanks in advance.
Hi @Vinayaks117, sorry about the late reply. I've been investigating this and I'm trying to reproduce the issue. I'll comment here as soon as I have some update.
No worries, @krishung5. Thanks for the confirmation.
@Vinayaks117 Could you run the model using Polygraphy to confirm that the model works outside Triton? This can help us narrow down if the issue occurs inside Triton.
@krishung5 It will take some time for me to run the model with Polygraphy because I haven't used it before.
FYI: I was able to deploy the ONNX model on Triton and get valid inference results with the same code I shared earlier. Thanks
@Vinayaks117 Hopefully something like this will get you started (assuming you're in the same directory as your ONNX model; otherwise change the mount paths):
# Start latest TRT container that comes with polygraphy installed
docker run -ti --gpus all -v ${PWD}:/mnt -w /mnt nvcr.io/nvidia/tensorrt:22.08-py3
# Let polygraphy install dependencies as needed (onnxruntime, etc)
export POLYGRAPHY_AUTOINSTALL_DEPS=1
# Run the model with both onnxruntime and tensorrt, then compare the outputs
polygraphy run --validate --onnxrt --trt model.onnx
# For more details, config options, dynamic shape settings, etc.
polygraphy -h
# For example, to check whether your TRT model is returning NaNs, you might try
polygraphy run --trt <trt plan or onnx file> --validate
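For the dynamic shapes used in this thread, the comparison could look roughly like the following (a sketch only; the profile shapes are copied from the trtexec command above and should be adjusted to your model):
# Sketch: build a TRT engine in fp16 with the same optimization profile,
# run the same model through onnxruntime, and compare the outputs of both
polygraphy run model.onnx --onnxrt --trt --fp16 \
    --trt-min-shapes input_ids:[1,1] attention_mask:[1,1] token_type_ids:[1,1] \
    --trt-opt-shapes input_ids:[16,128] attention_mask:[16,128] token_type_ids:[16,128] \
    --trt-max-shapes input_ids:[128,128] attention_mask:[128,128] token_type_ids:[128,128] \
    --input-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128]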
Thanks @rmccorm4 for sharing the commands. Please find the results below.
- ONNX model: Polygraphy validation passed. Results: onnx-model-validation-results.txt
- Converted the ONNX model to a TRT model with minShapes 1x1 using the command below, and its Polygraphy validation failed.
trtexec --onnx=model.onnx --saveEngine=model_bs16.plan --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
Results: ort-model-minshape-1x1-validation-results.txt
- Converted the ONNX model to a TRT model with minShapes 1x128 using the command below, and its Polygraphy validation failed.
trtexec --onnx=model.onnx --saveEngine=model_bs16.plan --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
Results: ort-model-minshape-1x128-validation-results.txt
Please find the model.onnx file on Google Drive.
Please let me know if you need any additional details. Thanks
Hi @Vinayaks117 ,
For #3 I see validation failed and you get the same all-nan outputs, which likely means there's an issue with your model and this is not related to Triton. You should further investigate your model to verify why it's not working as expected. Polygraphy may be able to help there as well, to see at which layer in the model the nans first start to appear.
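For example, a command along these lines (a sketch, not a verified recipe) marks every layer as an output so you can compare onnxruntime and TRT layer by layer and spot where the NaNs first show up:
# Sketch: compare per-layer outputs between onnxruntime and the fp16 TRT engine
polygraphy run model.onnx --onnxrt --trt --fp16 \
    --onnx-outputs mark all --trt-outputs mark all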
You may want to open an issue with the TRT team if you need further help: https://github.com/NVIDIA/TensorRT
Closing due to inactivity. Please let us know to reopen the issue if you'd like to follow up.