BERT model is returning NaN logit values in output
Description: I'm able to deploy a fine-tuned "bert-base-uncased" model on Triton Inference Server using TensorRT, but during inference I get NaN logit values.
Converted the ONNX model to TensorRT using the command below:
trtexec --onnx=model.onnx --saveEngine=model.plan --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
Output logs
logits: [[[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan] ............ [nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]]]
Triton Information: Triton Server version 2.22.0 (NVIDIA Release 22.05)
Using Triton container: '007439368137.dkr.ecr.us-east-2.amazonaws.com/sagemaker-tritonserver:22.05-py3'
To Reproduce
- Deploy the TensorRT model on Triton Inference Server.
- Send an inference request.
text = "Published by HT Digital Content Services with permission from Construction Digital."
batch_size = 1
payload = { "inputs": [ { "name": "TEXT", "shape": (batch_size,), "datatype": "BYTES", "data": [text], } ] }
Preprocessed the input text, obtained the input_ids and attention_mask from the tokenizer, and then sent the input below to the model.
Model input:
{'input_ids': array([[ 101, 12414, 10151, 87651, 10764, 18491, 12238, 10171, 48822, 10195, 13154, 10764, 119, 102]], dtype=int32), 'token_type_ids': array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32), 'attention_mask': array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)}
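For reference, a minimal sketch (not from the original report) of producing this input with the Hugging Face tokenizer and sending it to Triton over HTTP; the model name "bert_trt", the output tensor name "logits", and the tokenizer checkpoint are assumptions and need to match your config.pbtxt:

import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

# Assumed tokenizer checkpoint; use the one the model was fine-tuned with
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Published by HT Digital Content Services with permission from Construction Digital."
enc = tokenizer(text, return_tensors="np")

client = httpclient.InferenceServerClient(url="localhost:8000")
inputs = []
for name in ("input_ids", "token_type_ids", "attention_mask"):
    arr = enc[name].astype(np.int32)
    inp = httpclient.InferInput(name, list(arr.shape), "INT32")
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

# "bert_trt" and "logits" are placeholder names; match them to your model config
result = client.infer(model_name="bert_trt", inputs=inputs)
print(result.as_numpy("logits"))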
Then you will see that the model produces NaN logit values.
Please find all the deployment files on Google Drive - https://drive.google.com/file/d/1uteEOgnSLwtfTonJtgukKjnDwycezFg3/view?usp=sharing
Expected behavior: I expect valid logit values from the BERT model instead of NaN.
Please help me with this issue. Thanks
Hi @alxmamaev @dyastremsky @Tabrizian @rmccorm4
Hope you are doing well. Could anyone please address this issue?
Thanks in advance.
Hi @Vinayaks117, sorry about the late reply. I've been investigating this and I'm trying to reproduce the issue. I'll comment here as soon as I have some update.
No worries, @krishung5. Thanks for the confirmation.
@Vinayaks117 Could you run the model using Polygraphy to confirm that the model works outside Triton? This can help us narrow down if the issue occurs inside Triton.
@krishung5 It will take some time for me to run the model with Polygraphy because I haven't used it before.
FYI: I was able to deploy the ONNX model on Triton and get valid inference results with the same code I shared earlier. Thanks
@Vinayaks117 Hopefully something like this will get you started (assuming you're in the same directory as your ONNX model; otherwise change the mount paths):
# Start latest TRT container that comes with polygraphy installed
docker run -ti --gpus all -v ${PWD}:/mnt -w /mnt nvcr.io/nvidia/tensorrt:22.08-py3
# Let polygraphy install dependencies as needed (onnxruntime, etc)
export POLYGRAPHY_AUTOINSTALL_DEPS=1
# Run the model with both onnxruntime and tensorrt, then compare the outputs
polygraphy run --validate --onnxrt --trt model.onnx
# For more details, config options, dynamic shape settings, etc.
polygraphy -h
# For example, to check whether your TRT model is returning NaNs, you might try
polygraphy run --trt <trt plan or onnx file> --validate
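For the dynamic shapes used in this thread, the comparison could look roughly like the following (a sketch only; the profile shapes are copied from the trtexec command above and should be adjusted to your model):
# Sketch: build a TRT engine in fp16 with the same optimization profile,
# run the same model through onnxruntime, and compare the outputs of both
polygraphy run model.onnx --onnxrt --trt --fp16 \
    --trt-min-shapes input_ids:[1,1] attention_mask:[1,1] token_type_ids:[1,1] \
    --trt-opt-shapes input_ids:[16,128] attention_mask:[16,128] token_type_ids:[16,128] \
    --trt-max-shapes input_ids:[128,128] attention_mask:[128,128] token_type_ids:[128,128] \
    --input-shapes input_ids:[1,128] attention_mask:[1,128] token_type_ids:[1,128]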
Thanks @rmccorm4 for sharing the commands. Please find the results below.
- ONNX model: Polygraphy validation passed. Results: onnx-model-validation-results.txt
- Converted the ONNX model to a TRT model with minShapes 1x1 using the command below, and its Polygraphy validation failed.
trtexec --onnx=model.onnx --saveEngine=model_bs16.plan --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
Results: ort-model-minshape-1x1-validation-results.txt
- Converted the ONNX model to a TRT model with minShapes 1x128 using the command below, and its Polygraphy validation failed.
trtexec --onnx=model.onnx --saveEngine=model_bs16.plan --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
Results: ort-model-minshape-1x128-validation-results.txt
Please find the model.onnx file on Google Drive.
Please let me know if you need any additional details. Thanks
Hi @Vinayaks117 ,
For #3 I see validation failed and you get the same all-nan outputs, which likely means there's an issue with your model and this is not related to Triton. You should further investigate your model to verify why it's not working as expected. Polygraphy may be able to help there as well, to see at which layer in the model the nans first start to appear.
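For example, a command along these lines (a sketch, not a verified recipe) marks every layer as an output so you can compare onnxruntime and TRT layer by layer and spot where the NaNs first show up:
# Sketch: compare per-layer outputs between onnxruntime and the fp16 TRT engine
polygraphy run model.onnx --onnxrt --trt --fp16 \
    --onnx-outputs mark all --trt-outputs mark all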
You may want to open an issue with the TRT team if you need further help: https://github.com/NVIDIA/TensorRT
Closing due to inactivity. Please let us know to reopen the issue if you'd like to follow up.