
🐛 [Bug] torchtrt.dynamo.compile produces nan values

Open cehongwang opened this issue 5 months ago • 1 comments

Bug Description

torchtrt.dynamo.compile is broken: the compiled engines produce NaN values.

To Reproduce

Steps to reproduce the behavior:

  1. pytest /home/TensorRT/tests/py/dynamo/models/test_models.py -k test_resnet18

FAILED tests/py/dynamo/models/test_models.py::test_resnet18 - AssertionError: False is not true : Resnet18 TRT outputs don't match with the original model. Cosine sim score: nan Threshold: 0.99
FAILED tests/py/dynamo/models/test_models.py::test_resnet18_half - AssertionError: False is not true : Resnet18 Half TRT outputs don't match with the original model. Cosine sim score: nan Threshold: 0.99
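
For readers wondering why a NaN score shows up as "False is not true": any comparison against NaN evaluates to False, so a threshold check fails as soon as a single NaN appears in the engine output. The snippet below is a minimal pure-Python sketch of that behavior. The helper names `cosine_similarity` and `outputs_match` are hypothetical and are not the helpers used in the Torch-TensorRT test suite; only the 0.99 threshold is taken from the log above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def outputs_match(reference, candidate, threshold=0.99):
    """Hypothetical check: return (passed, score) for a similarity threshold."""
    score = cosine_similarity(reference, candidate)
    # Any comparison with NaN is False, so a NaN score always fails here.
    return score > threshold, score


reference = [0.1, 0.7, 0.2]
good = [0.1, 0.7, 0.2]            # stands in for a healthy engine output
bad = [0.1, float("nan"), 0.2]    # a single NaN poisons the whole score

ok_good, score_good = outputs_match(reference, good)
ok_bad, score_bad = outputs_match(reference, bad)

print(ok_good, ok_bad, math.isnan(score_bad))
```

Because one NaN element is enough to make the score NaN, the failure message alone does not show how much of the output is affected; checking the raw output for NaN values directly is a more informative first step.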

Expected behavior

All tests pass

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0):
  • PyTorch Version (e.g. 1.0):
  • CPU Architecture:
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, libtorch, source):
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

cehongwang avatar Jun 16 '25 19:06 cehongwang

Yes, I have been seeing this as well in multiple tests.

narendasan avatar Jun 16 '25 20:06 narendasan

Hey @cehongwang @narendasan, was this ever resolved? What was the issue? I'm seeing NaN values as well for my network. Thanks!

patrick-botco avatar Jul 15 '25 22:07 patrick-botco

Yes, this issue has been resolved in main.

narendasan avatar Jul 16 '25 17:07 narendasan

@narendasan can you point me to what resolved the issue? I'm still seeing this for my network after taking changes from HEAD of main.

patrick-botco avatar Jul 16 '25 20:07 patrick-botco

This was due to some use-after-free issues related to NVFP4 support: https://github.com/pytorch/TensorRT/pull/3573

narendasan avatar Jul 16 '25 22:07 narendasan