TensorRT
🐛 [Bug] torchtrt.dynamo.compile produces nan values
Bug Description
torchtrt.dynamo.compile is broken: the compiled engines produce nan values.
To Reproduce
Steps to reproduce the behavior:
- pytest /home/TensorRT/tests/py/dynamo/models/test_models.py -k test_resnet18
FAILED tests/py/dynamo/models/test_models.py::test_resnet18 - AssertionError: False is not true : Resnet18 TRT outputs don't match with the original model. Cosine sim score: nan Threshold: 0.99
FAILED tests/py/dynamo/models/test_models.py::test_resnet18_half - AssertionError: False is not true : Resnet18 Half TRT outputs don't match with the original model. Cosine sim score: nan Threshold: 0.99
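For readers wondering why the failure reports "Cosine sim score: nan" rather than a low score, here is a minimal pure-Python sketch. It is not the helper used by the test suite; `cosine_similarity` and `outputs_match` are hypothetical names and the vectors are made-up values. It only illustrates that a single nan in the engine output makes the whole score nan, and that any comparison of nan against the 0.99 threshold is false.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def outputs_match(reference, candidate, threshold=0.99):
    """Hypothetical stand-in for the test's check: score must exceed threshold."""
    score = cosine_similarity(reference, candidate)
    # Any comparison involving nan is False, so a nan score always fails.
    return score > threshold, score


# Made-up output vectors, for illustration only.
reference = [0.1, 0.7, 0.2]
healthy = [0.1, 0.69, 0.21]
corrupted = [0.1, float("nan"), 0.2]  # one nan element in the engine output

ok, score = outputs_match(reference, healthy)
bad, bad_score = outputs_match(reference, corrupted)

# Checking for nan directly separates "corrupted output" from
# "numerically different output".
contains_nan = any(math.isnan(x) for x in corrupted)

print(ok, bad, math.isnan(bad_score), contains_nan)
```

When debugging a real network, checking the raw outputs for nan before computing any similarity score can help separate memory corruption or overflow from ordinary precision differences.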
Expected behavior
All tests pass
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0):
- PyTorch Version (e.g. 1.0):
- CPU Architecture:
- OS (e.g., Linux):
- How you installed PyTorch (conda, pip, libtorch, source):
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version:
- GPU models and configuration:
- Any other relevant information:
Additional context
Yes, I have been seeing this as well in multiple tests.
Hey @cehongwang @narendasan, was this ever resolved? What was the issue? I'm seeing nan values as well for my network. Thanks!
Yes, this issue has been resolved in main.
@narendasan, can you point me to what resolved the issue? I'm still seeing this for my network after taking changes from HEAD of main.
This was due to some use-after-free issues related to NVFP4 support: https://github.com/pytorch/TensorRT/pull/3573