🐛 [Bug] Encountered bug when using Torch-TensorRT

Open · aviadmx opened this issue 2 years ago · 1 comment

Bug Description

EfficientNet example notebook does not compile to FP16

To Reproduce

Steps to reproduce the behavior:

Open the EfficientNet example notebook and run all cells. It hangs indefinitely on the following cell:

# The compiled module will have the precision specified by "enabled_precisions".
# Here, it will have FP16 precision.
trt_model_fp16 = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((128, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},  # run in FP16
    workspace_size=1 << 22,
)

The cell prints this warning and then never returns:
[Torch-TensorRT] - For input x.1, found user specified input dtype as Float16, however when inspecting the graph, the input type expected was inferred to be Float
The compiler is going to use the user setting Float16
This conflict may cause an error at runtime due to partial compilation being enabled and therefore
compatibility with PyTorch's data type convention is required.
If you do indeed see errors at runtime either:
- Remove the dtype spec for x.1
- Disable partial compilation by setting require_full_compilation to True
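
For reference, the two remedies suggested by the warning map onto the compile call roughly as follows (a minimal sketch against the Torch-TensorRT 1.x TorchScript frontend; model is the scripted EfficientNet from the notebook, and neither variant is confirmed to fix the hang itself):

# Option 1: drop the dtype spec for x.1 and let the compiler infer Float
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((128, 3, 224, 224))],
    enabled_precisions={torch.half},
)

# Option 2: keep the FP16 input spec but disable partial compilation,
# so the engine never falls back to PyTorch and its dtype convention
trt_model_fp16 = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((128, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},
    require_full_compilation=True,
)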

Expected behavior

The compile call finishes and returns an FP16 TorchScript module instead of hanging.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.2.0
  • PyTorch Version (e.g. 1.0): 1.12
  • CPU Architecture:
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): official Docker image
  • Build command you used (if compiling from source): official Docker image
  • Are you using local sources or building from archives: official Docker image
  • Python version: 3.8
  • CUDA version: 11.3
  • GPU models and configuration:
  • Any other relevant information: Docker image nvcr.io/nvidia/pytorch:22.08-py3

aviadmx · Sep 22 '22 08:09

What happens if you enable debug messaging? Running the following script works fine on my side:

import torch
from torchvision import models
import torch_tensorrt as torchtrt

classification_arches = [
    models.efficientnet_b0,
    models.efficientnet_v2_s,
]

failures = []

for arch in classification_arches:
    model = arch()
    model = torch.jit.script(model)
    model.eval().cuda()

    try:
        print(f"Running {arch.__name__}")
        with torchtrt.logging.debug():
            mod = torchtrt.ts.compile(
                model,
                inputs=[torchtrt.Input((1, 3, 300, 300))],
                enabled_precisions={torch.float, torch.half},
                truncate_long_and_double=True,
            )

            x = torch.randn((1, 3, 300, 300)).cuda()

            mod(x)
    except Exception:
        failures.append(arch.__name__)

print(f"Classification Failures: {failures}")

narendasan · Sep 22 '22 19:09

The notebook works when exported to .py, so this looks like another Jupyter-notebook-specific issue. @narendasan, do you have a clue why this would happen?

@aviadmx The content of the notebook runs fine in a .py file. Use jupyter nbconvert --to script EfficientNet-example.ipynb to convert the notebook, as in the example below.
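
For example, from the directory containing the notebook (assuming nbconvert's default output name, derived from the notebook filename):

jupyter nbconvert --to script EfficientNet-example.ipynb
python EfficientNet-example.py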

tanayvarshney · Sep 26 '22 16:09

This issue has not seen activity for 90 days. Remove the stale label or comment, or it will be closed in 10 days.

github-actions[bot] · Dec 26 '22 00:12