🐛 [Bug] Exporting an engine with `hardware_compatible` does not create a hardware-compatible engine
Bug Description
from tensorrt import Logger, Runtime
from torch import randn
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torch_tensorrt import convert_method_to_trt_engine
# Create model
weights = MobileNet_V2_Weights.DEFAULT
model = mobilenet_v2(weights=weights).eval()
example_input = randn(1, 3, 224, 224)
# Create TRT engine
engine_bytes = convert_method_to_trt_engine(
    model,
    ir="dynamo",
    inputs=[example_input],
    version_compatible=True,
    hardware_compatible=True,
    require_full_compilation=True
)
# Check hardware compat
logger = Logger(Logger.WARNING)
runtime = Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)
print("Hardware compat level:", engine.hardware_compatibility_level)
# prints: Hardware compat level: HardwareCompatibilityLevel.NONE
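To narrow down whether the flag is being dropped inside the Torch-TensorRT conversion path or by TensorRT itself, here is a minimal sketch that sets the level directly through the TensorRT builder and reads it back (assumptions: TensorRT 10.x, and a trivial identity network standing in for the real model). If this prints AMPERE_PLUS, the drop most likely happens on the torch_tensorrt side:

import tensorrt as trt

# Build a trivial identity network with hardware compatibility requested
# directly on the TensorRT builder config (no Torch-TensorRT involved).
trt_logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(trt_logger)
network = builder.create_network()
inp = network.add_input("x", trt.float32, (1, 3, 224, 224))
identity = network.add_identity(inp)
network.mark_output(identity.get_output(0))

config = builder.create_builder_config()
config.hardware_compatibility_level = trt.HardwareCompatibilityLevel.AMPERE_PLUS

plan = builder.build_serialized_network(network, config)
direct_engine = trt.Runtime(trt_logger).deserialize_cuda_engine(plan)
print("Direct TRT build:", direct_engine.hardware_compatibility_level)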
I am running on an A100 (sm_80; a quick device check is sketched after the settings dump below). I also see the correct hardware_compatible flag being passed down to the C++ side, per the CompilationSettings dump from the torch-tensorrt logger:
CompilationSettings(
enabled_precisions={<dtype.f32: 7>},
workspace_size=1073741824,
min_block_size=5,
torch_executed_ops=set(),
pass_through_build_failures=False,
max_aux_streams=None,
version_compatible=True,
optimization_level=3,
use_python_runtime=False,
truncate_double=False,
use_fast_partitioner=True,
enable_experimental_decompositions=False,
device=Device(type=DeviceType.GPU, gpu_id=0),
require_full_compilation=True,
disable_tf32=False,
assume_dynamic_shape_support=False,
sparse_weights=False,
engine_capability=<EngineCapability.STANDARD: 1>,
num_avg_timing_iters=1,
dla_sram_size=1048576,
dla_local_dram_size=1073741824,
dla_global_dram_size=536870912,
dryrun=False,
hardware_compatible=True,
timing_cache_path='/tmp/torch_tensorrt_engine_cache/timing_cache.bin',
lazy_engine_init=False,
cache_built_engines=False,
reuse_cached_engines=False,
use_explicit_typing=False,
use_fp32_acc=False,
refit_identical_engine_weights=False,
strip_engine_weights=False,
immutable_weights=True,
enable_weight_streaming=False,
enable_cross_compile_for_windows=False,
tiling_optimization_level='none',
l2_limit_for_tiling=-1,
use_distributed_mode_trace=False,
offload_module_to_cpu=False
)
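For reference, a quick check of the device (assumes a CUDA-enabled PyTorch build; an A100 reports compute capability (8, 0), i.e. sm_80, so AMPERE_PLUS should be supported):

import torch

# Confirm the GPU and its compute capability (A100 -> (8, 0), i.e. sm_80 / Ampere).
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))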
Any ideas?
To Reproduce
Steps to reproduce the behavior:
- Run the Python script above
Expected behavior
The deserialized engine should report a hardware compatibility level of AMPERE_PLUS.
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): 2.9.0
- PyTorch Version (e.g. 1.0): 2.9.0
- CPU Architecture: x86_64
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source):
- Are you using local sources or building from archives: No
- Python version: 3.12
- CUDA version: 12.8
- GPU models and configuration: Nvidia A100
- Any other relevant information: None
I tried on an A40 machine with
torch_tensorrt 2.10.0.dev0+29002ed
tensorrt 10.13.3.9
and it worked. I will also try release 2.9 on an A100 and let you know.
@apbose thanks for the help.
Any updates on this?
I could repro this on an A100 with Torch-TensorRT 2.9. I need to look into this.
That's good. Since the flag worked on torch_tensorrt==2.10.0.dev, do you have any idea when 2.10 might be released? We need a fix ASAP. Thanks for your help on this.
Yes, this should work in the upcoming 2.10 release, which is planned for Jan 2026. Meanwhile, you can use the nightly container ghcr.io/pytorch/tensorrt/torch_tensorrt:nightly.
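A quick way to confirm the nightly build is active inside that container before re-running the repro script above:

# Confirm the nightly Torch-TensorRT build is the one being imported.
import torch_tensorrt
print(torch_tensorrt.__version__)  # expect a 2.10.0.dev* version string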