TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Significant output differences when compiling and running the `facebook/bart-base` (https://huggingface.co/facebook/bart-base) model with Torch-TensorRT, even after applying FP16 and various precision settings. Compare the output using the following code: ```python import...
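The issue's own comparison script is truncated above; as a rough sketch of how such a comparison could be set up (assuming the `torch_tensorrt` dynamo backend registered with `torch.compile` and the `enabled_precisions` option for FP16), one might do something like:

```python
import torch
import torch_tensorrt  # noqa: F401  registers the "torch_tensorrt" backend
from transformers import AutoTokenizer, BartModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base").eval().cuda()
inputs = tokenizer("Paris is the capital of France.", return_tensors="pt").to("cuda")

with torch.no_grad():
    ref = model(**inputs).last_hidden_state  # eager PyTorch reference

compiled = torch.compile(
    model,
    backend="torch_tensorrt",
    options={"enabled_precisions": {torch.float16}},  # precision setting under test
)
with torch.no_grad():
    trt_out = compiled(**inputs).last_hidden_state

# Report the largest element-wise deviation between eager and TRT outputs
print("max abs diff:", (ref - trt_out).abs().max().item())
```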
# Description When compiling `facebook/bart-base` with Torch-TensorRT, I encountered an error similar to the one in [this issue](https://github.com/pytorch/TensorRT/issues/3184), where `aten_ops.scatter.src` fails within `impl.elementwise.eq`. Upon investigation, I found that the issue...
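The issue's full repro is truncated; a hypothetical minimal stand-in that exercises the same converter (a forward that lowers to `aten.scatter.src`) might look like:

```python
import torch
import torch_tensorrt

class ScatterSrc(torch.nn.Module):
    def forward(self, x, index, src):
        # scatter with a tensor `src` lowers to aten.scatter.src
        return torch.scatter(x, 0, index, src)

model = ScatterSrc().eval().cuda()
x = torch.zeros(4, 4, device="cuda")
index = torch.tensor([[0, 1, 2, 3]], device="cuda")  # int64 indices
src = torch.ones(1, 4, device="cuda")

# min_block_size=1 so the single scatter op is actually sent to TensorRT
trt_model = torch_tensorrt.compile(
    model, ir="dynamo", inputs=[x, index, src], min_block_size=1
)
print(trt_model(x, index, src))
```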
This PR illustrates the use of NCCL ops from TRT-LLM in the example `examples/distributed_inference/tensor_parallel_simple_example.py`
## ❓ Question Since only some of the ops support dynamic shapes and others do not, what are the criteria for deciding whether an op supports dynamic shapes? For...
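For reference, dynamic shapes are declared to the compiler through `torch_tensorrt.Input` shape ranges, and compilation then depends on every op in the graph having a converter with dynamic-shape support. A minimal sketch (model and shape ranges are illustrative):

```python
import torch
import torch_tensorrt

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x @ x.transpose(-1, -2))

model = TinyModel().eval().cuda()

# The first dimension is dynamic between 1 and 32, optimized for 8.
dyn_input = torch_tensorrt.Input(
    min_shape=(1, 16, 64),
    opt_shape=(8, 16, 64),
    max_shape=(32, 16, 64),
    dtype=torch.float32,
)
trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[dyn_input])
print(trt_model(torch.randn(4, 16, 64, device="cuda")).shape)
```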
# Description A graph module's output might have nested structures depending on the implementation. For example, many models from transformers return outputs of type [ModelOutput](https://github.com/huggingface/transformers/blob/c409cd81777fb27aadc043ed3d8339dbc020fb3b/src/transformers/utils/generic.py#L310) (e.g. [CausalLMOutputWithPast](https://github.com/huggingface/transformers/blob/c409cd81777fb27aadc043ed3d8339dbc020fb3b/src/transformers/modeling_outputs.py#L678)). This PR doesn't...
## Bug Description I'm trying to serve a torch-tensorrt optimized model to the NVIDIA Triton server based on the provided tutorial https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_triton.html First, the provided script to generate the optimized model does not...
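A condensed sketch of the model-generation step that tutorial describes might look like the following; the TorchScript (`ir="ts"`) path, the ResNet-50 example model, and the model-repository path in the comment are assumptions based on the tutorial, and the accompanying `config.pbtxt` is not shown:

```python
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet50(weights=None).eval().cuda()

# Compile through the TorchScript path so the artifact can be loaded by
# Triton's libtorch (PyTorch) backend as a plain .pt file.
trt_ts = torch_tensorrt.compile(
    model,
    ir="ts",
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.half},
)

# In a Triton model repository this would live at
# model_repository/<model_name>/1/model.pt
torch.jit.save(trt_ts, "model.pt")
```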
# Description The cross-compile-for-Windows change has added the following new interfaces: **1) C++ side** added a setup_engine() interface; moved base64_encode/decode from register_jit_hooks.cpp to runtime.cpp since it is being...
## Bug Description When using the engine cache feature on Llama2-7b, I found that reusing a cached engine is pretty slow, even slower than building a non-refittable engine from scratch. I figured...
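For context, engine caching is enabled through compile-time flags roughly as in the sketch below; the flag names (`make_refittable`, `cache_built_engines`, `reuse_cached_engines`, `engine_cache_dir`) follow the engine caching example and may differ between releases:

```python
import torch
import torch_tensorrt

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).eval().cuda()
inputs = (torch.randn(8, 1024, device="cuda"),)
exp_program = torch.export.export(model, inputs)

# First compile builds and caches engines; a second compile with the same
# settings is expected to pull them from engine_cache_dir instead of rebuilding.
trt_model = torch_tensorrt.dynamo.compile(
    exp_program,
    inputs,
    make_refittable=True,            # cached engines must be refittable to be reused
    cache_built_engines=True,
    reuse_cached_engines=True,
    engine_cache_dir="/tmp/trt_engine_cache",
)
```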
## Bug Description > require_full_compilation (bool): Require modules to be compiled end to end or return an error as opposed to returning a hybrid graph where operations that cannot be...
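For reference, the flag is passed at compile time; a toy sketch (module and shapes are illustrative):

```python
import torch
import torch_tensorrt

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) * 2.0

model = Toy().eval().cuda()
x = torch.randn(4, 8, device="cuda")

# Per the docstring, this should yield a fully converted TRT graph or error
# out, rather than silently falling back to a hybrid Torch/TRT graph.
trt_model = torch_tensorrt.compile(
    model, ir="dynamo", inputs=[x], require_full_compilation=True
)
print(trt_model(x))
```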
## Bug Description The output shape of `aten::_convolution` no longer matches PyTorch after the TensorRT 10 upgrade. I have noticed that the output shape is correct when I pass in...
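A minimal way to observe such a mismatch would be a sketch along these lines (the convolution parameters are illustrative, not the reporter's):

```python
import torch
import torch_tensorrt

class ConvModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)

model = ConvModel().eval().cuda()
x = torch.randn(1, 3, 224, 224, device="cuda")

trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[x])

# With matching behavior both lines should print the same shape.
print("eager:", model(x).shape)
print("trt:  ", trt_model(x).shape)
```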