TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Significant output differences when compiling and running the `facebook/bart-base` (https://huggingface.co/facebook/bart-base) model with Torch-TensorRT, even after applying FP16 and various precision settings. Compare the output using the following code: ```python import...
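The issue's own comparison script is truncated above; as a rough sketch of how such a comparison could be set up (assuming the `torch_tensorrt` dynamo backend registered with `torch.compile` and the `enabled_precisions` option for FP16), one might do something like:

```python
import torch
import torch_tensorrt  # noqa: F401  registers the "torch_tensorrt" backend
from transformers import AutoTokenizer, BartModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base").eval().cuda()
inputs = tokenizer("Paris is the capital of France.", return_tensors="pt").to("cuda")

with torch.no_grad():
    ref = model(**inputs).last_hidden_state  # eager PyTorch reference

compiled = torch.compile(
    model,
    backend="torch_tensorrt",
    options={"enabled_precisions": {torch.float16}},  # precision setting under test
)
with torch.no_grad():
    trt_out = compiled(**inputs).last_hidden_state

# Report the largest element-wise deviation between eager and TRT outputs
print("max abs diff:", (ref - trt_out).abs().max().item())
```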
# Description When compiling `facebook/bart-base` with Torch-TensorRT, I encountered an error similar to the one in [this issue](https://github.com/pytorch/TensorRT/issues/3184), where `aten_ops.scatter.src` fails within `impl.elementwise.eq`. Upon investigation, I found that the issue...
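The issue's full repro is truncated; a hypothetical minimal stand-in that exercises the same converter (a forward that lowers to `aten.scatter.src`) might look like:

```python
import torch
import torch_tensorrt

class ScatterSrc(torch.nn.Module):
    def forward(self, x, index, src):
        # scatter with a tensor `src` lowers to aten.scatter.src
        return torch.scatter(x, 0, index, src)

model = ScatterSrc().eval().cuda()
x = torch.zeros(4, 4, device="cuda")
index = torch.tensor([[0, 1, 2, 3]], device="cuda")  # int64 indices
src = torch.ones(1, 4, device="cuda")

# min_block_size=1 so the single scatter op is actually sent to TensorRT
trt_model = torch_tensorrt.compile(
    model, ir="dynamo", inputs=[x, index, src], min_block_size=1
)
print(trt_model(x, index, src))
```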
This PR illustrates the use of NCCL ops from TRT-LLM in the example `examples/distributed_inference/tensor_parallel_simple_example.py`
## ❓ Question Since only some of the ops support dynamic shapes and others do not, what are the criteria for deciding whether an op supports dynamic shapes? For...
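For reference, dynamic shapes are declared to the compiler through `torch_tensorrt.Input` shape ranges, and compilation then depends on every op in the graph having a converter with dynamic-shape support. A minimal sketch (model and shape ranges are illustrative):

```python
import torch
import torch_tensorrt

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x @ x.transpose(-1, -2))

model = TinyModel().eval().cuda()

# The first dimension is dynamic between 1 and 32, optimized for 8.
dyn_input = torch_tensorrt.Input(
    min_shape=(1, 16, 64),
    opt_shape=(8, 16, 64),
    max_shape=(32, 16, 64),
    dtype=torch.float32,
)
trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[dyn_input])
print(trt_model(torch.randn(4, 16, 64, device="cuda")).shape)
```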
# Description A graph module's output might have nested structures depending on the implementation. For example, many models from transformers return outputs of type [ModelOutput](https://github.com/huggingface/transformers/blob/c409cd81777fb27aadc043ed3d8339dbc020fb3b/src/transformers/utils/generic.py#L310) (e.g. [CausalLMOutputWithPast](https://github.com/huggingface/transformers/blob/c409cd81777fb27aadc043ed3d8339dbc020fb3b/src/transformers/modeling_outputs.py#L678)). This PR doesn't...
## Bug Description I'm trying to serve a torch-tensorrt optimized model to the NVIDIA Triton server based on the provided tutorial https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_triton.html First, the provided script to generate the optimized model does not...
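A condensed sketch of the model-generation step that tutorial describes might look like the following; the TorchScript (`ir="ts"`) path, the ResNet-50 example model, and the model-repository path in the comment are assumptions based on the tutorial, and the accompanying `config.pbtxt` is not shown:

```python
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet50(weights=None).eval().cuda()

# Compile through the TorchScript path so the artifact can be loaded by
# Triton's libtorch (PyTorch) backend as a plain .pt file.
trt_ts = torch_tensorrt.compile(
    model,
    ir="ts",
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.half},
)

# In a Triton model repository this would live at
# model_repository/<model_name>/1/model.pt
torch.jit.save(trt_ts, "model.pt")
```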
# Description The cross-compile-for-Windows change has added the following new interfaces: **1) C++ side** added a setup_engine() interface; moved base64_encode/decode from register_jit_hooks.cpp to runtime.cpp since it is being...
## Bug Description When using the engine cache feature on Llama2-7b, I found that reusing a cached engine is pretty slow, even slower than building a non-refittable engine from scratch. I figured...
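For context, engine caching is enabled through compile-time flags roughly as in the sketch below; the flag names (`make_refittable`, `cache_built_engines`, `reuse_cached_engines`, `engine_cache_dir`) follow the engine caching example and may differ between releases:

```python
import torch
import torch_tensorrt

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1024)
).eval().cuda()
inputs = (torch.randn(8, 1024, device="cuda"),)
exp_program = torch.export.export(model, inputs)

# First compile builds and caches engines; a second compile with the same
# settings is expected to pull them from engine_cache_dir instead of rebuilding.
trt_model = torch_tensorrt.dynamo.compile(
    exp_program,
    inputs,
    make_refittable=True,            # cached engines must be refittable to be reused
    cache_built_engines=True,
    reuse_cached_engines=True,
    engine_cache_dir="/tmp/trt_engine_cache",
)
```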
## Bug Description > require_full_compilation (bool): Require modules to be compiled end to end or return an error as opposed to returning a hybrid graph where operations that cannot be...
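For reference, the flag is passed at compile time; a toy sketch (module and shapes are illustrative):

```python
import torch
import torch_tensorrt

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) * 2.0

model = Toy().eval().cuda()
x = torch.randn(4, 8, device="cuda")

# Per the docstring, this should yield a fully converted TRT graph or error
# out, rather than silently falling back to a hybrid Torch/TRT graph.
trt_model = torch_tensorrt.compile(
    model, ir="dynamo", inputs=[x], require_full_compilation=True
)
print(trt_model(x))
```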
## Bug Description The output shape of `aten::_convolution` no longer matches PyTorch after the TensorRT 10 upgrade. I have noticed that the output shape is correct when I pass in...
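A minimal way to observe such a mismatch would be a sketch along these lines (the convolution parameters are illustrative, not the reporter's):

```python
import torch
import torch_tensorrt

class ConvModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)

model = ConvModel().eval().cuda()
x = torch.randn(1, 3, 224, 224, device="cuda")

trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=[x])

# With matching behavior both lines should print the same shape.
print("eager:", model(x).shape)
print("trt:  ", trt_model(x).shape)
```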