
Integrate Torch-TensorRT in order to increase speed during inference


🚀 Feature

Add a method, similar to `to_torchscript`, in `lightning.py` that allows converting a model with Torch-TensorRT in order to increase inference performance.

Motivation

Increase performance during inference

Proposal

    # needs: import torch, torch_tensorrt
    # from typing import Any, Optional, Set, Union
    # from torch.jit import ScriptModule

    @torch.no_grad()
    def to_torch_tensorrt(
            self,
            example_inputs: Optional[Any] = None,
            enabled_precisions: Set[Union[torch.dtype, torch_tensorrt.dtype]] = {torch.float},
            **kwargs,
    ) -> ScriptModule:
        mode = self.training

        # if no example inputs are provided, try to see if model has example_input_array set
        if example_inputs is None:
            if self.example_input_array is None:
                raise ValueError(
                    "Converting to Torch-TensorRT requires either `example_inputs`"
                    " or `model.example_input_array` to be defined."
                )
            example_inputs = self.example_input_array

        # automatically send example inputs to the right device
        example_inputs = self._apply_batch_transfer_handler(example_inputs)
        # `torch_tensorrt.compile` expects a list of example inputs
        if not isinstance(example_inputs, (list, tuple)):
            example_inputs = [example_inputs]
        trt_module = torch_tensorrt.compile(
            self.eval(),
            inputs=list(example_inputs),
            enabled_precisions=enabled_precisions,  # e.g. {torch.half} to enable FP16
            **kwargs,
        )
        self.train(mode)

        return trt_module
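
For illustration, a hypothetical call site for the proposed method (the module class, checkpoint path, and input shape below are made up):

    # Hypothetical usage; `MyLightningModule`, the checkpoint path and the shape are illustrative.
    model = MyLightningModule.load_from_checkpoint("model.ckpt").cuda()
    example = torch.randn(1, 3, 224, 224, device="cuda")
    trt_module = model.to_torch_tensorrt(example_inputs=example, enabled_precisions={torch.half})
    prediction = trt_module(example)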

Additional context

A possible problem could be the dependencies: Torch-TensorRT depends on CUDA, cuDNN and TensorRT, as you can see at https://nvidia.github.io/Torch-TensorRT/v1.0.0/tutorials/installation.html, and I think some of these dependencies work only on Linux.
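
If this lands, the import would probably need to be guarded. A minimal sketch of one way to do that with the standard library (an assumption for illustration, not Lightning's actual import-check utility):

    from importlib.util import find_spec

    # Only attempt the conversion when torch_tensorrt is installed (sketch).
    _TORCH_TENSORRT_AVAILABLE = find_spec("torch_tensorrt") is not None

    def to_torch_tensorrt(self, *args, **kwargs):
        if not _TORCH_TENSORRT_AVAILABLE:
            raise ModuleNotFoundError("`to_torch_tensorrt` requires `torch_tensorrt` to be installed.")
        import torch_tensorrt  # deferred import so installs without it keep working
        ...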

cc @borda @carmocca @awaelchli @ninginthecloud @daniellepintz @rohitgr7

Actis92 avatar Jan 12 '22 13:01 Actis92

Recently the PyTorch team integrated Torch-TensorRT into the PyTorch ecosystem (see the blog post). Any tips on how one would implement an `export_trt` on the Trainer?
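
For reference, the standalone conversion described there looks roughly like this (the module class, checkpoint path, and input shape are assumptions for illustration):

    import torch
    import torch_tensorrt

    # Compile a trained LightningModule directly with torch_tensorrt (sketch).
    model = MyLightningModule.load_from_checkpoint("model.ckpt").eval().cuda()
    trt_model = torch_tensorrt.compile(
        model,
        inputs=[torch.randn(1, 3, 224, 224, device="cuda")],  # adjust to your model's input
        enabled_precisions={torch.half},  # allow FP16 kernels for extra speed
    )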

luca-medeiros avatar Jun 22 '22 09:06 luca-medeiros

We could follow the pattern used by to_onnx: https://github.com/Lightning-AI/lightning/blob/0ca3b5aa1b16667cc2d006c3833f4953b5706e72/src/pytorch_lightning/core/module.py#L1798. Comparing it to the snippet in your linked blog post, the advantage would be to automatically use self.example_input_array (if defined) and call the batch transfer hooks to apply any transformations (if defined). This is what the top post also suggests.
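
For context, a minimal sketch of how that `to_onnx` pattern looks from the user side (the module class and input shape are illustrative):

    import torch

    model = MyLightningModule()  # hypothetical LightningModule
    model.example_input_array = torch.randn(1, 64)
    # `to_onnx` picks up `example_input_array` when no `input_sample` is passed.
    model.to_onnx("model.onnx")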

carmocca avatar Aug 19 '22 23:08 carmocca

@rohitgr7 Hi, is there any progress on this? I want to do super fast GPU inference with my model trained in PyTorch Lightning. How do we convert it to TRT, and will it speed up inference 2x or 4x? Thanks, Sam

davodogster avatar Jan 25 '23 01:01 davodogster

@davodogster would you be interested in taking it over and implementing it? :rabbit:

Borda avatar Jan 25 '23 06:01 Borda

Hi @Borda! Sorry, I am an applied data scientist and not a good developer, so it may be a challenge for me.

Do you think it's easily possible for me to convert my Lightning model (image segmentation, batch size >= 8) to TensorRT for a 3-5x inference speedup?

davodogster avatar Jan 26 '23 00:01 davodogster

[image attachment]

davodogster avatar Jan 27 '23 00:01 davodogster

👍

dgcnz avatar Jul 04 '24 12:07 dgcnz

Hi @Borda and @carmocca, I'd love to provide a PR for this issue and have just implemented this on my branch.

Is this feature request still valid?

GdoongMathew avatar May 04 '25 10:05 GdoongMathew