
Optimizing Model Performance: Exploring ONNX Export and Engine Integration with TensorRT and OpenVINO

AntonioConsiglio opened this issue 1 year ago · 4 comments

Hi, have you explored exporting the architecture to ONNX format and running it with inference engines such as TensorRT or OpenVINO?

AntonioConsiglio avatar Oct 10 '23 08:10 AntonioConsiglio

Thanks for your interest. Currently, we haven't explored exporting the models to other formats.

z-x-yang avatar Oct 22 '23 09:10 z-x-yang

> Thanks for your interest. Currently, we haven't explored exporting the models to other formats.

I did some tests, unifying all the attention blocks and building the model with TensorRT v8.5. I noticed an improvement only in memory consumption: for a long-term memory of at most 5 frames, the reserved memory (input size 1280×720) drops from 20 GB to 10 GB.

Despite this memory improvement, on a Jetson platform the TensorRT engine runs slower than pure PyTorch, while on an NVIDIA RTX card the runtime stays the same (I'm running the engine through the Python API).

What do you think about this approach (https://github.com/hkchengrex/Cutie)? Is your object memory version similar?

AntonioConsiglio avatar Oct 22 '23 09:10 AntonioConsiglio

There are some torch.compile issues with these models: https://github.com/pytorch/pytorch/issues/103716

bhack avatar Oct 22 '23 16:10 bhack

Could you please share the sample script to convert the model to ONNX?

SuyueLiu avatar Aug 16 '24 08:08 SuyueLiu