aot-benchmark
Optimizing Model Performance: Exploring ONNX Export and Engine Integration with TensorRT and OpenVINO
Hi, have you explored evaluating the architecture by exporting it to ONNX format and running it with different engines such as TensorRT or OpenVINO?
Thanks for your interest. Currently, we haven't explored exporting the models to other formats.
I did some tests, unifying all the attention blocks and building the model with TensorRT v8.5. I've noticed an improvement only in memory consumption: for a long-term memory of at most 5 frames, the reserved memory (input size 1280×720) drops from 20 GB to 10 GB.
Despite this memory improvement, the runtime of the built TRT engine on a Jetson platform is slower than pure PyTorch, while on an NVIDIA RTX card it stays the same (I'm running the engine through the Python API).
What do you think about this approach (https://github.com/hkchengrex/Cutie)? Is your object memory version similar?
There are some Torch compile issues with these models: https://github.com/pytorch/pytorch/issues/103716
Could you please share a sample script to convert the model to ONNX?