onnxruntime_backend
Allow specifying TensorRT cache path per model version
When using ONNX Runtime with TensorRT, enabling the TensorRT engine cache saves a lot of time. The drawback is that ONNX Runtime is not smart enough to avoid reusing the same cache when the model has changed or the TensorRT version has changed, which causes a lot of errors.
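For context, this is roughly how the engine cache is turned on through ONNX Runtime's TensorRT execution provider options; the cache directory and model file name here are placeholders:

```python
import onnxruntime as ort

# Enable the TensorRT engine cache so compiled plans are reused across
# restarts instead of being rebuilt from scratch.
providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": "true",
        # Every session pointed at this directory shares one cache; nothing
        # in the path distinguishes model versions, so a stale plan can be
        # picked up after the model changes.
        "trt_engine_cache_path": "/opt/trt_cache",  # placeholder path
    }),
    "CUDAExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)
```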
It would be great if it could generate a TensorRT cache path per model version; that would at least prevent wrong outputs when the model version changes. If the path also included the GPU model and TensorRT version, that would solve the other case as well, but I think that is less of a problem, since it is acceptable to clear the cache when deploying new versions.
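A minimal sketch of what such a per-version layout could look like, assuming the cache path is computed before the session is created; `versioned_cache_path`, the model name, and the version string are all hypothetical:

```python
import os
import onnxruntime as ort

def versioned_cache_path(base: str, model_name: str, model_version: str) -> str:
    # Hypothetical layout: one cache directory per model version, e.g.
    # /opt/trt_cache/resnet50/3/. The same idea could be extended with the
    # GPU model and TensorRT version (e.g. .../3/A100_trt-8.6/) so a
    # hardware or library change can never pick up a stale plan.
    path = os.path.join(base, model_name, model_version)
    os.makedirs(path, exist_ok=True)
    return path

providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": "true",
        "trt_engine_cache_path": versioned_cache_path("/opt/trt_cache", "resnet50", "3"),
    }),
    "CUDAExecutionProvider",
]
session = ort.InferenceSession("/models/resnet50/3/model.onnx", providers=providers)
```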
The warmup feature solves all of these issues, but it comes at the cost of a very slow startup: some models can take minutes to generate the TensorRT plan.
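For comparison, warmup effectively amounts to running a dummy inference at startup so the TensorRT engine is built before the first real request; the input name and shape below are placeholders:

```python
import numpy as np
import onnxruntime as ort

providers = [
    ("TensorrtExecutionProvider", {
        "trt_engine_cache_enable": "true",
        "trt_engine_cache_path": "/opt/trt_cache",  # placeholder path
    }),
]
session = ort.InferenceSession("model.onnx", providers=providers)

# The first inference triggers the TensorRT engine build; this is the step
# that makes warmup-based startup slow. With a valid cache it is nearly
# instant, which is why a correct per-version cache path matters.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder shape
session.run(None, {"input": dummy})                   # placeholder input name
```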