ONNXRuntime TensorRT cache gets regenerated every time a model is uploaded even with correct settings
Description
Using the onnxruntime backend with the TensorRT accelerator and engine caching enabled, loading a model via load_model causes the TensorRT engine cache to be regenerated on every load.
Triton Information
Triton container nvcr.io/nvidia/tritonserver:22.06-py3
To Reproduce
Config file:
{
  "name": "model-name",
  "platform": "onnxruntime_onnx",
  "optimization": {
    "execution_accelerators": {
      "gpu_execution_accelerator": [
        {
          "name": "tensorrt",
          "parameters": {
            "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
            "trt_engine_cache_enable": "true",
            "precision_mode": "FP16"
          }
        }
      ]
    }
  }
}
Use any ONNX file and call:
triton_client.load_model(model_name, config=model_config_json, files={"file:1/model.onnx": onnx_model_binary})
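For completeness, a minimal self-contained reproduction sketch (assumptions: a Triton server started with --model-control-mode=explicit and reachable at localhost:8000, and a local model.onnx file; the model name and cache path are just the ones from the config above):

import json
import tritonclient.http as httpclient

model_name = "model-name"
model_config_json = json.dumps({
    "name": model_name,
    "platform": "onnxruntime_onnx",
    "optimization": {
        "execution_accelerators": {
            "gpu_execution_accelerator": [{
                "name": "tensorrt",
                "parameters": {
                    "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
                    "trt_engine_cache_enable": "true",
                    "precision_mode": "FP16",
                },
            }]
        }
    },
})

with open("model.onnx", "rb") as f:
    onnx_model_binary = f.read()

triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# Each load triggers a full TensorRT engine build; the cache written to
# trt_engine_cache_path by the previous load is never reused.
triton_client.load_model(
    model_name,
    config=model_config_json,
    files={"file:1/model.onnx": onnx_model_binary},
)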
Expected behavior
It should generate the engine cache only once.
The problem comes from https://github.com/triton-inference-server/core/blob/bb9756f2012b3b15bf8d7a9e1e2afd62a7e603b5/src/model_repository_manager.cc#L108, where Triton stages the uploaded model in a temporary folder with a random name; the TRT engine cache uses the model path as part of the cache, so the cache never matches and the engine is rebuilt on every load.
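To make the failure mode concrete, an illustration only (this is not the actual hashing code in ONNX Runtime or Triton): if the cache identity incorporates the model path, and the model is staged in a freshly created random directory on every load, the identity never repeats:

import hashlib
import tempfile

def path_based_cache_key(model_path: str) -> str:
    # Illustrative stand-in for a cache identity that depends on the model path.
    return hashlib.sha1(model_path.encode()).hexdigest()[:16]

# Triton stages the overridden model in a randomly named temporary directory
# on every load_model call, so a path-derived identity differs each time and
# the previously built engine is never picked up.
for _ in range(2):
    staged_dir = tempfile.mkdtemp(prefix="triton-model-")
    print(path_based_cache_key(staged_dir + "/1/model.onnx"))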
Hi @fran6co,
Thanks for reporting the issue and doing some initial investigation.
@GuanLuo what do you think, is this related to your recent override changes?
This also happens when using models from cloud storage like S3.
There are 3 possible solutions:
- change how the TensorRT cache path is generated (this needs a change in onnxruntime)
- create temporary paths with consistent names when dealing with cloud-stored or overridden models
- change the Triton onnxruntime backend to pass the model as binary instead of a path, which produces consistent TensorRT caches: https://github.com/triton-inference-server/onnxruntime_backend/pull/126 (see the sketch after this list)
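As a rough sketch of why the third option avoids the problem (not the backend's actual implementation, just the idea behind it): an identity derived from the model bytes is independent of wherever Triton stages the file, so a cached engine can be matched across loads:

import hashlib

def content_based_cache_key(model_bytes: bytes) -> str:
    # Illustrative: an identity derived from the model content does not change
    # when the staging directory changes, so a cached engine can be reused.
    return hashlib.sha1(model_bytes).hexdigest()[:16]

with open("model.onnx", "rb") as f:  # hypothetical local model file
    onnx_model_binary = f.read()

# Same bytes -> same identity on every load, regardless of the staging path.
print(content_based_cache_key(onnx_model_binary))
print(content_based_cache_key(onnx_model_binary))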
This would be very helpful to speed up development and reduce our system's start time.
Filed DLIS-3954 to look into this.
Any news on this topic? I still face the same issue.