ONNXRuntime TensorRT cache gets regenerated every time a model is uploaded even with correct settings

Open fran6co opened this issue 3 years ago • 5 comments

Description

Using the onnxruntime backend with the TensorRT execution accelerator and engine cache enabled, loading a model via load_model causes the TensorRT engine cache to be regenerated every time.

Triton Information

Triton container: nvcr.io/nvidia/tritonserver:22.06-py3

To Reproduce

Config (passed as model_config_json below):

{
  "name": "model-name",
  "platform": "onnxruntime_onnx",
  "optimization": {
    "execution_accelerators": {
      "gpu_execution_accelerator": [
        {
          "name": "tensorrt",
          "parameters": {
            "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
            "trt_engine_cache_enable": "true",
            "precision_mode": "FP16"
          }
        }
      ]
    }
  }
}

Use any ONNX file and call:

triton_client.load_model(model_name, config=model_config_json, files={"file:1/model.onnx": onnx_model_binary})
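
For reference, a fuller client-side reproduction might look like the sketch below (assuming the HTTP client from tritonclient and a server started with --model-control-mode=explicit; file names are placeholders):

import json
import tritonclient.http as httpclient

# Same config as shown above, expressed as a Python dict.
model_config = {
    "name": "model-name",
    "platform": "onnxruntime_onnx",
    "optimization": {
        "execution_accelerators": {
            "gpu_execution_accelerator": [{
                "name": "tensorrt",
                "parameters": {
                    "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
                    "trt_engine_cache_enable": "true",
                    "precision_mode": "FP16",
                },
            }]
        }
    },
}

with open("model.onnx", "rb") as f:
    onnx_model_binary = f.read()

client = httpclient.InferenceServerClient(url="localhost:8000")

# Each call below rebuilds the TensorRT engine from scratch, even though
# trt_engine_cache_enable is set and /root/.cache/triton-tensorrt already
# contains an engine from a previous load.
client.load_model(
    "model-name",
    config=json.dumps(model_config),
    files={"file:1/model.onnx": onnx_model_binary},
)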

Expected behavior

It should generate the engine cache only once.

The problem comes from https://github.com/triton-inference-server/core/blob/bb9756f2012b3b15bf8d7a9e1e2afd62a7e603b5/src/model_repository_manager.cc#L108, where Triton creates a temporary folder with a random name for the uploaded model, and the TRT engine cache uses that path as part of the cache key.
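
As a rough illustration of the failure mode (hypothetical cache-key logic, not the actual onnxruntime code):

import hashlib
import tempfile

def cache_key_for(model_path: str) -> str:
    # Pretend the engine cache key is derived from the model's on-disk path.
    return hashlib.sha1(model_path.encode()).hexdigest()[:16]

# Triton places each uploaded (overridden) model under a randomly named
# temporary directory, so two uploads of the same model get two paths...
path_a = tempfile.mkdtemp(prefix="triton_model_") + "/1/model.onnx"
path_b = tempfile.mkdtemp(prefix="triton_model_") + "/1/model.onnx"

# ...and therefore two different cache keys, so the engine is rebuilt each time.
print(cache_key_for(path_a) == cache_key_for(path_b))  # False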

fran6co avatar Jul 05 '22 17:07 fran6co

Hi @fran6co ,

Thanks for reporting the issue and doing some initial investigation.

@GuanLuo what do you think, related to your recent override changes?

rmccorm4 avatar Jul 05 '22 18:07 rmccorm4

This also happens when loading models from a cloud storage service such as S3.

fran6co avatar Jul 06 '22 09:07 fran6co

There are three possible solutions:

  • change how the TensorRT engine cache path is generated (this needs a change in onnxruntime)
  • create temporary paths with consistent names when dealing with cloud or overridden models (a rough sketch of this idea follows below)
  • change the Triton onnxruntime backend to load models from binary data instead of paths, which produces consistent TensorRT caches: https://github.com/triton-inference-server/onnxruntime_backend/pull/126
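
A rough sketch of the second option (hypothetical helper, not existing Triton code): derive the override directory from a hash of the uploaded model bytes so the same model always resolves to the same path.

import hashlib
import os

def override_dir_for(model_name: str, model_bytes: bytes,
                     root: str = "/tmp/triton_overrides") -> str:
    # Name the directory after the content hash instead of a random suffix.
    digest = hashlib.sha256(model_bytes).hexdigest()[:16]
    path = os.path.join(root, f"{model_name}_{digest}")
    os.makedirs(path, exist_ok=True)
    return path

# Re-uploading identical bytes maps to an identical directory, so a
# path-derived TRT engine cache key stays stable across uploads.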

fran6co avatar Jul 06 '22 09:07 fran6co

This would be very helpful to speed up development and reduce our system's start time.

robertbagge avatar Jul 11 '22 12:07 robertbagge

Filed DLIS-3954 to look into this.

rmccorm4 avatar Jul 11 '22 19:07 rmccorm4

Any news on this topic? I still face the same issue.

bmaier96 avatar Aug 09 '23 12:08 bmaier96