
Missing dll cudnn_ops_infer64_8.dll does not generate a python error

Open martinResearch opened this issue 1 year ago • 3 comments

Describe the issue

When trying to create a session with onnx_sess = InferenceSession(model, providers=["CUDAExecutionProvider"]) while the DLL cudnn_ops_infer64_8.dll is missing from the path (simply renaming this file reproduces the issue), the message "Could not locate cudnn_ops_infer64_8.dll. Please make sure it is in your library path!" is printed in the log and execution stops, but no Python error is raised.

Why is this a problem? Because the message is not an actual Python error, it is not displayed in the log when running under pytest, for example, which makes it harder to investigate why a test failed when this DLL is missing. Digging into the Python code, execution stops at this line: https://github.com/microsoft/onnxruntime/blob/737eb48f5c26ed2ac97e6fce0faf0831207d6f59/onnxruntime/python/onnxruntime_inference_collection.py#L483 The pybind binding should raise a Python error instead of just stopping execution.

To reproduce

  • rename the DLL cudnn_ops_infer64_8.dll to cudnn_ops_infer64_8_renamed.dll
  • run any code that uses onnx_sess = InferenceSession(model, providers=["CUDAExecutionProvider"])
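When reproducing, it can help to confirm which copy of the DLL was reachable before and after the rename. The sketch below uses a hypothetical helper, find_on_path, that lists every file with a given name in the directories of PATH, in order. It is only an approximation: PATH is just one of several locations Windows searches for DLLs, and the exact search order depends on how the library is loaded.

```python
import os
from typing import List, Optional


def find_on_path(name: str, search_path: Optional[str] = None) -> List[str]:
    """Return the full path of every file called `name` found in the
    directories of `search_path` (defaults to the PATH environment variable),
    in the order those directories appear."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits
```

For example, find_on_path("cudnn_ops_infer64_8.dll") should return an empty list after the rename if PATH was the only place the DLL could be found.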

Urgency

No response

Platform

Windows

OS Version

11

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime-gpu 1.16.3

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

martinResearch, May 08 '24 14:05

DLL loading, especially of indirect dependencies, is handled by the OS. The message you are seeing comes from the system loader. Neither ORT nor Python has any control over that.

yuslepukhin, May 08 '24 18:05

I understand from your response that neither ORT nor Python can change the error message the OS generates when it tries to load the DLL. But I don't see why that implies ORT has no way to detect that the OS failed to load the library and raise an error in that case. It seems to me that if loading the DLL fails, we would get out_module == nullptr on this line https://github.com/microsoft/onnxruntime/blob/58d7b1220550f87ad58a195dc5605fa8c23fe98f/winml/lib/Api.Ort/OnnxruntimeEnvironment.cpp#L43C1-L45C4. and we should then be able to throw an error that gets propagated to Python. Am I missing something?
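The "check for a null handle and raise" idea can at least be applied from the Python side before creating the session. The sketch below assumes Python 3.8+ and relies on ctypes, which raises OSError when the platform loader (LoadLibraryEx on Windows, dlopen elsewhere) does not return a handle. require_libraries and MissingLibraryError are hypothetical names. Passing winmode=0 asks Windows for the standard LoadLibraryEx search order (which includes PATH) instead of the restricted default ctypes uses since Python 3.8; the argument is ignored on other platforms. This does not change ORT's behavior, and a library that loads here can still fail later for other reasons.

```python
import ctypes
from typing import Iterable


class MissingLibraryError(ImportError):
    """Hypothetical exception raised when a native library cannot be loaded."""


def require_libraries(names: Iterable[str]) -> None:
    """Try to load each library by name; raise if any of them fails.

    ctypes.CDLL raises OSError when the loader returns no handle, so the
    null-handle check surfaces as an ordinary Python exception. Successfully
    loaded libraries stay loaded in the current process.
    """
    missing = []
    for name in names:
        try:
            # winmode=0: standard Windows search order (assumption: matches how
            # the DLL would be found later); ignored outside Windows.
            ctypes.CDLL(name, winmode=0)
        except OSError as exc:
            missing.append(f"{name}: {exc}")
    if missing:
        raise MissingLibraryError("could not load:\n  " + "\n  ".join(missing))
```

For example, calling require_libraries(["cudnn_ops_infer64_8.dll"]) at the start of a test session would turn the missing DLL into a regular Python exception that pytest reports, together with the loader's own error text.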

martinResearch, May 08 '24 20:05

https://learn.microsoft.com/en-us/windows/win32/dlls/load-time-dynamic-linking

yuslepukhin, May 09 '24 17:05