
[Performance] Mapfile support for certain external data files is not working

Open ivberg opened this issue 1 year ago • 1 comments

Describe the issue

We are trying to get mapfile (memory-mapped file) support working with external data files. The model loads and runs, but while debugging we noticed that mapfile support is not actually working: the mapping call errors out inside ORT code.

https://github.com/microsoft/onnxruntime/pull/19089 https://github.com/onnx/onnx/blob/main/docs/ExternalData.md

Callstack where the mapfile fails due to alignment issues:

00 ps_onnxruntime!onnxruntime::WindowsEnv::MapFileIntoMemory+0xa90 [D:\a_work\1\s\onnxruntime\onnxruntime\core\platform\windows\env.cc @ 449] // Failure here
01 ps_onnxruntime!onnxruntime::utils::GetFileContent+0x12c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\tensorprotoutils.cc @ 899]
02 ps_onnxruntime!onnxruntime::utils::GetExtDataFromTensorProto+0x484 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\tensorprotoutils.cc @ 1015] // The buffer size, length is coming from here
03 ps_onnxruntime!onnxruntime::session_state_utils::ExtDataTensorProtoToTensor+0x8c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 73]
04 ps_onnxruntime!onnxruntime::session_state_utils::DeserializeTensorProto+0x37c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 126]
05 ps_onnxruntime!onnxruntime::session_state_utils::SaveInitializedTensors+0x1208 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state_utils.cc @ 310]
06 ps_onnxruntime!onnxruntime::SessionState::FinalizeSessionStateImpl+0x76c [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state.cc @ 1476]
07 ps_onnxruntime!onnxruntime::SessionState::FinalizeSessionState+0x1b4 [D:\a_work\1\s\onnxruntime\onnxruntime\core\framework\session_state.cc @ 1189]
08 ps_onnxruntime!onnxruntime::InferenceSession::Initialize+0x2178 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\inference_session.cc @ 2015]
09 ps_onnxruntime!`anonymous namespace'::InitializeSession+0x250 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc @ 763]
0a ps_onnxruntime!OrtApis::CreateSession+0xa0 [D:\a_work\1\s\onnxruntime\onnxruntime\core\session\onnxruntime_c_api.cc @ 779]

Instead of mapping the file, ORT hits the error "mapped offset must be a multiple of the allocation granularity..." and swallows it. I say swallows because, as another call stack shows, ORT then takes the error path and reads the whole file into a buffer as a fallback.
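To illustrate the constraint behind that error message: on Windows a file view has to start at an offset that is a multiple of the system allocation granularity (reported by `GetSystemInfo` as `dwAllocationGranularity`, typically 64 KiB). A common way to map an arbitrary offset is to round it down to the granularity and skip the leftover bytes inside the view. The sketch below shows only that arithmetic. `AlignedMapping`, `IsMappableOffset`, and `AlignForMapping` are hypothetical names for illustration, not ONNX Runtime functions, and this is not a claim about how ORT implements or should fix `MapFileIntoMemory`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not part of ONNX Runtime.
struct AlignedMapping {
  std::uint64_t mapped_offset;  // aligned offset to hand to the OS mapping call
  std::uint64_t skip;           // bytes to skip from the start of the mapped view
  std::uint64_t mapped_length;  // bytes to map: skip + requested length
};

// Returns true if `offset` can be used directly as a mapping offset.
// `granularity` is assumed to come from the OS (e.g. dwAllocationGranularity).
inline bool IsMappableOffset(std::uint64_t offset, std::uint64_t granularity) {
  assert(granularity != 0);
  return offset % granularity == 0;
}

// Rounds `offset` down to a multiple of `granularity` and widens the
// mapped range so that it still covers [offset, offset + length).
inline AlignedMapping AlignForMapping(std::uint64_t offset,
                                      std::uint64_t length,
                                      std::uint64_t granularity) {
  assert(granularity != 0);
  const std::uint64_t skip = offset % granularity;
  return AlignedMapping{offset - skip, skip, length + skip};
}
```

With a granularity of 65536, a tensor stored at offset 70000 would be mapped from offset 65536, and the tensor data would begin 4464 bytes into the view. An external data file whose tensor offsets are already multiples of the granularity would pass `IsMappableOffset` and would not need the adjustment.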

To reproduce

Get a model with an external data file, e.g. model.onnx and model.onnx.data. Not every file will reproduce the issue, because it depends on how the tensor offsets in the data file align with the target OS's mapping requirements.

const ORTCHAR_T* filemodelpath = ORT_TSTR("model.onnx");

Load with:

Ort::Session(env, filemodelpath, session_options);

// The model seems to load fine and works with external data file

Urgency

Fairly urgent

For now we are trying a workaround with AddExternalInitializersFromFilesInMemory.

Platform

Windows

OS Version

23H2

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

55f7f9d7a9b88c4e7f0eb7cf4d7f31004761f5cb

ONNX Runtime API

C++

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

ivberg avatar Jun 27 '24 19:06 ivberg

Can you attach a sample model? and this happens on ARM64 only?

pranavsharma avatar Jun 27 '24 19:06 pranavsharma

We are looking into whether we can share the model directly. The alignment issue could likely happen on multiple platforms; ARM64 just happens to be what I am testing on.

ivberg avatar Jul 03 '24 22:07 ivberg

External data produced by the ONNX exporter in PyTorch 2.5 will be aligned.

justinchuby avatar Sep 07 '24 15:09 justinchuby

Applying stale label due to no activity in 30 days
