onnxruntime [Performance] Increased memory usage when loading from bytes

Describe the issue

Until now we were creating our Ort::Session object by passing it the path of our model (.onnx file). Now we are trying to create the Session object from the bytes already read in a std::vector. Although everything seems to work correctly, we have detected a higher memory consumption, approximately the size of the model. We are reasonably sure that the vector is being released correctly, so we have the impression that creating the Session is making a copy that is not being released. Is this expected? Or are we doing something incorrectly?
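One way to rule out the vector itself is to free its buffer explicitly right after the session is constructed. The snippet below is only a sketch: `release_buffer` is a hypothetical helper name, and it assumes the session no longer needs the bytes once it has been created, as the description above already assumes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: frees the vector's heap buffer immediately instead of
// waiting for the vector to go out of scope. clear() alone would keep the
// capacity; swapping with an empty temporary hands the buffer to the
// temporary, which is destroyed at the end of the statement.
void release_buffer(std::vector<unsigned char>& bytes) {
    std::vector<unsigned char>().swap(bytes);
}
```

Calling `release_buffer(model_bytes);` right after the session is created makes the point at which the bytes are freed explicit. Whether the resident memory reported by the OS drops at that moment still depends on the allocator.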

To reproduce

We observe a much bigger memory usage when doing this:

std::vector<unsigned char> model_bytes;
// Open in binary mode: the model is not text
std::ifstream file("model.onnx", std::ios::binary);
if (!file.eof() && !file.fail())
{
    file.seekg(0, std::ios_base::end);
    std::streampos fileSize = file.tellg();
    model_bytes.resize(fileSize);

    file.seekg(0, std::ios_base::beg);
    // read() expects a char*, so the unsigned char buffer needs a cast
    file.read(reinterpret_cast<char*>(model_bytes.data()), fileSize);
}
session = std::make_shared<Ort::Session>(env, model_bytes.data(), model_bytes.size(), session_options);

rather than this:

session = std::make_shared<Ort::Session>(env, "/path/to/model/file.onnx", session_options);
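To compare the two constructors with numbers rather than by eye, the peak resident set size can be read before and after session creation. This is a minimal sketch for Linux using POSIX `getrusage`. The names `peak_rss_kb`, `peak_rss_growth_kb` and `touch_buffer` are hypothetical, and `touch_buffer` is only a stand-in workload for the `Ort::Session` construction.

```cpp
#include <sys/resource.h>  // getrusage (POSIX)
#include <cassert>
#include <cstddef>
#include <vector>

// Peak resident set size of this process so far.
// On Linux, ru_maxrss is reported in kilobytes. Returns -1 on failure.
long peak_rss_kb() {
    struct rusage usage {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_maxrss;
}

// Runs `work` and returns how much the peak RSS grew while it ran.
template <typename F>
long peak_rss_growth_kb(F&& work) {
    const long before = peak_rss_kb();
    work();
    return peak_rss_kb() - before;
}

// Stand-in workload (assumption, not part of the original report):
// allocates `bytes` of memory and writes to every byte so the pages are
// actually resident. In the real comparison this would be the session
// construction from a path or from bytes.
void touch_buffer(std::size_t bytes) {
    std::vector<unsigned char> buf(bytes, 1);
    volatile unsigned char sink = buf[bytes / 2];
    (void)sink;
}
```

For example, `peak_rss_growth_kb([] { touch_buffer(64u * 1024 * 1024); })` reports the growth caused by the stand-in. Because the peak never decreases, each variant is best measured in its own process run.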

Urgency

Not really urgent, just curious about this case as we want to load the models from memory eventually.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

Jun 25 '24 15:06 ignogueiras

Version 1.15.1 is rather old. Is this still an issue with the latest release?

Jun 25 '24 16:06 cbourjau

Hello again, I have just taken some memory profiles of different executions for each case. This is the memory usage if I load the model from the file path directly: (image: load_file_1 15 1)

And this one is loading from the vector of bytes: (image: load_vector_1 15 1)

The only change in the code is that I call:

session = std::make_shared<Ort::Session>(env, model_bytes.data(), model_bytes.size(), session_options);

instead of:

session = std::make_shared<Ort::Session>(env, "/path/to/model.onnx", session_options);

The rest of the code is the same: I am still reading the file, creating the vector, etc., and only calling the other constructor.

As you suggested, I tried it again with the newer 1.18.0 release, simply by replacing the release files in my deps folder. Loading from the file path shows the same behaviour as before, but loading from the vector performs even worse: (image: load_vector_1 18 0)

Jun 26 '24 06:06 ignogueiras

That is a pretty sizable regression in terms of memory usage in any case! Was there a particular version between 1.15.1 and 1.18.0 that caused the even worse memory usage?

Jun 26 '24 08:06 cbourjau

Well, we jumped directly from the 1.15.1 we were using to 1.18.0 for this test, but I just did a quick check and I can already see this increased memory usage with 1.16.1. I was unable to compile with 1.16.0 due to some missing headers in the includes/ folder, btw.

Jun 26 '24 10:06 ignogueiras

We have tested it with version 1.18.1, but it shows the same memory profile.

Jul 02 '24 10:07 ignogueiras

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

Aug 01 '24 15:08 github-actions[bot]

Thanks for all the additional information @ignogueiras! I'm afraid I don't have a good guess as to what the origin of your problem might be. But maybe you can try it again with the latest release from today?

Aug 01 '24 17:08 cbourjau

Hello again @cbourjau. Sorry for the late response, I was out of the office for the last few weeks.

I did some more tests today and I am starting to doubt my previous results, as I am unable to reproduce them now. I keep seeing the same memory profile when loading from bytes and from the file path. I am using a different machine right now, so could it be related to the different hardware?

I'll keep doing some more tests, maybe I am forgetting some steps of my old runs.

What I can still see is a regression in the latest versions with respect to v1.15.1.

v1.15.1: (image: load_file_v1 15 1)

v1.19.0: (image: load_file_v1 19 0)

As you can see, the two profiles have an almost identical shape. First the resources are loaded, and then some memory is freed, approximately the size of the model. In the new version, however, an additional allocation of roughly the model size happens before that memory is freed, which cancels out the subsequent drop. It looks like some kind of copy of the model data was added there and is not being freed afterwards.
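To pin down at which step the extra allocation appears, the current resident set size can be sampled at a few points in the program (after reading the file, after creating the session, after freeing the vector). The following is a rough sketch for Linux only. `current_rss_kb` is a hypothetical helper name, and it relies on the second field of `/proc/self/statm` being the number of resident pages.

```cpp
#include <unistd.h>  // sysconf
#include <cassert>
#include <fstream>

// Current resident set size in kilobytes, read from /proc/self/statm.
// The file starts with two page counts: total program size and resident
// set size. Returns -1 if the file cannot be read (e.g. not on Linux).
long current_rss_kb() {
    std::ifstream statm("/proc/self/statm");
    long total_pages = 0;
    long resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages)) return -1;
    const long page_kb = sysconf(_SC_PAGESIZE) / 1024;
    return resident_pages * page_kb;
}
```

Logging `current_rss_kb()` before and after each step gives a coarse, in-process version of the profiles above, which may help when comparing versions or machines.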

Sep 02 '24 09:09 ignogueiras

Sorry, I missed your previous response! Thanks for trying it with the then-latest version. Would you mind running the test again on the now-latest? It would be good to know the exact version that introduced the increased memory usage that you observed if the issue still persists in the latest version. What new hardware are you using now?

Jul 08 '25 06:07 cbourjau

Applying stale label due to no activity in 30 days

Aug 07 '25 15:08 microsoft-github-policy-service[bot]

Applying stale label due to no activity in 30 days

Sep 06 '25 21:09 microsoft-github-policy-service[bot]

Closing issue due to no activity in 30 days

Sep 06 '25 21:09 microsoft-github-policy-service[bot]
