[CPU] Enable mmap for model loading from cache.
Details:
- Use mmap for model compilation from cache.
- ...
Tickets:
- Part of the task 127331
Could you share some test data showing how much benefit we can get from importing a model with an mmap buffer? For example, how much memory is saved, and is there any impact on inference throughput or first-inference latency?
I attached perf numbers to the ticket
Implementation LGTM. Please add tests.
There is a sufficient number of test cases in CompileModelCacheTestBase; they cover mmap as well.
@nshchego, the main question is why we cannot implement a std::basic_streambuf over the mapped memory block. If we had such an implementation, we could reuse most of the existing serialization/deserialization code without changes and without introducing a separate code path that accesses the buffer directly instead of working through an STL stream.
This PR will be closed in a week because of 2 weeks of no activity.