[CPU] Enable mmap for model loading from cache.
Details:
- Use mmap for model compilation from cache.
- ...
Tickets:
- Part of the task 127331
Could you share some test data showing how much benefit we can get from importing a model with an mmap buffer? For example, how much memory is saved, and is there any impact on inference throughput or first-inference latency?
I attached perf numbers to the ticket
Implementation LGTM. Please add tests.
There is a sufficient number of test cases in CompileModelCacheTestBase; they cover mmap as well.
@nshchego, the main question is why we cannot implement a std::basic_streambuf over the mapped memory block. If we had such an implementation, we could reuse most of the existing serialization/deserialization code without changes and without introducing a separate code path that accesses the buffer directly instead of working through an STL stream.
This PR will be closed in a week because of 2 weeks of no activity.