High committed memory size
Hello,
I work on a game where we would like to use mimalloc because of its performance, especially in multithreaded scenarios. So far we have been using dlmalloc which is slow when multiple threads allocate in parallel.
I have noticed that mimalloc, at least in our game, has significantly more committed memory at a given time than dlmalloc. When comparing both allocators in a typical scenario of high memory usage, mimalloc can have 700MB to 1GB more committed memory than dlmalloc, out of a total of ~20GB committed memory for the game's process. At the same time, the working set size can be 400MB lower for mimalloc than dlmalloc!
While this is fine on PC (Windows), it is a problem on consoles where we cannot commit more memory than there is actual RAM on the system, so this extra committed memory is essentially wasted RAM.
I have tried several options to reduce the commit size, as seen in these issues: https://github.com/microsoft/mimalloc/issues/647, https://github.com/microsoft/mimalloc/issues/537
- Disable eager_commit
- Set purge_delay to 0
- Set purge_decommits to 1
- Call mi_collect(true) regularly on worker threads
- Change MI_SEGMENT_SLICE_SHIFT to reduce the slice size from 64kB to 32kB (also tried 128kB)
None of these had a significant impact.
We are currently using mimalloc v2.1.7. I also tried v3.1.4 and the commit size was roughly the same.
Is this a behavior you have already seen? Do you have any tips to reduce this commit footprint besides what I already mentioned?
Thanks in advance.
Thanks for the report -- I think we might be able to reduce it using v3 (v3.1.4+) but it may need some experimentation. Can you try v3.1.5 with the following options?
- MIMALLOC_PURGE_DELAY=0 (not recommended in general but will give us a clue what might be the root cause)
- MIMALLOC_PAGE_MAX_RECLAIM=4
- MIMALLOC_PAGE_COMMIT_ON_DEMAND=1
- MIMALLOC_PAGE_FULL_RETAIN=1
Try each one separately and see if any makes a difference -- that may give us insight into the root cause.
If none of these make a difference, perhaps try to run cmake as:
$ cmake ../.. -DMI_EXTRA_CPPDEFS="MI_ARENA_SLICE_SHIFT=15"
and see if that improves things (this uses mimalloc small pages of size 32KiB)
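The shift values in this thread translate to slice sizes as powers of two. The following is a minimal sketch of that arithmetic, assuming the slice size is simply 1 << shift; the helper names are hypothetical and are not part of mimalloc. Under that assumption, a shift of 15 gives the 32KiB mentioned above, and shifts of 16 and 17 would correspond to the 64kB and 128kB sizes mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a mimalloc function):
   slice size in bytes for a given shift, assuming size = 2^shift. */
static uint64_t slice_size_bytes(unsigned shift) {
    assert(shift < 64);  /* shifting a 64-bit value by 64 or more is undefined */
    return (uint64_t)1 << shift;
}

/* Hypothetical helper: the same size expressed in KiB.
   Only meaningful for shifts of at least 10 (1 KiB). */
static uint64_t slice_size_kib(unsigned shift) {
    assert(shift >= 10);
    return slice_size_bytes(shift) / 1024;
}
```

For example, slice_size_kib(15) evaluates to 32, which matches the 32KiB that -DMI_EXTRA_CPPDEFS="MI_ARENA_SLICE_SHIFT=15" is described as selecting.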
Thank you for answering.
I have tried the options you suggested separately, including the lower slice shift, and none of them made a significant difference on 3.1.4. Are there any changes between .4 and .5 that could affect this? I stuck to 3.1.4 for my testing as the allocator's source files are integrated directly in our game's source tree, and I didn't want to do another integration as it's a little tedious.
In the meantime, a coworker showed me a talk by a Capcom developer that happens to tackle our exact problem: https://www.capcom-games.com/coc/2023/en/session/13/ It's about 2 years old but what is discussed seems to still apply, at least in our case. In short, they are using mimalloc in their in-house RE Engine and they made some changes directly to mimalloc to more aggressively release memory. Were you aware of this? Do you know if any of these changes were upstreamed? I suppose their changes come with compromises that maybe would not be appropriate for the more general use cases that mimalloc wants to cover.
Hmm, difficult to say -- I was hoping one of those options would help. v3 was created mainly to have generally much less committed memory than v2, so I am not sure it can be made better still. I did not know about the talk and work you referred to -- I will watch the video and try to understand what they did (this would probably not apply to v3, though, since that already shares memory between threads more effectively). There is a general tradeoff between minimal commit and scalable performance, so at some point it becomes hard to reduce commit further.
Other thoughts:
- v3.1.5 fixes an important bug in v3.1.4, but it could only happen in OOM situations so it won't affect your results.
- Can you also try to run with MIMALLOC_GENERIC_COLLECT=1000 and see if that makes a difference?
Hi Daan, I'm excited about mimalloc's 3.x beta! Could you share when you expect the stable 3.x version to be released?
I have tried with MIMALLOC_GENERIC_COLLECT=1000, it did not make a significant difference.
I understand no drop-in solution is going to be perfect for us on consoles. If that is ok, I will leave this issue open and report back with what we end up doing so it may help other people.
I have to say that on PC mimalloc has been great for us though, especially for game asset generation where the workload is parallelized.
Thank you again for your help.
@hlabanca-arkfr - was this with mimalloc3? I did my own tests and found that mi_collect(true) with mimalloc2 does not collect the same way mimalloc3 does, e.g. mimalloc3 really collects in my test.