Excessive memory retention using either mimalloc v2 or v3 on macOS
Rspack uses mimalloc to speed up memory allocation. It is a tool for transforming and bundling, primarily, JavaScript files. While investigating this issue we found that, on macOS, there is some strange memory retention as Rspack rebuilds. I tried to create a minimal reproducible demo, but unfortunately all of my attempts failed, so I have to share my local testing demo for reproduction, written in Rust:
https://github.com/h-a-n-a/rspack-allocation-test
In this demo, an Rspack compiler is created to compile the JavaScript in the 10000 directory. For each build and rebuild, Rspack triggers tokio-rs to spawn (if not already spawned) a few worker threads to drive the asynchronous tasks. Rspack then runs a series of JavaScript module transformations followed by optimizations. Finally, the assets generated by each build or rebuild are emitted to the dist directory.
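For orientation, here is a rough sketch of the shape of the driver loop (this is not the actual repo code; the rebuild function and iteration count are placeholders): a multi-threaded tokio runtime is built once, and the same compiler is rebuilt in a loop so memory can be observed across iterations.

// Hypothetical outline only -- the real demo drives an rspack compiler here.
fn main() {
    let rt = tokio::runtime::Builder::new_multi_thread()
        .enable_all()
        .build()
        .unwrap();

    for i in 0..100 {
        rt.block_on(async {
            // In the real demo: module transformations, optimizations,
            // then emitting the generated assets to `dist`.
            rebuild(i).await;
        });
    }
}

async fn rebuild(_iteration: usize) {
    // placeholder for the rspack compilation work
}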
During compilation, the initial memory footprint on macOS is around 600 MB, and after a few rebuilds it skyrockets to 1 GB or more. This does not happen on my ubuntu-22.04 machine, nor when I use macOS's system allocator. It does happen with both mimalloc v2 and v3.
I've added some details to the repo to help reproduce the issue, and I will keep trying to create a minimal reproducible demo. Please bear with me.
Looking forward to hearing from you. Cheers!
Yikes -- thanks for the report. Strange that it happens with both v2 and v3, and not on Ubuntu. Thanks for the repo -- if I find time I will try it out and see. Can you try the following environment settings on the latest dev3-bin branch:
- MIMALLOC_ARENA_EAGER_COMMIT=0, and independently,
- MIMALLOC_PAGE_COMMIT_ON_DEMAND=1
Also, as an experiment, trying MIMALLOC_PURGE_DELAY=0 would be interesting (this can slow things down, but it would perhaps give us a clue).
@daanx Thanks for the quick reply!
I've tested these options on my macOS machine. Nothing has been changed in the demo other than tweaking a value so that it rebuilds indefinitely.
Here are some stats I pulled from top. They show that memory consumption on branch dev3-bin accumulates more slowly than on dev2-bin, but it still grows as time passes.
With branch dev3-bin:
top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test 633M+
mimalloc-test 635M+
mimalloc-test 639M+
mimalloc-test 666M+
mimalloc-test 700M+
mimalloc-test 704M+
mimalloc-test 706M+
mimalloc-test 706M+
mimalloc-test 719M+
mimalloc-test 737M+
mimalloc-test 738M+
mimalloc-test 738M+
mimalloc-test 738M
mimalloc-test 741M+
mimalloc-test 753M+
mimalloc-test 771M+
mimalloc-test 772M+
mimalloc-test 788M+
mimalloc-test 801M+
mimalloc-test 803M+
mimalloc-test 804M+
mimalloc-test 804M+
mimalloc-test 804M
mimalloc-test 810M+
mimalloc-test 813M+
mimalloc-test 832M+
mimalloc-test 833M+
mimalloc-test 834M+
mimalloc-test 836M+
mimalloc-test 836M+
mimalloc-test 837M+
mimalloc-test 837M+
mimalloc-test 852M+
mimalloc-test 869M+
mimalloc-test 869M+
mimalloc-test 870M+
mimalloc-test 876M+
mimalloc-test 896M+
mimalloc-test 896M
mimalloc-test 896M+
mimalloc-test 896M
mimalloc-test 898M+
mimalloc-test 901M+
mimalloc-test 901M
mimalloc-test 902M+
mimalloc-test 907M+
mimalloc-test 926M+
mimalloc-test 931M+
mimalloc-test 934M+
mimalloc-test 934M+
mimalloc-test 934M-
mimalloc-test 934M+
mimalloc-test 942M+
mimalloc-test 944M+
mimalloc-test 946M+
mimalloc-test 948M+
mimalloc-test 965M+
mimalloc-test 965M+
mimalloc-test 965M
With branch dev2-bin:
top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test 584M+
mimalloc-test 720M+
mimalloc-test 843M+
mimalloc-test 885M+
mimalloc-test 946M+
mimalloc-test 1015M+
mimalloc-test 1122M+
mimalloc-test 1253M+
mimalloc-test 1268M+
mimalloc-test 1303M+
mimalloc-test 1326M+
mimalloc-test 1339M+
mimalloc-test 1344M+
mimalloc-test 1353M+
mimalloc-test 1355M+
mimalloc-test 1358M+
mimalloc-test 1360M+
mimalloc-test 1444M+
mimalloc-test 1485M+
mimalloc-test 1507M+
mimalloc-test 1511M+
mimalloc-test 1526M+
mimalloc-test 1529M+
mimalloc-test 1536M+
mimalloc-test 1556M+
mimalloc-test 1558M+
mimalloc-test 1562M+
mimalloc-test 1567M+
mimalloc-test 1569M+
mimalloc-test 1571M+
mimalloc-test 1634M+
mimalloc-test 1669M+
mimalloc-test 1671M+
mimalloc-test 1691M+
mimalloc-test 1713M+
mimalloc-test 1716M+
mimalloc-test 1717M+
mimalloc-test 1719M+
mimalloc-test 1721M+
mimalloc-test 1724M+
mimalloc-test 1727M+
mimalloc-test 1729M+
mimalloc-test 1731M+
mimalloc-test 1733M+
mimalloc-test 1735M+
mimalloc-test 1753M+
mimalloc-test 1756M+
mimalloc-test 1758M+
mimalloc-test 1762M+
mimalloc-test 1763M+
mimalloc-test 1834M+
mimalloc-test 1841M+
mimalloc-test 1861M+
mimalloc-test 1867M+
mimalloc-test 1886M+
mimalloc-test 1887M+
mimalloc-test 1890M+
mimalloc-test 1921M+
mimalloc-test 1924M+
mimalloc-test 1936M+
mimalloc-test 1938M+
With branch dev3-bin and environment set to MIMALLOC_ARENA_EAGER_COMMIT=0:
top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test 576M+
mimalloc-test 634M+
mimalloc-test 661M+
mimalloc-test 663M+
mimalloc-test 682M+
mimalloc-test 698M+
mimalloc-test 700M+
mimalloc-test 717M+
mimalloc-test 732M+
mimalloc-test 734M+
mimalloc-test 735M+
mimalloc-test 735M+
mimalloc-test 735M
mimalloc-test 740M+
mimalloc-test 740M
mimalloc-test 741M+
mimalloc-test 763M+
mimalloc-test 764M+
mimalloc-test 767M+
mimalloc-test 781M+
mimalloc-test 798M+
mimalloc-test 801M+
mimalloc-test 809M+
mimalloc-test 829M+
mimalloc-test 829M
mimalloc-test 830M+
mimalloc-test 830M+
mimalloc-test 830M+
mimalloc-test 831M+
mimalloc-test 832M+
mimalloc-test 836M+
mimalloc-test 857M+
mimalloc-test 862M+
mimalloc-test 862M+
mimalloc-test 862M+
mimalloc-test 865M+
mimalloc-test 865M
mimalloc-test 865M
mimalloc-test 869M+
mimalloc-test 894M+
mimalloc-test 897M+
mimalloc-test 897M
mimalloc-test 897M
mimalloc-test 897M
mimalloc-test 897M
mimalloc-test 897M+
mimalloc-test 906M+
mimalloc-test 923M+
mimalloc-test 928M+
mimalloc-test 928M+
mimalloc-test 928M
mimalloc-test 928M+
mimalloc-test 928M
mimalloc-test 938M+
mimalloc-test 941M+
mimalloc-test 959M+
mimalloc-test 959M
mimalloc-test 959M+
mimalloc-test 972M+
mimalloc-test 974M+
mimalloc-test 974M
mimalloc-test 974M
mimalloc-test 991M+
mimalloc-test 991M+
mimalloc-test 991M
mimalloc-test 991M+
mimalloc-test 991M
mimalloc-test 992M+
mimalloc-test 1002M+
mimalloc-test 1002M
mimalloc-test 1005M+
With branch dev3-bin and environment set to MIMALLOC_PAGE_COMMIT_ON_DEMAND=1:
top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test 661M
mimalloc-test 681M+
mimalloc-test 691M+
mimalloc-test 692M+
mimalloc-test 693M+
mimalloc-test 697M+
mimalloc-test 711M+
mimalloc-test 728M+
mimalloc-test 731M+
mimalloc-test 731M+
mimalloc-test 733M+
mimalloc-test 751M+
mimalloc-test 751M
mimalloc-test 752M+
mimalloc-test 761M+
mimalloc-test 762M+
mimalloc-test 766M+
mimalloc-test 782M+
mimalloc-test 782M
mimalloc-test 793M+
mimalloc-test 797M+
mimalloc-test 797M+
mimalloc-test 798M+
mimalloc-test 814M+
mimalloc-test 815M+
mimalloc-test 823M+
mimalloc-test 825M+
mimalloc-test 826M+
mimalloc-test 826M
mimalloc-test 827M+
mimalloc-test 831M+
mimalloc-test 831M
mimalloc-test 836M+
mimalloc-test 839M+
mimalloc-test 857M+
mimalloc-test 859M+
mimalloc-test 862M+
mimalloc-test 862M
mimalloc-test 878M+
mimalloc-test 884M+
mimalloc-test 885M+
mimalloc-test 892M+
mimalloc-test 893M+
mimalloc-test 893M
mimalloc-test 903M+
mimalloc-test 903M+
mimalloc-test 908M+
mimalloc-test 925M+
mimalloc-test 925M
mimalloc-test 925M+
mimalloc-test 927M+
mimalloc-test 927M+
mimalloc-test 927M+
mimalloc-test 951M+
mimalloc-test 958M+
mimalloc-test 958M+
mimalloc-test 958M+
mimalloc-test 971M+
mimalloc-test 971M
mimalloc-test 988M+
mimalloc-test 988M+
With branch dev3-bin and environment set to MIMALLOC_PURGE_DELAY=0:
top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test 548M+
mimalloc-test 629M+
mimalloc-test 644M+
mimalloc-test 662M+
mimalloc-test 664M+
mimalloc-test 666M+
mimalloc-test 696M+
mimalloc-test 698M+
mimalloc-test 699M+
mimalloc-test 700M+
mimalloc-test 718M+
mimalloc-test 731M+
mimalloc-test 731M+
mimalloc-test 733M+
mimalloc-test 734M+
mimalloc-test 745M+
mimalloc-test 746M+
mimalloc-test 746M+
mimalloc-test 762M+
mimalloc-test 763M+
mimalloc-test 766M+
mimalloc-test 778M+
mimalloc-test 779M+
mimalloc-test 798M+
mimalloc-test 798M+
mimalloc-test 798M
mimalloc-test 798M
mimalloc-test 815M+
mimalloc-test 823M+
mimalloc-test 823M+
mimalloc-test 823M+
mimalloc-test 827M+
mimalloc-test 828M+
mimalloc-test 830M+
mimalloc-test 831M+
mimalloc-test 831M-
mimalloc-test 841M+
mimalloc-test 841M+
mimalloc-test 841M
mimalloc-test 858M+
mimalloc-test 858M+
mimalloc-test 863M+
mimalloc-test 864M+
mimalloc-test 864M
mimalloc-test 864M
mimalloc-test 864M
mimalloc-test 864M
mimalloc-test 872M+
mimalloc-test 894M+
mimalloc-test 894M
mimalloc-test 895M+
mimalloc-test 895M
mimalloc-test 896M+
mimalloc-test 896M
mimalloc-test 897M+
mimalloc-test 903M+
mimalloc-test 928M+
mimalloc-test 928M
mimalloc-test 929M+
mimalloc-test 929M
mimalloc-test 941M+
mimalloc-test 959M+
mimalloc-test 959M
mimalloc-test 959M
mimalloc-test 961M+
mimalloc-test 961M-
mimalloc-test 961M+
mimalloc-test 961M+
mimalloc-test 962M+
mimalloc-test 962M
mimalloc-test 974M+
mimalloc-test 991M+
mimalloc-test 991M+
mimalloc-test 991M
mimalloc-test 991M
mimalloc-test 991M+
mimalloc-test 992M+
mimalloc-test 992M
mimalloc-test 992M+
mimalloc-test 995M+
mimalloc-test 995M
mimalloc-test 995M+
mimalloc-test 1003M+
mimalloc-test 1023M+
mimalloc-test 1023M
mimalloc-test 1023M
Good to see v3 does much better than v2; since it doesn't occur on Linux, I guess it must be something system-specific, like an allocation in a thread that is about to be terminated (reinitializing the heap and leaving it orphaned). Not sure. I'll try to repro when I find some time.
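To make that hypothesis concrete, here is a tiny standalone Rust sketch (not taken from the demo) of the suspected pattern: a short-lived thread allocates, hands the allocation to another thread, and exits, so the memory is eventually freed from a different thread than the one that allocated it.

use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    let workers: Vec<_> = (0..100)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                // Allocate on a short-lived thread, hand the allocation off,
                // and terminate right away.
                let buf = vec![0u8; 1 << 20];
                tx.send(buf).unwrap();
            })
        })
        .collect();
    drop(tx);
    for w in workers {
        w.join().unwrap();
    }
    // Every buffer is freed here, on a different thread than the one that
    // allocated it, after the allocating thread has already exited.
    for buf in rx {
        drop(buf);
    }
}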
I tried to compile from your repo (awesome that you constructed this! maybe we can use it as a standard benchmark in the future).
I first got an error telling me to use nightly, so I used cargo +nightly build --release, but now I get:
error[E0599]: no method named `get_many_mut` found for struct `HashMap` in the current scope
--> /Users/daan/.cargo/git/checkouts/rspack-c7c50c913aba6932/a04609d/crates/rspack_collections/src/ukey.rs:156:16
|
156 | self.inner.get_many_mut(ids)
| ^^^^^^^^^^^^
|
help: there is a method `get_mut` with a similar name
|
156 - self.inner.get_many_mut(ids)
156 + self.inner.get_mut(ids)
|
For more information about this error, try `rustc --explain E0599`.
error: could not compile `rspack_collections` (lib) due to 1 previous error
Can you help?
It turned out the Rust team changed this API in the latest nightly release. I've pushed a new commit that adds a rust-toolchain.toml, pinning the Rust toolchain to a specific version. Would you please pull and run cargo build --release again?
You can also check the active Rust toolchain in the repo directory with the command rustup show:
...
active toolchain
----------------
name: nightly-2024-11-27-aarch64-apple-darwin
active because: overridden by '/path-to-rspack-allocation-test/rust-toolchain.toml'
installed targets:
aarch64-apple-darwin
x86_64-pc-windows-msvc
Thanks -- I got it running locally, but it doesn't quite reproduce. I set it to 100 iterations using the latest dev3 branch. The memory usage is much lower than yours though (around 280 MiB at peak). Is that expected? Secondly, I couldn't quite reproduce the behaviour, as it gets stable after about 30 to 50 iterations (from ~180 MiB rss to ~280 MiB rss). This is on an Apple M1, Sequoia 15.3, with cargo build --release and the latest dev3.
When I set MIMALLOC_PURGE_DELAY=0 I see that there are about 400 threads, and the memory looks like this (using mi_arenas_print, available in the latest dev3):
Here I see lots of low-use pages (the red P's) which I guess belong to many rarely used threads. Over each iteration the heap keeps looking like this, with around 9 chunks in use and sometimes some large singleton objects (like the final green s page, which is about 5 MiB I guess). Maybe the initial growth is due to the threadpool-like nature of tokio, where per-thread pages slowly get used a bit more depending on the tasks that happen to execute on them -- but in the end it stabilizes? Maybe not, since you remarked you didn't see this on Linux. Is the benchmark reading from disk? Are you writing to a log file in that same directory?
Maybe I need the larger workload that you observed of 800MiB+ -- let me know how to do that.
ps. with no options on a release build (with latest dev3, 100 iterations):
$ top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test 257M
mimalloc-test 257M-
mimalloc-test 257M+
mimalloc-test 257M
mimalloc-test 257M
mimalloc-test 257M-
mimalloc-test 257M
mimalloc-test 257M
mimalloc-test 257M
mimalloc-test 257M
mimalloc-test 257M+
mimalloc-test 257M
...
mimalloc-test 287M
mimalloc-test 288M+
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 288M+
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 288M+
mimalloc-test 288M
mimalloc-test 288M
mimalloc-test 286M-
mimalloc-test 286M+
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
mimalloc-test 286M
I can also reproduce the problem by following the steps in this demo.
- environment: Apple M3 Max / 15.3.1
- mimalloc branch: dev3
- command:
cargo build --release
Result:
top -stats command,rprvt -r | grep -i mimalloc
mimalloc-test 643M+
mimalloc-test 672M+
mimalloc-test 676M+
mimalloc-test 679M+
mimalloc-test 681M+
mimalloc-test 693M+
mimalloc-test 712M+
mimalloc-test 717M+
mimalloc-test 735M+
mimalloc-test 746M+
mimalloc-test 746M
mimalloc-test 762M+
mimalloc-test 762M+
mimalloc-test 779M+
mimalloc-test 780M+
mimalloc-test 780M+
mimalloc-test 780M+
mimalloc-test 782M+
mimalloc-test 784M+
mimalloc-test 784M+
mimalloc-test 802M+
mimalloc-test 802M+
mimalloc-test 831M+
mimalloc-test 832M+
mimalloc-test 832M+
mimalloc-test 833M+
mimalloc-test 833M
mimalloc-test 834M+
mimalloc-test 834M+
mimalloc-test 834M+
mimalloc-test 842M+
mimalloc-test 842M
mimalloc-test 843M+
mimalloc-test 843M+
mimalloc-test 843M
mimalloc-test 845M+
mimalloc-test 845M+
mimalloc-test 845M+
mimalloc-test 845M
mimalloc-test 845M+
mimalloc-test 848M+
mimalloc-test 860M+
mimalloc-test 876M+
mimalloc-test 877M+
mimalloc-test 877M
mimalloc-test 877M
mimalloc-test 878M+
mimalloc-test 878M
mimalloc-test 878M
mimalloc-test 879M+
mimalloc-test 879M
mimalloc-test 883M+
mimalloc-test 910M+
mimalloc-test 910M
mimalloc-test 910M
mimalloc-test 910M
mimalloc-test 910M
mimalloc-test 910M
mimalloc-test 940M+
mimalloc-test 940M
mimalloc-test 956M+
The memory usage is much lower than yours though (around 280 MiB at peak). Is that expected?
Unfortunately, this is not expected.
I forgot to mention that the node modules need to be installed in the 10000 directory. This might explain the low memory consumption in your reproduction. I've updated the README.md: you might need to install Node on the machine and run pnpm install to install the node modules. My edit to README.md: https://github.com/h-a-n-a/rspack-allocation-test/commit/ecf28cbfbb71e116b03c84bac5b316a1247746b1. After that, running cargo run --release should emit no error.
Is the benchmark reading from disk?
The benchmark does heavy reading of files from disk (recursively, in the 10000 directory). This causes tokio to use a blocking thread (in rspack, this type of thread is used only for reading files) to issue the filesystem read. The thread is then kept alive for the default timeout of 10 seconds (i.e. the thread_keep_alive option).
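As a rough illustration of that mechanism (not rspack's actual code; the file path is a placeholder), a blocking filesystem read in tokio goes through spawn_blocking, and the blocking-pool thread that served it then idles for the keep-alive timeout before exiting:

// Sketch only: tokio::fs does essentially this under the hood as well.
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let content =
        tokio::task::spawn_blocking(|| std::fs::read_to_string("10000/src/index.js"))
            .await
            .expect("blocking task panicked")?;
    // The blocking-pool thread that performed the read now stays alive for
    // the keep-alive timeout (10 seconds by default), waiting for more work.
    println!("read {} bytes", content.len());
    Ok(())
}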
Are you writing to a log file in that same directory?
We don't normally write log files. The only file-writing operation happens at the end of each rebuild, and it is driven by the same tokio blocking thread.
I got it running now -- it still uses much less memory, but I can see it grow now, from about 240 MiB to 480 MiB after 100 iterations. One thing I noticed is that every iteration an extra thread (or threads) is created; it starts at 339 threads and increases to 526 threads after 100 iterations. I will test on Linux too to see if that happens there as well. I guess that extra thread is the issue? edit: on Linux the threads stay constant at 199 (but it uses more memory and runs much faster?)
Question: how can I configure the project to use the standard macOS allocator?
One thing I noticed is that every iteration an extra thread (or threads) is created; it starts at 339 threads and increases to 526 threads after 100 iterations. I will test on Linux too to see if that happens there as well. I guess that extra thread is the issue?
I did some tests on my local machine and found that the thread count and the memory consumption are indeed related and intertwined. In my opinion, it's strongly linked to these two configuration options passed to tokio-rs:
- max_blocking_threads: Specifies the limit for additional threads spawned by the Runtime (i.e. some fs operations).
- thread_keep_alive: Sets a custom timeout for a thread in the blocking pool.
Tuning either of these options lowers the memory usage.
For example, if I set max_blocking_threads to 8, which is the default blocking thread count in rspack now, it prevents tokio-rs from creating as many blocking threads as it otherwise would. This decreases the memory.
Even with mimalloc v2.1.7, which consumes more memory than v3 (dev3-bin), memory usage is much better than before:
# Mimalloc v2.1.7 with `max_blocking_thread` set to 8
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test 33/1 2054M+ 616M+ 618M+
mimalloc-test 33 2062M+ 677M+ 679M+
mimalloc-test 33 2070M+ 718M+ 720M+
mimalloc-test 33/8 2070M 726M+ 728M+
mimalloc-test 33 2086M+ 737M+ 739M+
mimalloc-test 33 2214M+ 759M+ 761M+
mimalloc-test 33/1 2222M+ 764M+ 766M+
mimalloc-test 33 2222M 767M+ 769M+
mimalloc-test 33 2222M 780M+ 782M+
mimalloc-test 33/19 2222M 780M+ 782M+
mimalloc-test 33 2222M 783M+ 785M+
mimalloc-test 33/1 2222M 785M+ 787M+
mimalloc-test 34/2 2224M+ 790M+ 792M+
mimalloc-test 33 2222M- 792M+ 794M+
mimalloc-test 33/20 2222M 792M+ 794M+
mimalloc-test 33 2222M 792M+ 794M+
mimalloc-test 33/1 2222M 793M+ 795M+
mimalloc-test 33/20 2222M 794M+ 796M+
mimalloc-test 33 2222M 794M+ 796M+
mimalloc-test 33/1 2222M 795M+ 797M+
mimalloc-test 34/2 2224M+ 795M+ 797M+
mimalloc-test 33 2222M- 797M+ 799M+
mimalloc-test 33/5 2222M 797M+ 799M+
mimalloc-test 34/2 2224M+ 797M+ 799M+
# Mimalloc v2.1.7 with `max_blocking_thread` set to default (512)
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test 272/1 10G+ 611M+ 617M+
mimalloc-test 272 10G+ 706M+ 712M+
mimalloc-test 272/1 10G+ 761M+ 767M+
mimalloc-test 272 10G 790M+ 796M+
mimalloc-test 272/1 10G 819M+ 825M+
mimalloc-test 272 10G 826M+ 832M+
mimalloc-test 272 10G 832M+ 838M+
mimalloc-test 272/40 10G 837M+ 844M+
mimalloc-test 272 10G 839M+ 845M+
mimalloc-test 272 10G 845M+ 851M+
mimalloc-test 272/3 10G 846M+ 852M+
mimalloc-test 272 10G 850M+ 856M+
mimalloc-test 272/20 10G 862M+ 868M+
mimalloc-test 284 10G+ 919M+ 925M+
mimalloc-test 284 10G+ 924M+ 931M+
mimalloc-test 284/56 10G 927M+ 933M+
mimalloc-test 299 11G+ 1005M+ 1012M+
mimalloc-test 299/16 11G+ 1025M+ 1031M+
mimalloc-test 299 11G 1026M+ 1033M+
mimalloc-test 317 11G+ 1081M+ 1088M+
mimalloc-test 317/1 11G 1086M+ 1093M+
mimalloc-test 317 12G+ 1105M+ 1112M+
mimalloc-test 317/53 12G 1108M+ 1114M+
mimalloc-test 317 12G 1109M+ 1116M+
The result for mimalloc-v3 (dev3-bin):
# Mimalloc v3 with `max_blocking_thread` set to 8
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test 33 1443M+ 605M+ 607M+
mimalloc-test 33/21 1442M- 615M+ 616M+
mimalloc-test 33 1442M+ 615M+ 617M+
mimalloc-test 33 1442M 618M+ 619M+
mimalloc-test 33/20 1442M 618M+ 619M+
mimalloc-test 33 1442M 621M+ 623M+
mimalloc-test 33/12 1442M 622M+ 623M+
mimalloc-test 33 1442M 622M+ 623M+
mimalloc-test 33 1442M 622M 623M
mimalloc-test 33/18 1442M 622M+ 623M+
mimalloc-test 33 1450M+ 622M+ 624M+
mimalloc-test 33/12 1450M 622M+ 624M+
mimalloc-test 33 1450M 622M+ 624M+
mimalloc-test 33/12 1450M 623M+ 625M+
mimalloc-test 33 1458M+ 623M+ 625M+
mimalloc-test 33/1 1458M 623M+ 625M+
mimalloc-test 33 1458M 623M+ 625M+
mimalloc-test 33 1458M 623M+ 625M+
mimalloc-test 33/3 1458M 623M+ 625M+
mimalloc-test 33 1458M 623M+ 625M+
mimalloc-test 33/13 1458M 623M 625M
mimalloc-test 33 1458M 623M 625M
mimalloc-test 33 1458M 623M+ 625M+
mimalloc-test 33/20 1458M 623M+ 625M+
mimalloc-test 33 1458M 623M 625M
mimalloc-test 33/12 1458M 623M 625M
mimalloc-test 33 1458M 624M+ 625M+
mimalloc-test 33 1458M 624M+ 625M+
mimalloc-test 33/20 1458M 624M+ 625M+
mimalloc-test 33 1458M 624M+ 625M+
mimalloc-test 33/12 1458M 624M+ 625M+
mimalloc-test 33 1586M+ 640M+ 642M+
mimalloc-test 33 1586M 640M+ 642M+
mimalloc-test 33/17 1586M 640M 642M
mimalloc-test 33 1586M 640M+ 642M+
mimalloc-test 33/1 1586M 644M+ 646M+
# Mimalloc v3 with `max_blocking_thread` set to default (512)
$ top -stats command,threads,vprvt,rprvt,mem -r | grep -i mimalloc
mimalloc-test 209/55 1816M+ 563M+ 565M+
mimalloc-test 227 1980M+ 628M+ 630M+
mimalloc-test 227/29 1981M+ 630M+ 632M+
mimalloc-test 243 2013M+ 651M+ 653M+
mimalloc-test 263/19 2054M+ 651M+ 653M+
mimalloc-test 263 2054M 653M+ 655M+
mimalloc-test 263/1 2054M 653M+ 655M+
mimalloc-test 263 2054M 663M+ 665M+
mimalloc-test 264/2 2056M+ 663M+ 666M+
mimalloc-test 270 2068M+ 673M+ 675M+
mimalloc-test 270 2068M 674M+ 676M+
mimalloc-test 270 2068M 694M+ 696M+
mimalloc-test 270 2068M 696M+ 698M+
mimalloc-test 270 2068M 696M+ 698M+
mimalloc-test 276 2080M+ 696M+ 698M+
mimalloc-test 276 2080M 701M+ 703M+
mimalloc-test 276 2080M 701M+ 703M+
mimalloc-test 276/12 2080M 701M+ 703M+
mimalloc-test 276 2080M 701M+ 703M+
mimalloc-test 276/1 2080M 705M+ 707M+
mimalloc-test 276 2080M 722M+ 724M+
mimalloc-test 276/1 2080M 728M+ 730M+
mimalloc-test 276 2080M 731M+ 733M+
mimalloc-test 296/12 2121M+ 732M+ 734M+
mimalloc-test 323 2176M+ 732M+ 734M+
mimalloc-test 370/12 2271M+ 739M+ 741M+
mimalloc-test 370 2399M+ 764M+ 766M+
mimalloc-test 370/16 2399M 764M+ 766M+
mimalloc-test 370 2399M 764M 766M
mimalloc-test 370/8 2399M 764M 766M
mimalloc-test 370 2399M 764M 766M
mimalloc-test 370/19 2399M 765M+ 768M+
mimalloc-test 370 2399M 770M+ 773M+
mimalloc-test 370/39 2399M 790M+ 793M+
Memory still goes up pretty quickly on the larger workload though.
To adjust these two options, go to src/main.rs and add them to the tokio runtime initialization:
use std::time::Duration;

// Cap the blocking pool at 8 threads and drop idle blocking threads after 1 second.
let rt = tokio::runtime::Builder::new_multi_thread()
    .enable_all()
    .max_blocking_threads(8)
    .thread_keep_alive(Duration::from_millis(1_000))
    .build()
    .unwrap();
on Linux the threads stay constant at 199 (but it uses more memory and runs much faster?)
IIRC, my test on Linux stabilizes at around 550 MB. I think that's totally fine if it's not increasing.
how can I configure the project to use the standard macOS allocator?
Just comment out these two lines at the top of src/main.rs and recompile.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
Thanks! It is good to see that v3 does much better, as it was essentially redesigned to deal with threadpools better. Just to clarify, the threads on Linux stay constant under 200 (and memory growth stabilizes). Btw, in a threadpool-like scenario we expect an initial growth of memory, as each thread retains some owned memory, but it should level out after a while.
Maybe the creation of new threads that I see on macOS is the root cause of the memory growth (where there is an increasing number of threads that are never terminated -- maybe due to mimalloc? maybe not?).
Thanks for the reply!
Maybe the creation of new threads that I see on macOS is the root cause of the memory growth (where there is an increasing number of threads that are never terminated -- maybe due to mimalloc? maybe not?).
That's weird. If I set max_blocking_threads to 8 with mimalloc v3, the thread count stays stable at 33, yet the memory still grows over time. I've uploaded a larger workload with 20000 modules, but it still does not quite reproduce the memory growth at a large scale, even though it does grow slowly. There's a big workload in my company's project that I'm unable to share:
For each rebuild, it consumed and retained about 300 MiB of memory.
$ top -stats pid,command,threads,vprvt,rprvt,mem -r | grep -i 5732
5732 node 50 5353M 2117M 2286M
5732 node 52/1 3805M- 1842M- 2296M+
5732 node 52 4583M+ 1985M+ 2296M+
5732 node 52 4583M 1985M 2296M
5732 node 51 4582M- 1894M- 2152M-
5732 node 50/1 4524M- 1857M- 2110M-
5732 node 50 4524M 1857M+ 2110M+
5732 node 50/1 4524M 1857M 2110M+
5732 node 50 4524M 1857M 2110M+
5732 node 50/1 4524M 1857M 2110M+
5732 node 50 4524M 1857M 2110M+
5732 node 50 4524M 1857M+ 2111M+
5732 node 50/1 4524M 1857M 2111M+
5732 node 50 4524M 1857M 2111M+
5732 node 50 4524M 1857M 2111M+
5732 node 51/3 4509M- 1773M- 2067M-
5732 node 51/13 4309M- 1832M+ 2167M+
5732 node 51/2 4688M+ 2168M+ 2498M+
5732 node 51/7 4633M- 2356M+ 2766M+
5732 node 51/2 5171M+ 2460M+ 2833M+
5732 node 51/13 5209M+ 2537M+ 2919M+
5732 node 51/2 5869M+ 2380M- 2773M-
5732 node 51/1 5886M+ 2392M+ 2786M+
5732 node 51/1 5886M 2392M 2786M
5732 node 51/1 5886M 2392M 2786M
5732 node 51/1 5886M 2392M 2786M
5732 node 50/1 5862M- 2364M- 2723M-
5732 node 50 5851M- 2355M- 2712M- ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ First build finished
5732 node 50 5851M 2355M 2712M+
5732 node 50 5851M 2355M 2712M+
5732 node 50 5851M 2355M 2712M+
5732 node 50/1 5851M 2355M 2712M+
5732 node 50/1 5851M 2355M 2712M+
5732 node 50 5851M 2355M 2712M+
5732 node 50/1 5851M 2355M 2712M+
5732 node 50 5851M 2355M+ 2712M+
5732 node 50/1 5851M 2355M+ 2712M+
5732 node 50 5851M 2355M 2712M+
5732 node 50/1 5851M 2355M 2712M+
5732 node 50 5851M 2355M 2712M+
5732 node 50/1 5851M 2355M 2712M+
5732 node 51/14 5972M+ 2266M- 2666M-
5732 node 51/1 5958M- 2349M+ 2750M+
5732 node 52/3 5624M- 2553M+ 3005M+
5732 node 52/3 6091M+ 2794M+ 3236M+
5732 node 52/2 6124M+ 2787M- 3208M-
5732 node 52/13 6128M+ 2808M+ 3243M+
5732 node 52/1 6295M+ 2723M- 3188M-
5732 node 52/1 6286M- 2663M- 3128M-
5732 node 52 6294M+ 2667M+ 3128M
5732 node 52 6294M 2667M 3128M
5732 node 52 6294M 2667M+ 3129M+
5732 node 50 6239M- 2611M- 3035M- ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ Second build finished
5732 node 50/1 6239M- 2611M+ 3035M+
5732 node 50/1 6239M 2611M 3035M+
5732 node 50/1 6239M 2611M 3035M+
5732 node 50 6239M 2611M 3035M+
5732 node 50 6239M 2611M 3035M+
5732 node 50 6239M 2611M 3035M+
5732 node 50/1 6239M 2611M 3035M+
5732 node 50/1 6239M 2611M 3035M+
5732 node 50 6239M 2611M 3035M+
5732 node 50/1 6239M 2611M 3035M
5732 node 50 6239M 2611M 3035M+
5732 node 50 6239M 2611M 3035M+
5732 node 50/1 6239M 2611M 3036M+
5732 node 50/1 6239M 2611M 3036M+
5732 node 50/1 6239M 2611M 3035M-
5732 node 50/1 6239M 2611M 3035M+
5732 node 50/1 6239M 2611M 3035M+
5732 node 50 6239M 2611M 3035M+
5732 node 51/5 6194M- 2565M- 3002M-
5732 node 51/22 6036M- 2656M+ 3104M+
5732 node 52/13 6348M+ 2886M+ 3350M+
5732 node 52/2 6288M- 3105M+ 3587M+
5732 node 52/1 6402M+ 3065M- 3545M-
5732 node 52/13 6394M- 3073M+ 3556M+
5732 node 52 6303M- 3013M- 3472M-
5732 node 52/1 6292M- 2946M- 3413M-
5732 node 52 6300M+ 2950M+ 3413M
5732 node 52/1 6300M 2950M 3413M
5732 node 52/1 6300M 2950M 3413M
5732 node 52 6259M- 2902M- 3329M-
5732 node 50 6245M- 2897M- 3322M- ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ Third build finished
5732 node 50 6245M 2897M+ 3322M+
5732 node 50/1 6245M 2897M 3322M+
5732 node 50 6245M 2897M 3322M+
5732 node 50/1 6245M 2897M 3322M+
5732 node 50/1 6245M 2897M 3323M+
5732 node 50/1 6245M 2897M+ 3323M+
5732 node 50 6245M 2897M 3323M+
5732 node 50/1 6245M 2897M 3323M+
5732 node 50 6245M 2897M 3323M+
5732 node 50/1 6245M 2897M 3323M+
5732 node 50/1 6245M 2897M+ 3323M+
5732 node 50 6245M 2897M 3323M+
5732 node 50 6245M 2897M 3323M+
5732 node 50 6245M 2897M 3323M-
5732 node 50/1 6245M 2897M 3323M+
5732 node 53/15 6253M+ 2895M- 3313M-
5732 node 52/3 6356M+ 2909M+ 3372M+
5732 node 52/3 5970M- 3020M+ 3515M+
5732 node 52/2 6039M+ 3249M+ 3754M+
5732 node 52/2 6361M+ 3444M+ 3957M+
5732 node 52/1 6842M+ 3391M- 3957M-
5732 node 52/1 6956M+ 3492M+ 4066M+
5732 node 52/2 6805M- 3228M- 3790M-
5732 node 52/1 6795M- 3234M+ 3797M+
5732 node 52 6803M+ 3234M+ 3797M
5732 node 52 6803M 3235M+ 3798M+
5732 node 52 6803M 3235M 3798M
5732 node 50/6 6780M- 3209M- 3737M-
5732 node 50 6769M- 3202M- 3729M- ⬅️⬅️⬅️⬅️⬅️⬅️⬅️⬅️ Fourth build finished
5732 node 50 6769M 3202M 3729M+
5732 node 50 6769M 3202M 3729M+
5732 node 50 6769M 3202M 3729M+
5732 node 50/1 6769M 3202M 3729M+
5732 node 50 6769M 3202M 3729M+
5732 node 50/1 6769M 3202M 3729M+
5732 node 50 6769M 3202M+ 3729M+
5732 node 50/1 6769M 3202M 3729M+
5732 node 50 6769M 3202M+ 3729M+
5732 node 50/1 6769M 3203M+ 3729M+
So... do you know of any recommended way to debug this? For example, finding out which thread is holding on to memory that isn't being released. Or is there anything else I should check locally?
We might be seeing a related issue on the Apache Arrow CI when trying to bump mimalloc from 2.0.6 (which works fine) to either 2.1.9 or 2.2.3 (both of which incur crashes on some of our ARM64 macOS jobs, when running some large-memory tests). Is there a way to check whether it's the same issue?
Also cc @kszucs
Here's the test log, by the way, but it's not very insightful... though the "0x2000000" alignment is weird; is it expected?
tests/test_convert_builtin.py::test_array_to_pylist_roundtrip SKIPPED
mimalloc: warning: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0xBF000000 bytes, address: 0x000CD4800000, alignment: 0x2000000, commit: 1)
mimalloc: warning: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0xBF000000 bytes, address: 0x000D95000000, alignment: 0x2000000, commit: 1)
mimalloc: warning: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0xBF000000 bytes, address: 0x000E55000000, alignment: 0x2000000, commit: 1)
tests/test_convert_builtin.py::test_auto_chunking_binary_like PASSED
mimalloc: warning: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0x80800000 bytes, address: 0x0004C4800000, alignment: 0x2000000, commit: 1)
tests/test_convert_builtin.py::test_auto_chunking_list_of_binary PASSED
mimalloc: warning: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0x80800000 bytes, address: 0x0004C4800000, alignment: 0x2000000, commit: 1)
mimalloc: warning: unable to allocate aligned OS memory directly, fall back to over-allocation (size: 0x80800000 bytes, address: 0x007140800000, alignment: 0x2000000, commit: 1)
ci/scripts/python_test.sh: line 73: 38091 Killed: 9 pytest -r s -vs ${PYTEST_ARGS} --pyargs pyarrow
Ok, I've tested several versions now and I've determined that the crash appears with 2.1.9. Versions 2.1.7 and older are fine. Should I open a new issue for it?
Hi @pitrou: can you repro reliably? I think the actual difference between 2.1.7 and 2.1.9 is not that large, so maybe you can pinpoint the commit that causes the crash? I will look into the diff to see if I can spot the possible issue. (If you have a way for me to repro, that would be good too.)
btw, the aligned-allocation warning is fine -- it can especially happen with these large allocations.
btw2: do you only see the crash on arm64 macOS? (not on other platforms?)
Hmm, I don't know how you do the tags on mimalloc, but if I ask git to bisect between v2.1.7 and v2.1.9, it tells me to try a changeset around v1.8.8... I'll try something else.
Ok, it seems the last good commit is 75459a1bd72bf739bc084647e4a0fb20f91e6c9e and the first bad commit is a1cfe9667c307a355c13b6f8381463e04ad6b04d.
Random observation, but I don't understand this code; these two lines look weird: https://github.com/microsoft/mimalloc/commit/a1cfe9667c307a355c13b6f8381463e04ad6b04d#diff-aa9a92fb58ac9e5cb386ab5760eaf8d2e7f81da12aae09c43a9e1d102332ed77R168-R169
I would intuitively expect:
size_t csize = memid.mem.os.size;
if (csize==0) { csize = _mi_os_good_alloc_size(size); }
instead of:
size_t csize = memid.mem.os.size;
if (csize==0) { _mi_os_good_alloc_size(size); }
I'm now trying this patch on top of 2.1.9 on our CI and it seems to fix the issue:
diff --git a/src/os.c b/src/os.c
index 61c9eebf..3441bc3d 100644
--- a/src/os.c
+++ b/src/os.c
@@ -166,7 +166,7 @@ static void mi_os_prim_free(void* addr, size_t size, size_t commit_size) {
void _mi_os_free_ex(void* addr, size_t size, bool still_committed, mi_memid_t memid) {
if (mi_memkind_is_os(memid.memkind)) {
size_t csize = memid.mem.os.size;
- if (csize==0) { _mi_os_good_alloc_size(size); }
+ if (csize==0) { csize = _mi_os_good_alloc_size(size); }
size_t commit_size = (still_committed ? csize : 0);
void* base = addr;
// different base? (due to alignment)
I tried this patch with mimalloc v3 and put it to the test on an internal project. The problem still exists, so it may not be related to the rspack issue. 🤔
@daanx Do you think the patch in https://github.com/microsoft/mimalloc/issues/1025#issuecomment-2768863283 is ok? Or am I just misunderstanding things there?
I can confirm that this is a rather bad bug that can be easily fixed via https://github.com/microsoft/mimalloc/issues/1025#issuecomment-2768863283. This bug hit Git for Windows after I upgraded its vendored-in copy of mimalloc to v2.2.3: As of right now (read: before I add some workaround), I cannot release a 32-bit variant of Git for Windows v2.50.0-rc1...
I missed the earlier comment :-(. But this is fixed now in commit 1c51484 -- however, it was really an artifact of another bug which was fixed in 3e32b4c. Apologies -- I will soon push out a fresh release of mimalloc with this fixed.
Thanks a lot @daanx !
@daanx this is addressed in v2.2.4 via https://github.com/microsoft/mimalloc/commit/30a17bf1b773e57fa79c1c96667bf5163a024c02, right?
In https://github.com/apache/arrow/pull/45979 we were finally able to bump the mimalloc version used for Apache Arrow. Thank you!