Peak RSS regression from 2.0.6 to 2.2.4

Open pitrou opened this issue 5 months ago • 1 comments

In Arrow we recently updated our bundled mimalloc build from 2.0.6 to 2.2.4.

Lately I noticed that reading some Parquet files I have lying around led to a much higher max RSS than on Arrow 20.0.0, so I bisected and found out that the culprit was indeed the mimalloc version bump.

My measurements are done on a 12-core 24-thread machine on Ubuntu 24.04 using the following command (repeated 3 times to ensure the numbers are stable):

$ /usr/bin/time -f "%U user %E elapsed %M kB" python -c "import pyarrow.parquet as pq; pq.read_table('FILENAME')"
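The repeated measurement could be scripted roughly as follows. This is only a sketch: `parse_time_line` and `measure` are hypothetical helper names, and it assumes GNU `time` is installed at `/usr/bin/time` and writes its report to stderr.

```python
import re
import subprocess
import sys

# Matches the format string used in the command above:
# "%U user %E elapsed %M kB"
TIME_LINE = re.compile(r"([\d.]+) user (\S+) elapsed (\d+) kB")


def parse_time_line(line):
    """Return (user_seconds, elapsed_string, max_rss_kb), or None if no match."""
    m = TIME_LINE.search(line)
    if m is None:
        return None
    return float(m.group(1)), m.group(2), int(m.group(3))


def measure(filename, repeats=3):
    """Read the file in a fresh process `repeats` times (hypothetical helper)."""
    code = "import sys, pyarrow.parquet as pq; pq.read_table(sys.argv[1])"
    cmd = ["/usr/bin/time", "-f", "%U user %E elapsed %M kB",
           sys.executable, "-c", code, filename]
    results = []
    for _ in range(repeats):
        proc = subprocess.run(cmd, stderr=subprocess.PIPE, text=True, check=True)
        # Keep the last stderr line that matches the report format.
        parsed = [parse_time_line(line) for line in proc.stderr.splitlines()]
        results.append([p for p in parsed if p is not None][-1])
    return results
```

Each run starts a new Python process, so the reported peak RSS is not affected by memory retained from a previous run.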

Here I present the results for 3 different mimalloc versions, and also for jemalloc and the system (glibc) allocator:

| Test file | mimalloc 3.1.5 | mimalloc 2.2.4 | mimalloc 2.0.6 | jemalloc | system (glibc) |
|---|---|---|---|---|---|
| lineitem.parquet | 1.44 user 0:00.42 elapsed 669608 kB | 1.65 user 0:00.40 elapsed 892512 kB | 1.38 user 0:00.44 elapsed 700696 kB | 1.39 user 0:00.44 elapsed 663816 kB | 1.42 user 0:00.46 elapsed 717120 kB |
| 1000col.parquet | 15.48 user 0:03.15 elapsed 2666164 kB | 14.79 user 0:03.16 elapsed 3224196 kB | 15.74 user 0:03.22 elapsed 2585096 kB | 15.85 user 0:03.18 elapsed 2593136 kB | 15.92 user 0:03.73 elapsed 2576316 kB |
| StockUniteLegale_utf8.parquet | 24.20 user 0:02.11 elapsed 7064604 kB | 24.99 user 0:01.73 elapsed 8934212 kB | 23.29 user 0:02.14 elapsed 7002348 kB | 23.23 user 0:02.15 elapsed 6950332 kB | 23.28 user 0:02.23 elapsed 6734988 kB |

You can see that mimalloc 2.2.4 produces a much higher peak RSS than all other allocators: roughly 25% to 28% higher than 2.0.6 on these three files. Its elapsed time is sometimes better than the others, which hints that 2.2.4 might just be setting aside a lot more freed memory.
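To put a number on the regression, the relative increase in peak RSS between 2.0.6 and 2.2.4 can be computed from the multi-threaded figures above. This is a small sketch; `rss_increase_percent` is a hypothetical helper name, and the values are copied from the table.

```python
# Peak RSS in kB for the multi-threaded runs, copied from the table above.
peak_rss_kb = {
    "lineitem.parquet": {"2.2.4": 892512, "2.0.6": 700696},
    "1000col.parquet": {"2.2.4": 3224196, "2.0.6": 2585096},
    "StockUniteLegale_utf8.parquet": {"2.2.4": 8934212, "2.0.6": 7002348},
}


def rss_increase_percent(new_kb, old_kb):
    """Relative increase of new_kb over old_kb, in percent."""
    return 100.0 * (new_kb - old_kb) / old_kb


for name, rss in peak_rss_kb.items():
    pct = rss_increase_percent(rss["2.2.4"], rss["2.0.6"])
    print(f"{name}: {pct:+.1f}%")
```

On these figures the increase comes out at about a quarter of the 2.0.6 peak RSS for each file.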

As an experiment I also measured with multi-threading disabled by setting the environment variable OMP_NUM_THREADS=1 and passing use_threads=False to the read_table function call:

| Test file | mimalloc 3.1.5 | mimalloc 2.2.4 | mimalloc 2.0.6 | jemalloc | system (glibc) |
|---|---|---|---|---|---|
| lineitem.parquet | 0.94 user 0:01.21 elapsed 641460 kB | 0.90 user 0:01.20 elapsed 626100 kB | 0.90 user 0:01.21 elapsed 628768 kB | 0.92 user 0:01.20 elapsed 629676 kB | 0.93 user 0:01.21 elapsed 619656 kB |
| 1000col.parquet | 5.41 user 0:05.98 elapsed 1927936 kB | 5.47 user 0:06.12 elapsed 1918084 kB | 5.52 user 0:06.11 elapsed 1918744 kB | 5.47 user 0:06.10 elapsed 1918464 kB | 7.55 user 0:08.57 elapsed 2125832 kB |
| StockUniteLegale_utf8.parquet | 14.92 user 0:18.12 elapsed 6990128 kB | 14.52 user 0:15.94 elapsed 8899392 kB | 14.80 user 0:17.98 elapsed 7101364 kB | 14.42 user 0:17.53 elapsed 6809584 kB | 14.44 user 0:17.54 elapsed 6848652 kB |

With multi-threading (almost) disabled, the peak RSS problem mostly disappears, as I would expect, except on one file, StockUniteLegale_utf8.parquet, where it is surprisingly still present (8899392 kB with 2.2.4 versus 7101364 kB with 2.0.6).
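For reference, the single-threaded configuration described above could be set up along these lines. This is a sketch only: the helper names are hypothetical, and it simply combines the `OMP_NUM_THREADS=1` environment variable with `use_threads=False` as stated in the text.

```python
import os
import subprocess
import sys


def single_threaded_env(base_env=None):
    """Copy the environment and set OMP_NUM_THREADS=1 (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


def single_threaded_command(filename):
    """Build the timed command, passing use_threads=False to read_table."""
    code = ("import sys, pyarrow.parquet as pq; "
            "pq.read_table(sys.argv[1], use_threads=False)")
    return ["/usr/bin/time", "-f", "%U user %E elapsed %M kB",
            sys.executable, "-c", code, filename]


def run_single_threaded(filename):
    """Run one timed, single-threaded read in a fresh process."""
    subprocess.run(single_threaded_command(filename),
                   env=single_threaded_env(), check=True)
```

Calling `run_single_threaded('FILENAME')` prints the same `user / elapsed / kB` report as the multi-threaded command.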

The good news is that mimalloc 3.1.5 looks fine in these measurements, but we would rather wait for it to reach stable status before switching to it.

(Note: I've also reported user time, but hardware multi-threading makes comparisons of it unreliable.)

pitrou, Jul 07 '25 12:07

@adamreeve FYI.

pitrou, Jul 07 '25 12:07