Peak RSS regression from 2.0.6 to 2.2.4
In Arrow we recently updated our bundled mimalloc build from 2.0.6 to 2.2.4.
Lately I noticed that reading some Parquet files I have lying around led to a much higher max RSS than on Arrow 20.0.0, so I bisected and found out that the culprit was indeed the mimalloc version bump.
My measurements are done on a 12-core 24-thread machine on Ubuntu 24.04 using the following command (repeated 3 times to ensure the numbers are stable):
$ /usr/bin/time -f "%U user %E elapsed %M kB" python -c "import pyarrow.parquet as pq; pq.read_table('FILENAME')"
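For anyone who wants to repeat the measurement, here is a minimal sketch of how the three runs could be scripted and the output parsed. The helper names `parse_time_line` and `measure` are hypothetical, and the script assumes GNU `time` is installed at `/usr/bin/time` and writes its report to stderr.

```python
import re
import subprocess
import sys

# Matches the output of the -f format string used above, e.g.
# "1.44 user 0:00.42 elapsed 669608 kB". %E is [hours:]minutes:seconds.
TIME_LINE = re.compile(
    r"([\d.]+) user (?:(\d+):)?(\d+):([\d.]+) elapsed (\d+) kB"
)


def parse_time_line(line):
    """Return (user_seconds, elapsed_seconds, peak_rss_kb) for one line."""
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized time output: {line!r}")
    user, hours, minutes, seconds, rss_kb = match.groups()
    elapsed = int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)
    return float(user), elapsed, int(rss_kb)


def measure(filename, repeats=3):
    """Read the file `repeats` times, each in a fresh Python process."""
    code = f"import pyarrow.parquet as pq; pq.read_table({filename!r})"
    results = []
    for _ in range(repeats):
        proc = subprocess.run(
            ["/usr/bin/time", "-f", "%U user %E elapsed %M kB",
             sys.executable, "-c", code],
            stderr=subprocess.PIPE, text=True, check=True,
        )
        # The report from time is the last line written to stderr.
        results.append(parse_time_line(proc.stderr.strip().splitlines()[-1]))
    return results
```

Calling `measure('FILENAME')` returns one tuple per run, which makes it easy to check that the peak RSS is stable across repetitions.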
Here I present the results for 3 different mimalloc versions, and also for jemalloc and the system (glibc) allocator:
| Test file | mimalloc 3.1.5 | mimalloc 2.2.4 | mimalloc 2.0.6 | jemalloc | system (glibc) |
|---|---|---|---|---|---|
| lineitem.parquet | 1.44 user 0:00.42 elapsed 669608 kB | 1.65 user 0:00.40 elapsed 892512 kB | 1.38 user 0:00.44 elapsed 700696 kB | 1.39 user 0:00.44 elapsed 663816 kB | 1.42 user 0:00.46 elapsed 717120 kB |
| 1000col.parquet | 15.48 user 0:03.15 elapsed 2666164 kB | 14.79 user 0:03.16 elapsed 3224196 kB | 15.74 user 0:03.22 elapsed 2585096 kB | 15.85 user 0:03.18 elapsed 2593136 kB | 15.92 user 0:03.73 elapsed 2576316 kB |
| StockUniteLegale_utf8.parquet | 24.20 user 0:02.11 elapsed 7064604 kB | 24.99 user 0:01.73 elapsed 8934212 kB | 23.29 user 0:02.14 elapsed 7002348 kB | 23.23 user 0:02.15 elapsed 6950332 kB | 23.28 user 0:02.23 elapsed 6734988 kB |
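You can see that mimalloc 2.2.4 produces a much higher peak RSS than all other allocators: roughly 25% to 28% above mimalloc 2.0.6 on each of the three files. Its elapsed time is sometimes better than the others, which hints that 2.2.4 might simply be holding on to a lot more freed memory instead of returning it to the OS.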
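To put a number on the regression, the relative increase in peak RSS between 2.0.6 and 2.2.4 can be computed directly from the table above. This is only a small sketch; the values are copied from the multi-threaded results and `relative_increase` is a hypothetical helper.

```python
# Peak RSS in kB from the multi-threaded table above, per test file:
# (mimalloc 2.0.6, mimalloc 2.2.4).
peak_rss_kb = {
    "lineitem.parquet": (700696, 892512),
    "1000col.parquet": (2585096, 3224196),
    "StockUniteLegale_utf8.parquet": (7002348, 8934212),
}


def relative_increase(old, new):
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


for name, (old, new) in peak_rss_kb.items():
    print(f"{name}: {relative_increase(old, new):+.1f}%")
```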
As an experiment I also measured with multi-threading disabled by setting the environment variable OMP_NUM_THREADS=1 and passing use_threads=False to the read_table function call:
| Test file | mimalloc 3.1.5 | mimalloc 2.2.4 | mimalloc 2.0.6 | jemalloc | system (glibc) |
|---|---|---|---|---|---|
| lineitem.parquet | 0.94 user 0:01.21 elapsed 641460 kB | 0.90 user 0:01.20 elapsed 626100 kB | 0.90 user 0:01.21 elapsed 628768 kB | 0.92 user 0:01.20 elapsed 629676 kB | 0.93 user 0:01.21 elapsed 619656 kB |
| 1000col.parquet | 5.41 user 0:05.98 elapsed 1927936 kB | 5.47 user 0:06.12 elapsed 1918084 kB | 5.52 user 0:06.11 elapsed 1918744 kB | 5.47 user 0:06.10 elapsed 1918464 kB | 7.55 user 0:08.57 elapsed 2125832 kB |
| StockUniteLegale_utf8.parquet | 14.92 user 0:18.12 elapsed 6990128 kB | 14.52 user 0:15.94 elapsed 8899392 kB | 14.80 user 0:17.98 elapsed 7101364 kB | 14.42 user 0:17.53 elapsed 6809584 kB | 14.44 user 0:17.54 elapsed 6848652 kB |
With multi-threading (almost) disabled, the peak RSS problem mostly disappears, as I would expect, except on one file, StockUniteLegale_utf8.parquet, where it is surprisingly still present (8899392 kB with 2.2.4 versus 7101364 kB with 2.0.6).
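For reference, the single-threaded configuration described above can be sketched as follows. The function name `single_threaded_command` is hypothetical; the only settings it applies are the two mentioned in the text, `OMP_NUM_THREADS=1` in the environment and `use_threads=False` in the `read_table` call.

```python
import os
import sys


def single_threaded_command(filename):
    """Build the argv and environment for a single-threaded read."""
    code = (
        "import pyarrow.parquet as pq; "
        f"pq.read_table({filename!r}, use_threads=False)"
    )
    # Copy the current environment and override only OMP_NUM_THREADS.
    env = dict(os.environ, OMP_NUM_THREADS="1")
    argv = ["/usr/bin/time", "-f", "%U user %E elapsed %M kB",
            sys.executable, "-c", code]
    return argv, env
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` to reproduce one row of the table above.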
Also, the good news is that mimalloc 3.1.5 looks fine in these measurements. But we would rather wait for it to move to stable status before switching to it.
(note: I've also reported user time, but HW multi-threading makes comparisons unreliable)
@adamreeve FYI.