
Mimalloc v3 memory usage regression

Open · Jarred-Sumner opened this issue Feb 17 '25 · 2 comments

We use mimalloc in Bun, and locally testing mimalloc dev3 in several of our benchmarks yielded a positive result, which is exciting - but a handful of our memory leak stress tests regress in CI. I'm mostly filing this issue so you have another data point for the v3 branch, not asking you to fix a bug or anything. We will stick with v2 for now but will give v3 another go in the future.

Logs from this test:

  | 2025-02-16 02:22:19 PST | Start RSS 2704
  | 2025-02-16 02:22:19 PST | RSS Growth 227
  | 2025-02-16 02:22:19 PST | RSS Growth 198
  | 2025-02-16 02:22:19 PST | RSS Growth 203
  | 2025-02-16 02:22:19 PST | RSS Growth 275
  | 2025-02-16 02:22:19 PST | RSS Growth 91
  | 2025-02-16 02:22:19 PST | RSS Growth 150
  | 2025-02-16 02:22:19 PST | RSS Growth 283
  | 2025-02-16 02:22:19 PST | RSS Growth 151
  | 2025-02-16 02:22:19 PST | RSS Growth 246
  | 2025-02-16 02:22:19 PST | RSS Growth 66
  | 2025-02-16 02:22:19 PST | RSS Growth 262
  | 2025-02-16 02:22:19 PST |  
  | 2025-02-16 02:22:19 PST | 156 |             lastRSS = rss;
  | 2025-02-16 02:22:19 PST | 157 |           }
  | 2025-02-16 02:22:19 PST | 158 |           Bun.gc(true);
  | 2025-02-16 02:22:19 PST | 159 |
  | 2025-02-16 02:22:19 PST | 160 |           const rss = (process.memoryUsage.rss() / 1024 / 1024) | 0;
  | 2025-02-16 02:22:19 PST | 161 |           expect(rss).toBeLessThan(4092);
  | 2025-02-16 02:22:19 PST | ^
  | 2025-02-16 02:22:19 PST | error: expect(received).toBeLessThan(expected)
  | 2025-02-16 02:22:19 PST |  
  | 2025-02-16 02:22:19 PST | Expected: < 4092
  | 2025-02-16 02:22:19 PST | Received: 4905
  | 2025-02-16 02:22:19 PST |  
  | 2025-02-16 02:22:19 PST | at <anonymous> (/var/lib/buildkite-agent/builds/ip-172-31-3-37/bun/bun/test/js/bun/http/bun-serve-static.test.ts:161:23)
  | 2025-02-16 02:22:19 PST | ✗ static > /big > stress (access .body) > arrayBuffer [5291.13ms]
  | 2025-02-16 02:22:19 PST | Start RSS 7207
[OOM killer kicks in]

https://buildkite.com/bun/bun/builds/11698#01950e27-d67a-40a6-b4ec-e381cd978a17

This particular test, bun-serve-static.test.ts, makes many HTTP requests that respond with a large amount of data, repeatedly appending to a buffer internally (the .body getter forces the response to stream internally). HTTP requests are processed in a dedicated thread with its own mimalloc heap. The main thread receives chunks of the HTTP response from the HTTP thread and appends them to an internal buffer (which uses mi_malloc instead of the heap-specific methods).

Note that the calls to Bun.gc() internally call mi_collect(0).

    test.each(["arrayBuffer", "blob", "bytes", "text"])(
      "%s",
      async method => {
        const byteSize = static_responses[path][method]?.size;

        const bytes = method === "blob" ? static_responses[path] : await static_responses[path][method]();

        // macOS limits backlog to 128.
        // When we do the big request, reduce number of connections but increase number of iterations
        const batchSize = Math.ceil((byteSize > 1024 * 1024 ? 48 : 64) / (isWindows ? 8 : 1));
        const iterations = Math.ceil((byteSize > 1024 * 1024 ? 10 : 12) / (isWindows ? 8 : 1));

        async function iterate() {
          let array = new Array(batchSize);
          const route = `${server.url}${path.substring(1)}`;
          for (let i = 0; i < batchSize; i++) {
            array[i] = fetch(route)
              .then(res => {
                expect(res.status).toBe(200);
                expect(res.url).toBe(route);
                if (label === "access .body") {
                  res.body;
                }
                return res[method]();
              })
              .then(output => {
                expect(output).toStrictEqual(bytes);
              });
          }

          await Promise.all(array);

          Bun.gc();
        }

        for (let i = 0; i < iterations; i++) {
          await iterate();
        }

        Bun.gc(true);
        const baseline = (process.memoryUsage.rss() / 1024 / 1024) | 0;
        let lastRSS = baseline;
        console.log("Start RSS", baseline);
        for (let i = 0; i < iterations; i++) {
          await iterate();
          const rss = (process.memoryUsage.rss() / 1024 / 1024) | 0;
          if (lastRSS + 50 < rss) {
            console.log("RSS Growth", rss - lastRSS);
          }
          lastRSS = rss;
        }
        Bun.gc(true);

        const rss = (process.memoryUsage.rss() / 1024 / 1024) | 0;
        expect(rss).toBeLessThan(4092);
        const delta = rss - baseline;
        console.log("Final RSS", rss);
        console.log("Delta RSS", delta);
      },
      40 * 1000,
    );
    }); // closes the enclosing describe() block (truncated in this excerpt)
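
For reference, here is a minimal C sketch of the cross-thread allocation pattern described above. It is hypothetical (not Bun's actual code; the chunk sizes and counts are made up) and uses only mimalloc's public API:

    // Hypothetical sketch of the pattern above -- not Bun's actual code.
    // A dedicated "HTTP" thread allocates chunks from its own mimalloc heap;
    // the main thread appends them into a buffer that uses plain mi_malloc/
    // mi_realloc rather than heap-specific calls, then frees them cross-thread.
    #include <mimalloc.h>
    #include <pthread.h>
    #include <string.h>

    #define NUM_CHUNKS 256
    #define CHUNK_SIZE (64 * 1024)

    static char *chunks[NUM_CHUNKS];

    static void *http_thread(void *arg) {
      (void)arg;
      mi_heap_t *heap = mi_heap_new();            // dedicated per-thread heap
      for (int i = 0; i < NUM_CHUNKS; i++) {
        chunks[i] = (char *)mi_heap_malloc(heap, CHUNK_SIZE);
        memset(chunks[i], 0xAB, CHUNK_SIZE);      // fake response data
      }
      mi_heap_delete(heap);  // outstanding blocks migrate to the default heap
      return NULL;
    }

    int main(void) {
      pthread_t t;
      pthread_create(&t, NULL, http_thread, NULL);
      pthread_join(&t, NULL);

      char *buffer = NULL;
      size_t total = 0;
      for (int i = 0; i < NUM_CHUNKS; i++) {
        buffer = (char *)mi_realloc(buffer, total + CHUNK_SIZE);
        memcpy(buffer + total, chunks[i], CHUNK_SIZE);
        total += CHUNK_SIZE;
        mi_free(chunks[i]);   // cross-thread free of the other thread's block
      }
      mi_free(buffer);

      mi_collect(false);     // per the note above, Bun.gc() calls mi_collect(0)
      return 0;
    }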

Jarred-Sumner · Feb 17 '25

Thanks -- and nice project! This is interesting, as I believe that v3 should always use less memory than v2. Is this the latest dev3 (as there have been a bunch of important bug fixes)?

Does this happen too if you call mi_collect(true) just before getting the RSS? (Also, if you can give me a way to repro locally without too much hassle, that would give me a way to test further -- but no worries if this is not so easy.)
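
A minimal sketch of the suggested check; mi_process_info is used here only as one convenient way to read the RSS (any RSS source would do):

    // Minimal sketch: force a full collection right before sampling RSS.
    #include <mimalloc.h>
    #include <stdio.h>

    static void report_rss(void) {
      mi_collect(true);   // force = true: purge freed memory before measuring

      size_t elapsed, user, sys, current_rss, peak_rss, commit, peak_commit, faults;
      mi_process_info(&elapsed, &user, &sys, &current_rss, &peak_rss,
                      &commit, &peak_commit, &faults);
      printf("rss: %zu MiB (peak %zu MiB)\n",
             current_rss / (1024 * 1024), peak_rss / (1024 * 1024));
    }

    int main(void) {
      void *p = mi_malloc(64u * 1024 * 1024);
      mi_free(p);
      report_rss();
      return 0;
    }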

daanx · Feb 17 '25

@daanx we tried upgrading to mimalloc v3.1.5 and it's so close to working, but in a private repo it segfaults after about 10 minutes when freeing a ~80 MB allocation, after several gigabytes of allocations have occurred and at least 9,000 short-lived mi_heap_t's have been created and destroyed on the same thread.

mi_check_owned(mi_malloc(size)) returns false for the 80 MB allocation that crashes the process on free. For context, the 80 MB allocation is a UTF-16 JSON string that was read from disk and converted from UTF-8.
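
For the shape of what happens, here is a hypothetical sketch, not an actual reproduction (the real crash needs gigabytes of allocation churn first):

    // Hypothetical sketch of the reported pattern -- it does not reproduce
    // the crash, it only shows the sequence of API calls involved.
    #include <mimalloc.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
      // Thousands of short-lived heaps created and destroyed on one thread.
      for (int i = 0; i < 9000; i++) {
        mi_heap_t *heap = mi_heap_new();
        void *p = mi_heap_malloc(heap, 4096);
        memset(p, 0, 4096);
        mi_heap_destroy(heap);   // frees everything the heap owns at once
      }

      // A large allocation comparable to the ~80 MB UTF-16 string.
      size_t size = 80u * 1024 * 1024;
      char *big = (char *)mi_malloc(size);
      memset(big, 'x', size);

      // In the failing case this reportedly returns false...
      printf("owned: %d\n", (int)mi_check_owned(big));

      mi_free(big);              // ...and this free is where it segfaults.
      return 0;
    }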

I can't say for certain it isn't heap corruption, but it does not crash when libc malloc + ASAN is used. When MI_TRACK_ASAN + mimalloc is enabled, ASAN crashes with a use-after-poison.

There's also a memory usage regression when using multiple long-lived heaps on the same thread, where it seems like memory isn't getting cleaned up as quickly within those non-default heaps. This led to some tests in our test suite OOMing. We were able to work around most of those by not using as many heaps.
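
The shape in question looks roughly like this (hypothetical sketch; mi_heap_collect is shown only as one way to prompt per-heap cleanup, not as what Bun does):

    // Hypothetical sketch: several long-lived heaps on one thread with
    // allocation churn in each; the regression is that memory freed in the
    // non-default heaps seems to be returned less promptly in v3.
    #include <mimalloc.h>

    #define NUM_HEAPS 4

    int main(void) {
      mi_heap_t *heaps[NUM_HEAPS];
      for (int i = 0; i < NUM_HEAPS; i++) heaps[i] = mi_heap_new();

      for (int round = 0; round < 100000; round++) {
        for (int i = 0; i < NUM_HEAPS; i++) {
          void *p = mi_heap_malloc(heaps[i], 64 * 1024);
          mi_free(p);
        }
      }

      for (int i = 0; i < NUM_HEAPS; i++) {
        mi_heap_collect(heaps[i], true);   // one way to prompt cleanup
        mi_heap_delete(heaps[i]);
      }
      return 0;
    }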

On Windows it caused a few tests to fail with OOM errors (but in JavaScript), despite peak RSS measuring lower. Is it possible that there are more edge cases with retrying VirtualAlloc?

It feels like there might be a thread-safety issue when memory from different heaps is reused across threads, but I haven't been able to come up with a minimal reproduction.
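
For concreteness, the suspected shape is something like the following (hypothetical; again, not a reproduction):

    // Hypothetical shape of the suspected issue: a block allocated from a
    // per-thread heap is freed on a different thread, where its memory may
    // then be reused by that thread's own allocations.
    #include <mimalloc.h>
    #include <pthread.h>

    static void *other_thread(void *p) {
      mi_free(p);                     // free a block owned by main's heap
      void *q = mi_malloc(1 << 20);   // memory may get reused over here
      mi_free(q);
      return NULL;
    }

    int main(void) {
      mi_heap_t *heap = mi_heap_new();
      void *p = mi_heap_malloc(heap, 1 << 20);
      pthread_t t;
      pthread_create(&t, NULL, other_thread, p);
      pthread_join(&t, NULL);
      mi_heap_delete(heap);
      return 0;
    }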

Jarred-Sumner · Aug 07 '25