snmalloc icon indicating copy to clipboard operation
snmalloc copied to clipboard

Release all pagemap reservations at the very end of the program or DLL

Open NeilMonday opened this issue 7 months ago • 6 comments

Continuation of: https://github.com/microsoft/snmalloc/pull/771. I made a new PR since it was a somewhat different concept from the original.

This seems to be a more correct way to cleanup at the end of the program or DLL.

  1. Reservations are tracked in a static vector
  2. This vector is a custom one that use VirtualAlloc and VirtualFree to avoid recursive calls
  3. When it's time to exit, free the reservations we have created
  4. #pragma init_seg(".CRT$XCB") is used to set this translation unit to be very early in the initialization phase. This means that our cleanup function passed to atexit() gets called at the very end, and our cleanup code can work properly.

NeilMonday avatar May 22 '25 19:05 NeilMonday

@mjp41 Sorry to bug you again. What do you think of this change?

Also, I can't see how to re-run the failed job. I wanted to re-run since the equivalent jobs for 2022 and 2025 both seem to pass.

NeilMonday avatar May 23 '25 11:05 NeilMonday

I’m installing VS 2019 to test that 32-bit test failure.

NeilMonday avatar May 27 '25 11:05 NeilMonday

I’m installing VS 2019 to test that 32-bit test failure.

That failing test is a little flaky. It has an unpredictable memory usage. However, it does cause interesting contention that discovered a bug. So I am not entirely happy disabling it, but I am guessing it is just out of memory. I'll run it again.

mjp41 avatar May 27 '25 13:05 mjp41

I’m installing VS 2019 to test that 32-bit test failure.

That failing test is a little flaky. It has an unpredictable memory usage. However, it does cause interesting contention that discovered a bug. So I am not entirely happy disabling it, but I am guessing it is just out of memory. I'll run it again.

Do you know what criteria it uses to determine pass/fail? I can't see where it actually decides this. I got 100% tests passed on 2/2 runs when running on my local machine. I matched the same commands as closely as possible to what the github job did.

mkdir build
cd build
mkdir vs16-x32
cd vs16-x32
cmake ../.. -G "Visual Studio 16 2019" -A Win32   -DSNMALLOC_CI_BUILD=On -DSNMALLOC_RUST_SUPPORT=On
cmake --build .  -- /m /p:Configuration=Debug
ctest -j 2 --interactive-debug-mode 0 --output-on-failure -C Debug --timeout 400


100% tests passed, 0 tests failed out of 70

Total Test time (real) = 500.36 sec

NeilMonday avatar May 27 '25 19:05 NeilMonday

The main criteria for success of that test is not crashing. Unfortunately it can use an unbounded amount of memory. And the usage is dependent on the relative timing of operations. I spent a very long time attempting to make a reliable test that found the bug this test discovered, but failed.

I propose we shrink the number of iterations on 32bit Windows. Then hopefully CI passes.

I think there was a comment you haven't addressed about the sizing relative to os pages. It would be nice for this to be pages sized as the minimum. You could make this different from CMake if you have a reason for it to be different?

mjp41 avatar May 29 '25 18:05 mjp41

Small change. I think the min sizes should be a bit more pervasive. What do you think the following suggestions?

Yes, all of this makes more sense. I'll just double check locally.

NeilMonday avatar Jun 04 '25 12:06 NeilMonday

@mjp41 Do you know what is causing these failures? They don't seem relevant to my change.

NeilMonday avatar Jun 27 '25 19:06 NeilMonday

@mjp41 Do you know what is causing these failures? They don't seem relevant to my change.

I'm investigating today. It is blocking other PRs too.

mjp41 avatar Jun 30 '25 10:06 mjp41