snmalloc

WIP: BatchIt

Open nwf-msr opened this issue 2 years ago • 10 comments

This is part of the assessment of #634, but it may also be more generally useful. Opinions welcome.

nwf-msr, Sep 21 '23 03:09

Looks like it is leaking in some cases: https://github.com/microsoft/snmalloc/actions/runs/6279469686/job/17055222812?pr=637#step:7:149

mjp41, Sep 24 '23 19:09

Whoops; I had the loop termination conditions wrong. They're fixed now, I think. Let's see if CI agrees.

nwf-msr, Sep 25 '23 20:09

After discussions with @mjp41 yesterday, I've introduced a notion of "tweakable obfuscation" and made the backwards signatures of all intra-slab free lists use the address of the slab metadata as the "tweak". The next step would be to remove the per-thread keys, have everyone use a common global key (probably not RemoteAllocator::key_global!), and apply the same tweaking. That would let sending threads build up segments of slab free lists that the recipient can splice in O(1) rather than O(n).
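A minimal sketch of why a shared global key plus a per-slab tweak permits O(1) splicing, under stated assumptions. The toy signature `(prev + k1) * (curr + (k2 ^ tweak))`, and the `Key`, `FreeObj`, `Segment`, `segment_push`, `splice` and `validate` names, are all hypothetical and are not snmalloc's freelist code. Because every thread signs links for a given slab the same way, a sender can pre-sign a whole segment, and the recipient only signs the one junction link. With per-thread keys, the recipient would presumably have to walk the segment and re-sign each object.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical global key shared by all threads (NOT snmalloc's real key).
struct Key {
  std::uintptr_t k1;
  std::uintptr_t k2;
};

// A free object whose first words hold the list link and a backward signature.
struct FreeObj {
  FreeObj* next = nullptr;
  std::uintptr_t prev_sig = 0;  // binds this node to its predecessor
};

// Toy keyed signature over (prev, curr), tweaked by the slab-metadata address
// so that a segment signed for one slab does not verify against another.
inline std::uintptr_t sign(const void* prev, const void* curr, const Key& key,
                           const void* tweak)
{
  auto p = reinterpret_cast<std::uintptr_t>(prev);
  auto c = reinterpret_cast<std::uintptr_t>(curr);
  auto t = reinterpret_cast<std::uintptr_t>(tweak);
  return (p + key.k1) * (c + (key.k2 ^ t));
}

// A run of one slab's free list; a sending thread can build this on its own.
struct Segment {
  FreeObj* head = nullptr;
  FreeObj* tail = nullptr;
};

void segment_push(Segment& s, FreeObj* o, const Key& key, const void* slab_meta)
{
  assert(o != nullptr);
  o->next = nullptr;
  if (s.tail == nullptr) {
    s.head = s.tail = o;  // the head's signature is written when it is spliced
  } else {
    s.tail->next = o;
    o->prev_sig = sign(s.tail, o, key, slab_meta);
    s.tail = o;
  }
}

// O(1) splice: only the junction link is signed; the segment's internal links
// are already valid because they used the same global key and slab tweak.
void splice(Segment& dst, const Segment& src, const Key& key, const void* slab_meta)
{
  if (src.head == nullptr)
    return;
  if (dst.head == nullptr) {
    dst = src;
    return;
  }
  dst.tail->next = src.head;
  src.head->prev_sig = sign(dst.tail, src.head, key, slab_meta);
  dst.tail = src.tail;
}

// Walk the list checking every backward signature.
// Returns the number of objects, or -1 if any link fails to verify.
long validate(const Segment& s, const Key& key, const void* slab_meta)
{
  long n = 0;
  for (FreeObj* curr = s.head; curr != nullptr; curr = curr->next) {
    ++n;
    FreeObj* nx = curr->next;
    if (nx != nullptr && nx->prev_sig != sign(curr, nx, key, slab_meta))
      return -1;
  }
  return n;
}
```

With odd key halves and aligned object and metadata addresses, this toy reliably rejects both a redirected `next` pointer and a segment signed with a different slab's tweak. A real design would want a stronger mixing function than a single multiply.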

nwf-msr, Nov 16 '23 02:11

I've (at long last) got things flying end to end with a very simple "cache" on the sending side -- a single open ring -- but I think some review and investigation is a good idea. Here's what mimalloc-bench makes of the current state of things in terms of time and memory:

[Figures: mimalloc-bench time and memory results]
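As a rough illustration of what "a single open ring" might mean, here is a minimal sketch. It assumes the sender keeps exactly one partially built batch, a circular list of freed objects from one slab, and closes that batch whenever a free for a different slab arrives. `OpenRingCache`, `Batch`, and the vector standing in for the message queue are hypothetical, not snmalloc types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Obj {
  Obj* next = nullptr;
};

// One outgoing message: a circular list of objects from a single slab.
struct Batch {
  const void* slab;
  Obj* last;  // last->next is the first object
  std::size_t count;
};

// Single-entry sender-side cache: at most one ring is "open" at a time.
class OpenRingCache {
  const void* slab_ = nullptr;
  Obj* last_ = nullptr;
  std::size_t count_ = 0;

public:
  std::vector<Batch> sent;  // stand-in for posting to the owning allocator

  void dealloc(Obj* o, const void* slab)
  {
    assert(o != nullptr);
    if (last_ != nullptr && slab != slab_)
      flush();  // a different slab closes the currently open ring
    if (last_ == nullptr) {
      o->next = o;  // a one-element ring points at itself
      slab_ = slab;
    } else {
      o->next = last_->next;  // new tail links back to the first element
      last_->next = o;
    }
    last_ = o;
    ++count_;
  }

  void flush()
  {
    if (last_ == nullptr)
      return;
    sent.push_back(Batch{slab_, last_, count_});
    slab_ = nullptr;
    last_ = nullptr;
    count_ = 0;
  }
};
```

The obvious weakness of a single entry is that interleaved frees to two slabs produce many tiny batches, which is one reason to measure before adding more ways.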

nwf-msr, Dec 14 '23 04:12

We should:

  1. figure out how to make the caching layer optional, which probably just means some more std::conditional_t use.
  2. get someone with chops to assess the "tweaked obfuscation" changes.
  3. offer to randomize deallocator cache construction order (Matt writes: "as we are building a ring, we can add to the start or the end, so perhaps we could at least build an unpredictable order in the ring")
  4. offer probabilistic premature eviction from the deallocator caches to further thwart attempts to control free-list order
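Items 3 and 4 can be sketched together. Keeping a pointer to the last element of a circular list makes both ends of the ring reachable in O(1), so each insertion can pick an end at random. A second coin can ask the caller to flush the ring early. `Ring`, `add_randomized`, and the `evict_prob` knob are hypothetical, and `std::bernoulli_distribution` stands in for whatever entropy source the allocator would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

struct Obj {
  Obj* next = nullptr;
};

// A ring under construction: last->next is the first element, so both the
// start and the end of the ring are reachable in O(1).
struct Ring {
  Obj* last = nullptr;
  std::size_t count = 0;

  void push_front(Obj* o)
  {
    if (last == nullptr) {
      o->next = o;
      last = o;
    } else {
      o->next = last->next;  // insert between last and first
      last->next = o;
    }
    ++count;
  }

  void push_back(Obj* o)
  {
    push_front(o);  // same link surgery...
    last = o;       // ...but the new object becomes the tail
  }
};

// Randomize construction order, and occasionally request premature eviction.
template<class Rng>
bool add_randomized(Ring& r, Obj* o, Rng& rng, double evict_prob)
{
  std::bernoulli_distribution at_front(0.5);
  std::bernoulli_distribution evict(evict_prob);
  if (at_front(rng))
    r.push_front(o);
  else
    r.push_back(o);
  return evict(rng);  // true: the caller should flush this ring now
}
```

Randomizing the end of insertion scrambles the order only within one ring. Early eviction additionally makes it harder for an attacker to predict which objects end up adjacent in the recipient's free list.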

nwf-msr, May 23 '24 12:05

Two things to address:

  • Randomisation: this might break some of the existing randomisation. Could we use the ways to build multiple queues for the same slab?
  • Can we put this feature behind a feature flag, using constexpr/conditional_t, so that it can be disabled and we can analyse its performance further in the future?

mjp41, Jun 13 '24 13:06

Just rebasing after #659 landed. To-dos remain to be addressed.

nwf-msr, Jun 13 '24 21:06

And the novelty of [[no_unique_address]] continues to sting. Hm.
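The sting is presumably the usual one. An empty member, such as a stateless "batching disabled" cache, still occupies storage unless it is marked `[[no_unique_address]]`, a C++20 attribute with uneven compiler support. MSVC, for example, is documented as ignoring the standard spelling in favour of `[[msvc::no_unique_address]]`. Below is a minimal sketch of the effect with one commonly used portability macro. `NO_UNIQUE_ADDRESS`, `EmptyCache`, and both structs are hypothetical, and the exact sizes are ABI-dependent.

```cpp
#include <cassert>

// Assumed workaround: pick the attribute spelling the compiler honours.
#if defined(_MSC_VER) && !defined(__clang__)
#  define NO_UNIQUE_ADDRESS [[msvc::no_unique_address]]
#else
#  define NO_UNIQUE_ADDRESS [[no_unique_address]]
#endif

// An empty policy type, e.g. a cache with no state when batching is off.
struct EmptyCache {};

struct PlainMember {
  EmptyCache cache;  // still occupies at least one byte, plus padding
  void* queue;
};

struct CompressedMember {
  NO_UNIQUE_ADDRESS EmptyCache cache;  // allowed to overlap other members
  void* queue;
};
```

On the Itanium C++ ABI used by GCC and Clang on Linux, `CompressedMember` is the size of a single pointer, while `PlainMember` is padded to two pointers.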

nwf-msr, Jun 13 '24 21:06

Starting on the to-dos, specifically being able to turn BatchIt off: rewrite history so that there has always been a RemoteDeallocCacheBatching structure encapsulating the client-side logic. We can pair this with the current RemoteMessage structure, and then add parallel non-batching implementations of both the client-side logic and the RemoteMessage internals.
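One way to make the batching layer switchable at compile time is sketched below, assuming the client-side logic lives in a policy type chosen with `std::conditional_t` and dispatched with `if constexpr`. Only the name `RemoteDeallocCacheBatching` comes from the discussion. Its contents, `RemoteDeallocCacheNonBatching`, `LocalAlloc`, and the message counter standing in for RemoteMessage traffic are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical batching policy: remote frees are held back and sent together.
struct RemoteDeallocCacheBatching {
  static constexpr bool enabled = true;
  std::vector<void*> pending;
};

// Hypothetical non-batching policy: no state at all, so it is an empty type.
struct RemoteDeallocCacheNonBatching {
  static constexpr bool enabled = false;
};

// Hypothetical allocator front end that selects the policy at compile time.
template<bool Batching>
class LocalAlloc {
  using Cache = std::conditional_t<Batching, RemoteDeallocCacheBatching,
                                   RemoteDeallocCacheNonBatching>;

public:
  Cache cache;
  std::size_t messages_sent = 0;  // stand-in for posting to a remote queue

  void remote_dealloc(void* p)
  {
    assert(p != nullptr);
    if constexpr (Cache::enabled)
      cache.pending.push_back(p);  // defer until flush()
    else
      ++messages_sent;             // one message per object
  }

  void flush()
  {
    if constexpr (Cache::enabled) {
      if (!cache.pending.empty()) {
        ++messages_sent;  // one message for the whole batch
        cache.pending.clear();
      }
    }
  }
};
```

Because `Cache` depends on the template parameter, the discarded `if constexpr` branch is never instantiated, so the non-batching build does not need a `pending` member at all.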

nwf-msr, Jun 19 '24 15:06

OK, well, apparently MAX_CAPACITY_BITS needs to be at least 17 - 4 = 13 if it is to be universal: 64-bit cross-builds under qemu use a MIN_CHUNK_SIZE of 2^17 bytes and a smallest sizeclass of 16 (2^4) bytes.

But that breaks all the 32-bit builds: MAX_SMALL_SIZECLASS_BITS is 16 and MIN_OBJECT_COUNT is sometimes 13, and 16 + ceil(log_2(13)) + 13 = 16 + 4 + 13 = 33, which does not fit in 32 bits.
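The bit budget above can be written down as compile-time constants. The constants below are local stand-ins carrying the values quoted in the thread, not snmalloc's definitions. The assumption that the three quantities must share one 32-bit word is inferred from the 33-versus-32 problem, not taken from snmalloc's source. `ceil_log2` and `packed_bits` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// Smallest n such that (1 << n) >= x.
constexpr std::size_t ceil_log2(std::size_t x)
{
  std::size_t n = 0;
  while ((std::size_t{1} << n) < x)
    ++n;
  return n;
}

// Values quoted above for 64-bit cross-builds under qemu.
constexpr std::size_t CHUNK_BITS_QEMU = 17;     // 2^17-byte minimum chunk
constexpr std::size_t MIN_SIZECLASS_BITS = 4;   // 16-byte smallest sizeclass
constexpr std::size_t MAX_CAPACITY_BITS = CHUNK_BITS_QEMU - MIN_SIZECLASS_BITS;

// Values quoted above for 32-bit builds.
constexpr std::size_t MAX_SMALL_SIZECLASS_BITS = 16;
constexpr std::size_t MIN_OBJECT_COUNT = 13;

// Assumed packing: all three fields side by side in one word.
constexpr std::size_t packed_bits =
  MAX_SMALL_SIZECLASS_BITS + ceil_log2(MIN_OBJECT_COUNT) + MAX_CAPACITY_BITS;

static_assert(MAX_CAPACITY_BITS == 13, "2^17 / 2^4 objects need 13 bits");
static_assert(ceil_log2(MIN_OBJECT_COUNT) == 4, "ceil(log2(13)) == 4");

constexpr bool fits_in_32_bit_word = packed_bits <= 32;
```

Encoding the constraint this way would turn the 32-bit breakage into a compile-time error at the point where the constants are chosen, rather than a failure elsewhere.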

I'm not sure why some other Windows builds are still broken.

nwf-msr, Jun 19 '24 20:06