snmalloc Use the custom memcpy for realloc.

Open davidchisnall opened this issue 3 years ago • 3 comments

This wraps our memcpy with some assumptions that let the optimiser know that we're copying chunks that are strongly aligned. With clang 13 on x86, this generates three variants:

A special case for 16 bytes that's a single vector load + store.
A vector-copy loop for sizes <512 bytes.
rep movsb for larger sizes.

This is almost certainly faster than the platform memcpy (if for no other reason than that it doesn't have to care about handling unaligned copies).
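As a rough illustration of the idea described above (not the actual snmalloc code), a wrapper can pass alignment assumptions to the optimiser so that fixed-size copies lower to vector loads and stores. The name `aligned_copy`, the 16-byte alignment constant, and the preconditions are assumptions made for this sketch. It covers only the 16-byte special case and a block-copy loop; the `rep movsb` path for large sizes is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration; this is NOT snmalloc's API.
// Assumed preconditions (checked only in debug builds):
//   - dst and src are both 16-byte aligned
//   - len is a multiple of 16
//   - the two ranges do not overlap
inline void* aligned_copy(void* dst, const void* src, std::size_t len)
{
  constexpr std::size_t Align = 16;
  assert(reinterpret_cast<std::uintptr_t>(dst) % Align == 0);
  assert(reinterpret_cast<std::uintptr_t>(src) % Align == 0);
  assert(len % Align == 0);

#if defined(__GNUC__) || defined(__clang__)
  // Tell the optimiser that both pointers are 16-byte aligned.
  void* d = __builtin_assume_aligned(dst, Align);
  const void* s = __builtin_assume_aligned(src, Align);
#else
  void* d = dst;
  const void* s = src;
#endif

  if (len == Align)
  {
    // Fixed-size copy: compilers typically emit one vector load + store.
    std::memcpy(d, s, Align);
    return dst;
  }

  // General case: copy one 16-byte block per iteration.
  auto* dp = static_cast<unsigned char*>(d);
  auto* sp = static_cast<const unsigned char*>(s);
  for (std::size_t i = 0; i < len; i += Align)
  {
    std::memcpy(dp + i, sp + i, Align);
  }
  return dst;
}
```

Because allocator size classes are typically multiples of the minimum alignment, a realloc path could in principle round the copy length up to the size class and call a helper like this, which is why the unaligned-tail handling of a general-purpose memcpy is not needed here.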

I don't know if it will show up in benchmarks, but if it does then it Fixes #154

Mar 16 '22 14:03 davidchisnall

@nwf, the memcpy is currently incorrect for CHERI. It's probably worth tweaking the default one for any platform where the AAL says that it's CHERI and specialising it for Morello...

Mar 16 '22 14:03 davidchisnall

So running some of my usual benchmarks does not show any statistically significant difference. I am still happy to take this. I wonder if it makes sense to have a micro-benchmark to test a collection of reallocs to see if this is a win in an artificial scenario.
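A micro-benchmark along those lines might look roughly like the sketch below. It is not taken from the snmalloc test suite; `bench_realloc`, `ReallocBenchResult`, and the doubling growth pattern are all assumptions chosen for illustration. It grows a collection of allocations by repeated `realloc` calls and times the whole sequence.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical result type for this sketch.
struct ReallocBenchResult
{
  std::uint64_t nanoseconds; // wall-clock time spent in the realloc loop
  std::size_t reallocs;      // number of realloc calls performed
};

// Allocates `objects` blocks of `start_size` bytes, then repeatedly doubles
// every block with realloc until the size would exceed `max_size`.
// Assumes start_size > 0 and that doubling up to max_size does not overflow.
inline ReallocBenchResult
bench_realloc(std::size_t objects, std::size_t start_size, std::size_t max_size)
{
  assert(start_size > 0);
  if (start_size == 0)
    return {0, 0};

  std::vector<void*> ptrs(objects, nullptr);
  for (auto& p : ptrs)
  {
    p = std::malloc(start_size);
    if (p == nullptr)
      std::abort();
  }

  std::size_t count = 0;
  auto begin = std::chrono::steady_clock::now();
  for (std::size_t size = start_size * 2; size <= max_size; size *= 2)
  {
    for (auto& p : ptrs)
    {
      void* q = std::realloc(p, size);
      if (q == nullptr)
        std::abort();
      // Touch the last byte so the grown block is actually used.
      static_cast<volatile unsigned char*>(q)[size - 1] = 1;
      p = q;
      ++count;
    }
  }
  auto end = std::chrono::steady_clock::now();

  for (void* p : ptrs)
    std::free(p);

  auto ns =
    std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count();
  return {static_cast<std::uint64_t>(ns), count};
}
```

Running this once against a build with the custom copy and once without, over several repetitions, would give a crude comparison. Growing many objects in lockstep makes it more likely that realloc has to move data rather than extend in place, although that depends on the allocator.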

Mar 16 '22 19:03 mjp41

If it isn't a bottleneck then it's probably not worth the code size increase. It's small but a lot of small things add up.

Mar 17 '22 09:03 davidchisnall

> If it isn't a bottleneck then it's probably not worth the code size increase. It's small but a lot of small things add up.

I'll close, we can always revisit.

Mar 23 '23 14:03 mjp41
