snmalloc
Use the custom memcpy for realloc.
This wraps our memcpy with some assumptions that let the optimiser know that we're copying chunks that are strongly aligned. With clang 13 on x86, this generates three variants:
- A special case for 16 bytes that's a single vector load + store.
- A vector-copy loop for sizes <512 bytes.
- rep movsb for larger sizes.
This is almost certainly faster than the platform memcpy (if for no other reason than that it doesn't have to care about handling unaligned copies).
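A rough sketch of the shape of those three variants, not the code in this PR. The names `aligned_copy`, `copy_block`, `kBlock` and `kLargeThreshold` are made up for illustration, and it assumes GCC or Clang (for `__builtin_assume_aligned` and the inline asm), 16-byte-aligned pointers, and a length that is a multiple of 16:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constants, chosen to match the description above.
constexpr std::size_t kBlock = 16;           // one vector register's worth
constexpr std::size_t kLargeThreshold = 512; // switch to rep movsb at this size

// Fixed-size copy: with a constant length the compiler can lower this
// memcpy call to a single vector load and store.
inline void copy_block(unsigned char* dst, const unsigned char* src)
{
  std::memcpy(dst, src, kBlock);
}

// Assumption: dst and src are 16-byte aligned, do not overlap,
// and len is a multiple of 16.
inline void* aligned_copy(void* dst, const void* src, std::size_t len)
{
  assert(reinterpret_cast<std::uintptr_t>(dst) % kBlock == 0);
  assert(reinterpret_cast<std::uintptr_t>(src) % kBlock == 0);
  assert(len % kBlock == 0);

  // Tell the optimiser about the alignment so it can use aligned moves.
  auto* d = static_cast<unsigned char*>(__builtin_assume_aligned(dst, 16));
  auto* s =
    static_cast<const unsigned char*>(__builtin_assume_aligned(src, 16));

  // Variant 1: exactly one block.
  if (len == kBlock)
  {
    copy_block(d, s);
    return dst;
  }

  // Variant 2: block-at-a-time loop for small and medium sizes.
  if (len < kLargeThreshold)
  {
    for (std::size_t i = 0; i < len; i += kBlock)
      copy_block(d + i, s + i);
    return dst;
  }

  // Variant 3: large copies.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  // rep movsb copies len bytes from [s] to [d], updating all three registers.
  asm volatile("rep movsb" : "+D"(d), "+S"(s), "+c"(len) : : "memory");
#else
  std::memcpy(d, s, len);
#endif
  return dst;
}
```

The point of the alignment and size assumptions is that none of the three paths needs a head or tail fix-up for unaligned bytes.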
I don't know if it will show up in benchmarks, but if it does then it Fixes #154
@nwf, the memcpy is currently incorrect for CHERI. It's probably worth tweaking the default one for any platform where the AAL says that it's CHERI and specialising it for Morello...
So running some of my usual benchmarks does not show any statistically significant difference. I am still happy to take this. I wonder if it makes sense to have a micro-benchmark to test a collection of reallocs to see if this is a win in an artificial scenario.
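Something along these lines is what I have in mind for the micro-benchmark. It is only a sketch: `grow_by_realloc` and `time_reallocs_ns` are hypothetical helpers, it uses plain `std::realloc` rather than any snmalloc test harness, and the doubling pattern is just one artificial workload:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: grow one allocation from `start` bytes by doubling
// until it reaches `limit`, so that each realloc has to preserve (and
// usually copy) the old contents. Returns false on allocation failure or
// if the old contents were not preserved.
bool grow_by_realloc(std::size_t start, std::size_t limit)
{
  auto* p = static_cast<unsigned char*>(std::malloc(start));
  if (p == nullptr)
    return false;
  std::memset(p, 0xA5, start);

  for (std::size_t size = start; size < limit; size *= 2)
  {
    auto* q = static_cast<unsigned char*>(std::realloc(p, size * 2));
    if (q == nullptr)
    {
      std::free(p);
      return false;
    }
    p = q;
    // Check the first and last byte of the old region survived the move.
    if (p[0] != 0xA5 || p[size - 1] != 0xA5)
    {
      std::free(p);
      return false;
    }
    // Fill the newly added half so the next realloc copies real data.
    std::memset(p + size, 0xA5, size);
  }
  std::free(p);
  return true;
}

// Hypothetical driver: average wall-clock nanoseconds per growth chain,
// or a negative value if any chain failed.
double time_reallocs_ns(std::size_t rounds, std::size_t start, std::size_t limit)
{
  auto t0 = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < rounds; ++i)
  {
    if (!grow_by_realloc(start, limit))
      return -1.0;
  }
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::nano>(t1 - t0).count() /
    static_cast<double>(rounds);
}
```

Running it before and after the change with the same `start`, `limit` and `rounds` would show whether the copy path is visible at all in this artificial scenario.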
If it isn't a bottleneck then it's probably not worth the code size increase. It's small but a lot of small things add up.
I'll close, we can always revisit.