liballocs
Trap2barrier optimisation?
Reading #53 and others, I'm reminded that we lack a good solution for writes to address-taken union members. In GC terms, these are writes that need to be "barriered". There are other cases of such writes: in libcrunch we need to write-barrier certain pointer stores, and of course generational GCs do this all the time.
And meanwhile, there is the libsystrap use case, where we take a SIGILL but we really want a trampoline. Traps are the logically clean way to do this, and trampolines or other binary patching are the optimisation.
So maybe one idea is to provide a generic trap2barrier run-time service, in either liballocs or libsystrap. The idea is that user code registers a trap handler, nominally handling some subset of SIGILL or SIGSEGV cases. If sufficiently many handleable traps occur at a given machine instruction, we dynamically trampolinify it, a.k.a. barrier it. This could potentially handle many kinds of trap neatly. We still need the uncommon-case path, in case we take a trap there that the installed handler can't handle.
There is a converse problem to solve: e.g. if we want to quarantine some memory (thinking temporal safety) so as to generate a trap later, morally we need to do `mprotect()`, but mode-switching is slow. Can we mitigate this by batching? That's tricky, because we need a way to defer the `mprotect()` while knowing it's safe to do so. One way to do this would be with Intel memory protection keys. If we assign each page of heap a random key (say), then even with 16 keys we can usually revoke access to just that page, plus others that happen to share the key. Of course sometimes we'll get unlucky, from having a near-term need to access the same page or another page with the same key. But most of the time this might let us defer the `mprotect()` long enough to get a meaningful batching benefit. For bonus points we'd harmonise keys across pages shared by the same object: e.g. if pages are initially keyed [1,2,3] but we allocate a big object starting in the middle of page 1 and spanning the others, we'd change the keys to be [1,1,1]. This protection change would again need to be batched, using the same mechanism I guess.
The end result would be simulating byte-granularity protection, but in software. It's quite a bootstrappy/self-applicative idea, because we can also use the batching/barriering for cases where our protections land badly and we want to bypass a frequently occurring slow trapping access with a faster soft-barriered one. To do this we'd want to keep a private secondary mapping of any page affected in this way.