Service syscalls from the shim
We believe much of the current overhead of the dev branch is due to switching between the plugin and shadow-worker processes to service syscalls. On each such switch, either one process is scheduled off the physical CPU core so the other can run (which is expensive), or both processes run concurrently on two physical CPU cores, consuming an extra core per plugin thread.
We could instead link libshadow into the shim, and arrange for Shadow's data structures to live in shared memory (by using a custom allocator in Shadow, which glibc supports well), mapped at a fixed address in every plugin process during the LD_PRELOADed shim's initialization. The shim could then service syscalls itself without having to return control to a separate Shadow thread.
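As a rough sketch of the mapping step, the shim's constructor could map the shared region at an agreed fixed address before the plugin's `main()` runs. This assumes the Shadow process creates the region (e.g. with `shm_open`) ahead of time; the region name, address, and size below are made up for illustration:

```c
/* Hypothetical shim initializer: map the Shadow shared-memory region at a
 * fixed virtual address before the plugin's main() runs. SHADOW_SHM_PATH,
 * SHADOW_SHM_ADDR, and SHADOW_SHM_SIZE are illustrative names/values. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define SHADOW_SHM_PATH "/shadow-sim-state"      /* created by the Shadow process */
#define SHADOW_SHM_ADDR ((void*)0x7f0000000000)  /* same address in every plugin */
#define SHADOW_SHM_SIZE (1UL << 30)              /* 1 GiB region, for example */

static void* shadow_state = NULL;

__attribute__((constructor))
static void shim_init(void) {
    int fd = shm_open(SHADOW_SHM_PATH, O_RDWR, 0);
    if (fd < 0) {
        perror("shm_open");
        abort();
    }
    /* MAP_FIXED_NOREPLACE (Linux >= 4.17) fails instead of clobbering an
     * existing mapping, so a collision with the plugin's own mappings is
     * detected loudly rather than silently corrupting memory. */
    shadow_state = mmap(SHADOW_SHM_ADDR, SHADOW_SHM_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_FIXED_NOREPLACE, fd, 0);
    close(fd);
    if (shadow_state != SHADOW_SHM_ADDR) {
        perror("mmap");
        abort();
    }
    /* Pointers inside the region now resolve to the same addresses they have
     * in the Shadow process, so libshadow code linked into the shim can use
     * Shadow's data structures directly. */
}
```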
Additionally, when a syscall does block the plugin thread, the shim could first unblock the next scheduled thread by posting to its semaphore in shared memory, and then block on its own semaphore. That way there'd only be a context switch to the next plugin thread, rather than switching back to Shadow just to schedule the next plugin thread.
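A minimal sketch of that handoff, assuming each plugin thread has a process-shared semaphore in the shared region and the shim can see which thread the scheduler picks next (the struct and function names are hypothetical):

```c
#include <semaphore.h>

/* Hypothetical per-thread state kept in the shared-memory region. */
struct shim_thread {
    sem_t run_sem;   /* init'd with sem_init(&run_sem, 1, 0): process-shared, count 0 */
    /* ... other scheduling state ... */
};

/* Called by the shim when the current thread must block on a syscall.
 * `self` is the calling thread's state; `next` is whichever thread the
 * scheduler (also in shared memory) says should run next. */
static void shim_block_and_handoff(struct shim_thread* self, struct shim_thread* next) {
    /* Wake the next runnable plugin thread directly... */
    sem_post(&next->run_sem);
    /* ...then sleep until some other thread (or Shadow) hands control back. */
    while (sem_wait(&self->run_sem) != 0) {
        /* retry if interrupted by a signal */
    }
}
```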
BTW this is similar to User Mode Linux (UML)'s "traditional" or "tracing thread" strategy. In that strategy they have a single tracing thread for all UML processes, but the tracing thread just transfers control to the UML code in the UML process on signals and syscalls; we could do the same here if we still use ptrace to catch syscalls that aren't already interposed by the shim.
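For reference, the ptrace fallback for non-interposed syscalls could be a plain tracer loop like the bare-bones sketch below. It only observes syscall numbers on x86-64 and leaves servicing/emulation out, and it is not how Shadow's existing threadptrace code is structured:

```c
/* Minimal sketch: trace a child and report every syscall that reaches the
 * kernel (i.e. anything the shim did not interpose). x86-64 only. */
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char** argv) {
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[1], &argv[1]);   /* e.g. the plugin binary */
        perror("execvp");
        _exit(1);
    }
    int status;
    waitpid(child, &status, 0);      /* initial stop at exec */
    ptrace(PTRACE_SETOPTIONS, child, NULL, (void*)PTRACE_O_TRACESYSGOOD);

    int entering = 1;
    while (1) {
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);   /* run to next syscall stop */
        if (waitpid(child, &status, 0) < 0 || WIFEXITED(status))
            break;
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            if (entering) {
                struct user_regs_struct regs;
                ptrace(PTRACE_GETREGS, child, NULL, &regs);
                fprintf(stderr, "syscall %lld reached the tracer\n",
                        (long long)regs.orig_rax);
                /* A real implementation would service or emulate it here. */
            }
            entering = !entering;    /* stops alternate between entry and exit */
        }
    }
    return 0;
}
```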
The UML docs generally cite the more recent "separate kernel address space" (skas) strategies as preferred. These are closer to Shadow's current threadptrace model, where UML/Shadow isn't loaded into the traced process's address space, and the UML/Shadow process services the syscalls. However, I don't think the advantages apply in our case. http://user-mode-linux.sourceforge.net/old/skas.html
- It saves virtual address space in the traced process. When UML was being developed they were working with a 32-bit address space. I think this is a non-issue on x86-64.
- It prevents the traced process from directly accessing the UML (or in our case Shadow) state loaded into the process. We don't care about this as much in our security model, where we trust the traced processes. While it does mean a wild write in a traced process could corrupt Shadow's state instead of just that process's state, the end result is a spoiled simulation in either case.
- UML gets a performance boost by servicing syscalls in the tracing thread instead of having to return control to the traced process. While this extra control transfer would indeed be slow for the "ptrace path", for paths that we've optimized to go through the "preload path" there would be no such control transfers back and forth (which again is the motivating factor for making this change :) )
Since this issue was created, we've learned more about the performance cost of switching between Shadow and the managed processes. Switching cost is much less significant than we originally thought it would be when:
- using 1 shadow worker thread for each managed process
- pinning each shadow worker thread and its managed process to the same core (see the sketch after this list)
- using shared memory with semaphores to signal the switch
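The pinning in the second bullet can be done with `sched_setaffinity`; a rough sketch (the helper name and core-assignment policy are made up for illustration):

```c
/* Hypothetical pinning helper: the Shadow worker pins itself and its managed
 * process to the same core, so a switch between them stays on one CPU. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/* Pin the calling worker thread and the given managed process to `core`. */
static int pin_pair(pid_t managed_pid, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    /* Pin this worker thread (pid 0 == the calling thread). */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity(worker)");
        return -1;
    }
    /* Pin the managed process to the same core. */
    if (sched_setaffinity(managed_pid, sizeof(set), &set) != 0) {
        perror("sched_setaffinity(managed)");
        return -1;
    }
    return 0;
}
```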
Moving more code into the managed process's address space could still improve performance even with the above optimizations, but there are still many unknowns and a lot of potential complexity in moving "most" of Shadow there.