
Instrumentations should be more factored

Open stephenrkell opened this issue 3 years ago • 2 comments

A shortish-term goal is to allow many different instrumentations to be created easily. We already have this to some extent, with the bounds-checking and type-checking parts. It would be good to be able to reproduce the approaches and results of many papers.

I am envisaging the following parts.

  • inlinifier (to get control of basic ops)
  • ptrintarith and any similarly generic C-simplifying transformations
  • shadowcrunch (for shadow memory, including shadow stack / 128bitifier)
  • the error-handling behaviour (we already have 'abort' vs 'carry on' vs 'secondary path')
  • loop analyses and check-coalescing transformations, if factorable
  • C++ equivalents of the above? Tricky, since all of the above are CIL-based
  • libc wrappers where necessary
  • other relevant supporting pieces, of course: toolsub, librunt, instroscope
  • link-time checking? This becomes useful under the multi-ABI regime
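As an illustration of what "factored" could mean for the error-handling part, here is a minimal sketch in which the failure policy is a parameter of the instrumentation rather than baked into each check. All names (`enum on_failure`, `struct instr_config`, `report_failure`, `check_index`) are hypothetical, not libcrunch's actual API, and modelling "secondary path" as a flag that tells the caller to divert is an assumption about how that mode behaves.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy enum mirroring 'abort' / 'carry on' / 'secondary path'. */
enum on_failure { FAIL_ABORT, FAIL_CARRY_ON, FAIL_SECONDARY_PATH };

/* Hypothetical per-build configuration: the policy is chosen once, not per check. */
struct instr_config {
    enum on_failure policy;
    unsigned failures_seen;
};

/* Called by any check that failed. Returns nonzero if the caller should
 * divert to its secondary (fully checked) path; assumption, see lead-in. */
static int report_failure(struct instr_config *cfg, const char *what)
{
    cfg->failures_seen++;
    switch (cfg->policy) {
    case FAIL_ABORT:
        fprintf(stderr, "check failed: %s\n", what);
        abort();
    case FAIL_CARRY_ON:
        fprintf(stderr, "check failed (continuing): %s\n", what);
        return 0;
    case FAIL_SECONDARY_PATH:
        return 1;
    }
    return 0;
}

/* A bounds check written once; its failure behaviour comes from cfg. */
static int check_index(struct instr_config *cfg, size_t idx, size_t len)
{
    if (idx < len)
        return 0;
    return report_failure(cfg, "index out of bounds");
}
```

The design choice is that the check itself never decides what failure means; swapping policies is then a configuration change rather than a new instrumentation.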

This relates to #4, in that we have to revisit our approach to packaging dependencies more broadly.

The pitch for all this is a research testbench that is more accessible (simpler), more stable (less churn) and more comprehensive (source-level) than LLVM.

stephenrkell · Sep 27 '22 14:09

One problem with a CIL inlinifier is that it can't do the site-specific codegen we envisage for stuff like inline caching. E.g. if we declared a static local for cache purposes, it would get scoped to the inline function whereas we want it in the caller (one per call site). This would be easy to do with a macro. I guess doing a CIL pass over the call sites is not too much bother.
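The scoping problem can be seen in a few lines of plain C. In this minimal sketch (all names hypothetical; `struct site_cache` is an invented stand-in, not libcrunch's cache layout), a static local inside an inline function is a single object shared by every call site, whereas each expansion of a macro declares its own static, giving one cache per call site. With two sites alternating between two keys, the shared cache misses every time while the per-site caches hit after their first probe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inline-cache record: last key seen plus hit/miss counts. */
struct site_cache {
    const void *last_key;
    unsigned hits, misses;
};

static void cache_probe(struct site_cache *c, const void *key)
{
    assert(c != NULL);
    if (c->last_key == key) {
        c->hits++;
    } else {
        c->misses++;
        c->last_key = key;
    }
}

/* Inline function: the static belongs to the function, so ALL call sites
 * share this one cache. */
static inline struct site_cache *check_fn(const void *key)
{
    static struct site_cache cache;
    cache_probe(&cache, key);
    return &cache;
}

/* Macro: every expansion declares a fresh static, so each call site gets
 * its own cache. */
#define CHECK_MACRO(key, out) do {          \
        static struct site_cache cache_;    \
        cache_probe(&cache_, (key));        \
        (out) = &cache_;                    \
    } while (0)

/* Four call sites, each run three times; sites alternate between &x and &y. */
static void demo(struct site_cache **fn_a, struct site_cache **fn_b,
                 struct site_cache **mac_a, struct site_cache **mac_b)
{
    static int x, y;
    for (int i = 0; i < 3; i++) {
        *fn_a = check_fn(&x);       /* site 1 */
        *fn_b = check_fn(&y);       /* site 2: same cache as site 1 */
        CHECK_MACRO(&x, *mac_a);    /* site 3: own cache */
        CHECK_MACRO(&y, *mac_b);    /* site 4: own cache */
    }
}
```

A CIL pass over the call sites would in effect be producing what the macro expansion gives for free here.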

stephenrkell · Sep 27 '22 15:09

Another issue is our use of hot/cold path-splitting. It seems hard to make this modular, although it could be done.

One intriguing application of the hot/cold path is for speculative hoisting. Does this even make sense? E.g. given a check inside a loop, ideally we might prove the necessary conditions for hoisting the check out of the loop (into one "big range" check). These are something like:

  • the loop has an identified induction variable, of which the check target is a function
  • the loop always ranges over the entirety of some range of that var, i.e. doesn't exit early, doesn't mutate the induction var in odd places, etc.
  • no (re)allocation happens during the loop, i.e. allocs don't get replaced or resized while the loop is going on
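When those three conditions can be proven, the transformation itself is simple. A minimal sketch, where `in_bounds` is a hypothetical stand-in for a real bounds check that counts how often it runs: the naive version checks every iteration, while the hoisted version performs one "big range" check over `[0, n)` before the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an instrumentation check; counts invocations. */
static unsigned checks_run;

/* Is [start, start + count) inside an object of len elements? */
static int in_bounds(size_t len, size_t start, size_t count)
{
    checks_run++;
    return start <= len && count <= len - start;
}

/* Naive instrumentation: one check per iteration. */
static int sum_checked_each(const int *a, size_t len, size_t n, long *out)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {      /* i is the induction variable */
        if (!in_bounds(len, i, 1))
            return 0;
        s += a[i];                        /* check target a + i is a function of i */
    }
    *out = s;
    return 1;
}

/* Hoisted: valid only if i is a clean induction variable, the loop covers
 * all of [0, n) without exiting early, and a is not reallocated inside the
 * loop. A single range check then covers every iteration. */
static int sum_hoisted(const int *a, size_t len, size_t n, long *out)
{
    if (!in_bounds(len, 0, n))
        return 0;
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    *out = s;
    return 1;
}
```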

But what if we can't prove those things? Can we do a speculative check, knowing that if it passes we are all good (the fast path omits the check inside the loop body), but if it fails we can fall back on a secondary path that does include the check inside the loop?
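A minimal sketch of the speculative version as a hot/cold split, assuming a hypothetical `in_bounds` check and a search loop that may exit early. The whole-range check is tried first; if it passes, no index the loop could touch is out of bounds, so the fast path runs unchecked. If it fails, the secondary path keeps the per-iteration check, and can still succeed when the loop exits before reaching a bad index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bounds check. */
static int in_bounds(size_t len, size_t start, size_t count)
{
    return start <= len && count <= len - start;
}

/* Return the index of the first zero among the first n elements of a
 * (which really has len elements), n if there is none, or (size_t)-1 if
 * a check fails. *took_slow reports which path ran. */
static size_t find_zero(const int *a, size_t len, size_t n, int *took_slow)
{
    if (in_bounds(len, 0, n)) {
        /* Hot path: speculation succeeded; no checks in the loop body. */
        *took_slow = 0;
        for (size_t i = 0; i < n; i++)
            if (a[i] == 0)
                return i;
        return n;
    }
    /* Cold path: speculation failed; check each access as before. */
    *took_slow = 1;
    for (size_t i = 0; i < n; i++) {
        if (!in_bounds(len, i, 1))
            return (size_t)-1;
        if (a[i] == 0)
            return i;
    }
    return n;
}
```

For example, with `a = {7, 0, 9, 9}` and `n = 10`, the speculative check fails, but the secondary path finds the zero at index 1 without ever touching an out-of-bounds element.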

I think this kind of speculation could be good for loops that exit early. It could also be good for cases where the allocation grows inside the loop: if there's a need to grow, the primary check would fail but the secondary path would be fine.
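For the growing-allocation case, a minimal sketch under stated assumptions: `struct buf`, `grow` and `append_all` are hypothetical, and the program's own loop already contains grow-on-demand logic. The hoisted check asks whether all `n` writes fit in the current allocation; if so, the grow branch can never fire and the fast path needs no per-write check. If not, the allocation may be replaced mid-loop, so the secondary path checks each write against the current bounds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable buffer; invariant: used <= cap. */
struct buf {
    int *data;
    size_t used, cap;
};

static int in_bounds(size_t len, size_t start, size_t count)
{
    return start <= len && count <= len - start;
}

/* Program's own growth logic: doubles capacity (starting at 4). */
static int grow(struct buf *b)
{
    size_t ncap = b->cap ? b->cap * 2 : 4;
    int *p = realloc(b->data, ncap * sizeof *p);
    if (!p)
        return 0;
    b->data = p;
    b->cap = ncap;
    return 1;
}

/* Append src[0..n) to b; returns 0 only on allocation failure. */
static int append_all(struct buf *b, const int *src, size_t n, int *took_slow)
{
    if (in_bounds(b->cap, b->used, n)) {
        /* Hot path: everything fits, so the grow branch is never taken and
         * every write is in bounds without a per-write check. */
        *took_slow = 0;
        for (size_t i = 0; i < n; i++) {
            if (b->used == b->cap && !grow(b))
                return 0;
            b->data[b->used++] = src[i];
        }
        return 1;
    }
    /* Secondary path: the allocation may be resized mid-loop, so check
     * each write against the current bounds. */
    *took_slow = 1;
    for (size_t i = 0; i < n; i++) {
        if (b->used == b->cap && !grow(b))
            return 0;
        if (!in_bounds(b->cap, b->used, 1))
            abort();
        b->data[b->used++] = src[i];
    }
    return 1;
}
```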

I can envisage a combined pass that does the hot/cold split and loop hoisting together. It is much harder to see a way to separate or parameterise those. I think it would be fine to do the combined pass, though.

stephenrkell · Oct 17 '22 11:10