rellic icon indicating copy to clipboard operation
rellic copied to clipboard

Improve `while` lifting

Open frabert opened this issue 4 years ago • 7 comments

Currently, a loop such as this

while(cond) {
    body
}
rest

is lifted like this:

while(cond) {
  body
}
if(!cond) {
  rest
}

It would be nice to get rid of the redundant if

frabert avatar Oct 19 '21 12:10 frabert

This will likely need some referencing with the original "No More Goto's" paper and the "Helping Johnny Analyze Malware" paper. It might have an interaction with the solver based refinements we do. I would not approach this via simple rewriting rules.

surovic avatar Oct 19 '21 17:10 surovic

Can you elaborate on why rewriting rules would not be a good approach?

frabert avatar Oct 19 '21 17:10 frabert

I would also like this explanation. For example, maybe what you're saying is that there is a place in rellic where we have the context to ask z3 about cond and !cond, whereas in an ast rewrite rule, it requires a structural pattern match, so it would seem to be more natural to do it where the z3 context is. If that's the case, do we have a notion of z3-available rewrite patterns/rules/frameworks -- something that straddles the two domains?

pgoodman avatar Oct 19 '21 18:10 pgoodman

For example, maybe what you're saying is that there is a place in rellic where we have the context to ask z3 about cond and !cond, whereas in an ast rewrite rule, it requires a structural pattern match, so it would seem to be more natural to do it where the z3 context is.

This is exactly the reason. Pattern matching is fragile in the sense that for each pattern, there can be slews of corner cases that will either require bigger or more patterns. If not kept in check this can become unmaintainable pretty fast. And since the core issue is going to be finding the !cond guarded if to a while(cond), Z3 is the way to go. In fact, a very similar structurization pass already exists in CondBasedRefine.(h|cpp).

However I think this is not entirely the same issue. If we remove the if (!cond) guard entirely, we need to place rest in the right place, otherwise execution order might not be preserved. Consider the following case:

while (cond_a) {
    // body
}

if (cond_b) {
    // body
}

if (!cond_a) {
    // rest
}

In this case one needs to investigate the logical relationship between cond_a and cond_b to be safe. Note that this is an example I came up with on the fly. I don't know if this is a common occurrence or not. But for the sake of robustness and generality we should think through how we approach this problem. Or better yet, if there is a solution available in the mentioned papers, we should use it.

If that's the case, do we have a notion of z3-available rewrite patterns/rules/frameworks -- something that straddles the two domains?

No, we currently do not. However, in "Helping Johnny Analyze Malware", there's section 4 and section 5 that deal with logic-based rewriting and control-flow (mainly loop) transformations respectively. I highly recommend reading both of the mentioned papers.

surovic avatar Oct 20 '21 12:10 surovic

Note: I haven't yet read up on all of the inference rules that Rellic already implements to simplify code, but I wanted to share this with you to hear what you think.

After thinking about it, this seems something like a special case for a more general pass, which should act like "if this if condition is guaranteed to be false, we can just keep the contents of the else branch, if it is guaranteed to be true, keep only the contents of the then branch". I don't know of the ability of Z3 to carry out such analyses, but maybe if we have something to analyse the code according to some "Hoare-style" logic, this could be done relatively easily.

frabert avatar Oct 20 '21 14:10 frabert

After thinking about it, this seems something like a special case for a more general pass, which should act like "if this if condition is guaranteed to be false, we can just keep the contents of the else branch, if it is guaranteed to be true, keep only the contents of the then branch".

We actually already do this! Or at least something very close to this in NestedCondProp, NestedScopeCombine and CondBasedRefine.

I don't know of the ability of Z3 to carry out such analyses, but maybe if we have something to analyse the code according to some "Hoare-style" logic, this could be done relatively easily.

So to use Z3 you build up a logical expression (formula) and you ask "is this expression ever true?", or in other words "is this formula satisfiable?". And most questions about code we need answered can be expressed like a question (or query) like that. Z3 also supports a lot of theories to make these questions succinct, so we don't need to think about how to express C in boolean logic. Theories like bitvectors, strings, uninterpreted functions, etc.

We generally use Z3 in two ways. One is for asking questions about relationships between if and while conditions. These have a true or false answer. The second way we use Z3 is to simplify expressions, again mainly conditions. Or really exclusively conditions. This has been hairy since we have to translate C into Z3 expressions, simplify and then translate Z3 expressions back to C. I think sometimes we make the conditions worse actually...

surovic avatar Oct 20 '21 15:10 surovic

Thanks for the context, I'll need to delve deeper to see exactly how it all works

frabert avatar Oct 20 '21 17:10 frabert