ikos
ikos copied to clipboard
Implement a union of N disjunctive abstract values
We could implement an abstract domain containing a union of at most N disjunctive abstract values. This should be implemented as an option to the analyzer, so that the user can specify N.
This could be used to unroll loops and fix false positives, for instance #39 or #102. This is somewhat similar to #137.
There are still questions on how to implement the abstract operators such as join, meet, widening and narrowing.
I would suggest you think twice about implementing this. This is basically what Frama-C/Value used to do for trace partitioning and loop unrolling, under the name -slevel
. The argument of this option was the number of states to propagate in parallel.
Over the years, we ran into tons of problems caused by the partitioning strategy. It is utterly impossible to predict where the states will be consumed, even in simple looking function. The worst offender is
for (...) { if (c) { ... } }
where c
is not entirely determined by the loop counter. You are going to spend an exponential amount of states to go through the loop, meaning that even very small loops will not get properly unrolled. (And the analysis time will increase dramatically.)
Other problems include, in no particular order:
- the fact that users see
-slevel
as a silver bullet, and specify values that are too high; - the instability of the analysis results: as soon as you change something in your propagation algorithm, the precision of the analysis will be impacted (in both directions, of course);
- the impossibility to "guess" what the analysis will do, because there is no specification on where
-slevel
will get consumed; - you will be adding algorithmic complexity if you try avoid re-propagating states that you already saw once.
My former colleagues at CEA finally managed to implement other partitioning strategies (1) to unroll loops; (2) to propagate disjuncts according to a clear criterion. My understanding is that they are much happier with this approach. Of course, the dataflow iterator is quite a bit more complex, but this is definitely worth it.
Feel free to ask questions here or in private if you're not convinced. I can also put you in contact with my former team, which has first-hand experience with the new engine.
Thanks for your feedback @yakobowski!
I was not aware of all these issues.
Over the years, we ran into tons of problems caused by the partitioning strategy. It is utterly impossible to predict where the states will be consumed, even in simple looking function. The worst offender is
for (...) { if (c) { ... } }
where
c
is not entirely determined by the loop counter. You are going to spend an exponential amount of states to go through the loop, meaning that even very small loops will not get properly unrolled. (And the analysis time will increase dramatically.)
Interesting. Did you see this pattern a lot in real codes?
LLVM has an optimization called loop invariant code motion that can move the if
outside the loop, if it is sound. Maybe we could try that.
* the fact that users see `-slevel` as a silver bullet, and specify values that are too high;
I know a few Frama-C users and they all use -slevel
. It seems to improve the precision a lot. If you are concerned about users using too high values, maybe we could just write this down in the documentation?
* the instability of the analysis results: as soon as you change something in your propagation algorithm, the precision of the analysis will be impacted (in both directions, of course); * the impossibility to "guess" what the analysis will do, because there is no specification on where `-slevel` will get consumed;
Agreed!
* you will be adding algorithmic complexity if you try avoid re-propagating states that you already saw once.
I didn't really think about this. Did you get a big performance win after implementing this optimization?
My former colleagues at CEA finally managed to implement other partitioning strategies (1) to unroll loops; (2) to propagate disjuncts according to a clear criterion. My understanding is that they are much happier with this approach. Of course, the dataflow iterator is quite a bit more complex, but this is definitely worth it.
I'm actually focusing on #137 first.
The main idea is to propagate disjunctions for the different return
statements of a function, to avoid unions between the success states and the failure states (see #136 for instance). I'm still thinking about how to implement this. We could use the return value as the disjunction criteria. This probably works well when returning an integer, but what about other cases?
Is it something you have implemented in Frama-C? I saw the -eva-split-return
parameter. I would like to talk to you about this. We can communicate here, by email or by Skype, as you want.
Thanks.