corrode
corrode copied to clipboard
sequence points
In C, assignment expressions (including pre and post increment/decrement) are supposed to be grouped into sequence points and evaluated together. I haven't looked up the details yet, so I'm probably doing that wrong.
There may be an opportunity to report a useful diagnostic if we find multiple writes, or mixed reads and writes, to the same variable within one sequence point. I could imagine that rejecting programs with those errors might make the translation easier, in which case we should absolutely do that.
It looks like it's a bit more complicated than that: http://publications.gbdirect.co.uk/c_book/chapter8/sequence_points.html
If I'm reading it correctly, the sequence points are used by the compiler to figure out when it's safe to reorder expressions.
You're right, detecting undefined behavior related to sequence points is hard and Corrode shouldn't do it. But we indeed can generate more readable code by relying on the definition of sequence points. Here are some notes I've been taking on this.
Using C undefined behavior to produce readable Rust
Corrode overly constrains evaluation order. For example:
p->key = p->data = 0;
currently translates to:
(*p).key = {
(*p).data = 0i32 as (*mut ::std::os::raw::c_void);
(*p).data
} as (*mut u8);
but would be preferably written as:
(*p).data = 0i32 as (*mut ::std::os::raw::c_void);
(*p).key = (*p).data as (*mut u8);
Another example:
*op++ = *ip++;
currently translates to:
*{
let _old = op;
op = op.offset(1isize);
_old
} = *{
let _old = ip;
ip = ip.offset(1isize);
_old
};
but would be better written as:
*op = *ip;
op = op.offset(1isize);
ip = ip.offset(1isize);
In general, it would not be safe to reorder assignments in this way except in certain cases where you can prove that the order doesn't change the computed result.
However, C does not specify evaluation order except that "sequence points" act as barriers which reads and writes can't move past. And if you write a C expression that could produce different results depending on evaluation order, the C specification says that the behavior is undefined.
So we don't have to prove anything about the C source we're translating to know that it's safe to apply re-orderings like this! If it wasn't safe, the behavior of the C program was undefined anyway so we can pick an ordering that is convenient.
People sometimes ask if Corrode can be used as an undefined-behavior checker. Regarding the sequence point rules specifically, Corrode cannot generally determine whether an expression has undefined behavior without doing a precise inter-procedural alias analysis, which is pretty far outside the scope of Corrode's job. Corrode might turn out to be good at statically detecting some kind of undefined behavior someday, but the sequence point rules aren't it.
Implementation strategy
Translating a C expression should produce not only a Rust expression, but also some sequence of side-effecting statements before and some statements after the expression.
When translating an operator that does not create a sequence point, identify the side effects of each operand and the side effects of this operator. Group the before-effects together (in any order, but left-to-right is probably least surprising) and similarly for the after-effects, and return an expression formed from applying the current operator to the operands' expressions.
At a sequence point where the result of the expression is discarded (such as before a comma-operator, or at the end of a statement), concatenate the before-effects, then a statement for the result of the expression, and then the after-effects.
At a sequence point where the result of the expression is used (such as in the arguments to a function call), first emit the before-effects. If there are after-effects, then emit a let-binding for the expression, followed by the after-effects; use the let-bound variable as the result of the expression. Otherwise, use the translated expression itself as its result, to minimize the number of temporary names introduced.
Most operators don't, themselves, introduce side effects. Only assignment and pre-increment/pre-decrement introduce before-effects. Only post-increment and post-decrement introduce after-effects. Operators that establish sequence points move all their operands' effects into before-effects.
I've pushed a proof of concept implementation as branch wip-sequence-points
. It doesn't quite work as-is; it sometimes discards code that has side effects, thereby changing the behavior of the program. A better implementation might use a Writer monad to enforce that side-effect statements are combined correctly and are always extracted as needed.