kani
kani copied to clipboard
Add Poison to the Kani-GotoC AST
Description of changes:
Right now, because of CBMC, Kani produces a x = nondet()
for a Deinit(x)
.
Progressing on #920, I'm making this small PR to mark these statements using a comment.
Call-outs:
I don't think this is the optimal way of doing it, so I'm happy with any suggestion. Right now, I added a field in one of the Location:: kinds, the one that corresponds to my use case, and then I set it when required, but the way it's done isn't very systematic
Testing:
I'm not sure how I should be testing this inside Kani. I've parsed these with Kanillian and successfully compiled them to the right poison statement instead.
Checklist
- [x] Each commit message has a non-empty body, explaining why the change was made
- [x] Methods or procedures are documented
- [x] Regression or unit tests are included, or existing tests cover the modified code
- [x] My PR is restricted to a single feature or bugfix
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
Why not create a poison expression, that translates to nondet plus comment when you lower to irep?
I'm ok with that, but it means actively taking the decision to get stmt.rs and cbmc out of sync and so I went with a softer solution first But happy to update!
Replacing the Nondet expression with Deinit does raise one question: what happens with the NondetTransformer ?
- I don't understand if and when it's called. If it was, I would expect no Nondet expressions to ever be sent to me by Kani, but it's not the case
- If I add a poison value but want to maintain the invariant that this tranformer currently has, I need to also change every call to poison to a call to some poison function that would have the same behaviour. I will implement that function, but do note that it probably means creating 2 nondet functions for each type that both needs Poison and Nondet
Ok fixed the various comments from @danielsn
That being said, after rereading my PR with a fresh post-weekend mind, I'm a bit unhappy with a Poison expression.
The argument against Poison as an expression
My opinion is that Poison should not be an expression, because it is not a valid value. Values are mathematical entities that result from evaluating an expression. A good definition is given by the UCG WG.
Poison is precisely not a value because it cannot result from an expression evaluation, and it cannot be read from memory. It can only exist in memory. That's why MIR has a deinit statement and not an uninit value that is assigned to a place.
Another good description is given by Robbert Krebbers in the description of the CH2O memory model. Poison bits are part of a value representation in memory, but not part of the value itself.
The argument in favor of Poison as an expression
CBMC has this surprising behaviour of 1) explicitely having paddings as part of the type definitions and 2) explicitely writing nondet in the padding as part of Struct expressions. This conflicts with what I said above for C, and in that sense, CBMC is not C. I'm not sure of the theoretical and practical ramifications of being allowed to produce poison expressions, since the whole point of poison is that it cannot be read in the first place, but it seems that, to work properly in CBMC, you'd need a poison expression. That probably means that you need to add, every read, a check that the produce value isn't poison or something similar
Note that this behaviour of expliciting the value of padding might create discrepancies between C and GotoC. Take the following example
struct S {
u8 x;
[24 bit padding]
u32 y;
}
struct S s = { 0; Poison; 4 }
u8* y = ((u8*) s) + 1;
*y = 2;
y++;
*y = 2;
y++
*y = 3;
Variable s
now contains the equivalent of the u8 array [0, 1, 2, 3, 4]
.
However, if you then write
struct S t = s
then actually, you shouldn't be allowed to read the padding bits and you'll only be writing poison in there. That means, t
should be the equivalent of the "u8" array [0, P, P, P, 4]
The natural semantics for a language that explicits the value of padding would be to simply copy the value of s
in t
and therefore not have the poison in the right place.
Ok fixed the various comments from @danielsn
That being said, after rereading my PR with a fresh post-weekend mind, I'm a bit unhappy with a Poison expression.
The argument against Poison as an expression
My opinion is that Poison should not be an expression, because it is not a valid value. Values are mathematical entities that result from evaluating an expression. A good definition is given by the UCG WG.
Poison is precisely not a value because it cannot result from an expression evaluation, and it cannot be read from memory. It can only exist in memory. That's why MIR has a deinit statement and not an uninit value that is assigned to a place.
Another good description is given by Robbert Krebbers in the description of the CH2O memory model. Poison bits are part of a value representation in memory, but not part of the value itself.
The argument in favor of Poison as an expression
CBMC has this surprising behaviour of 1) explicitely having paddings as part of the type definitions and 2) explicitely writing nondet in the padding as part of Struct expressions. This conflicts with what I said above for C, and in that sense, CBMC is not C. I'm not sure of the theoretical and practical ramifications of being allowed to produce poison expressions, since the whole point of poison is that it cannot be read in the first place, but it seems that, to work properly in CBMC, you'd need a poison expression. That probably means that you need to add, every read, a check that the produce value isn't poison or something similar
I totally agree with your point of view. What if you change poison to be a statement instead of an expression? Poison(place)
would take an address as its only argument.
This matches the DeInit API as well as the initial API suggested here: https://github.com/diffblue/cbmc/issues/7014.
Updated to a deinit statement. This is not ready to be merged for two reasons:
- [x] First I need to sync my Kanillian branch with it to make sure it works as expected
- [x] I need to understand when the NondetTransformer is used and why. Right now, even when the NondetTransformer is used, the final code will contain Nondet expressions because the lowering is done at the very last moment. Is it ok? Is it not?
~I have one broken test in the regression test suite, I'll try to figure out what's wrong, but I'm not sure how to approach that~ That was a local nonsense
I need to understand when the NondetTransformer is used and why. Right now, even when the NondetTransformer is used, the final code will contain Nondet expressions because the lowering is done at the very last moment. Is it ok? Is it not?
NondetTransformer isn't actively used, and was part of an earlier effort to make --gen-c
emit more runnable/parsable C code. If it just rewrites deinit
away, I think it'd be fine (just stick a comment in there). We might remove it later...