kani icon indicating copy to clipboard operation
kani copied to clipboard

Add Poison to the Kani-GotoC AST

Open giltho opened this issue 2 years ago • 6 comments

Description of changes:

Right now, because of CBMC, Kani produces a x = nondet() for a Deinit(x). Progressing on #920, I'm making this small PR to mark these statements using a comment.

Call-outs:

I don't think this is the optimal way of doing it, so I'm happy with any suggestion. Right now, I added a field in one of the Location:: kinds, the one that corresponds to my use case, and then I set it when required, but the way it's done isn't very systematic

Testing:

I'm not sure how I should be testing this inside Kani. I've parsed these with Kanillian and successfully compiled them to the right poison statement instead.

Checklist

  • [x] Each commit message has a non-empty body, explaining why the change was made
  • [x] Methods or procedures are documented
  • [x] Regression or unit tests are included, or existing tests cover the modified code
  • [x] My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

giltho avatar Aug 06 '22 01:08 giltho

Why not create a poison expression, that translates to nondet plus comment when you lower to irep?

danielsn avatar Aug 06 '22 02:08 danielsn

I'm ok with that, but it means actively taking the decision to get stmt.rs and cbmc out of sync and so I went with a softer solution first But happy to update!

giltho avatar Aug 06 '22 14:08 giltho

Replacing the Nondet expression with Deinit does raise one question: what happens with the NondetTransformer ?

  1. I don't understand if and when it's called. If it was, I would expect no Nondet expressions to ever be sent to me by Kani, but it's not the case
  2. If I add a poison value but want to maintain the invariant that this tranformer currently has, I need to also change every call to poison to a call to some poison function that would have the same behaviour. I will implement that function, but do note that it probably means creating 2 nondet functions for each type that both needs Poison and Nondet

giltho avatar Aug 07 '22 02:08 giltho

Ok fixed the various comments from @danielsn

That being said, after rereading my PR with a fresh post-weekend mind, I'm a bit unhappy with a Poison expression.

The argument against Poison as an expression

My opinion is that Poison should not be an expression, because it is not a valid value. Values are mathematical entities that result from evaluating an expression. A good definition is given by the UCG WG.

Poison is precisely not a value because it cannot result from an expression evaluation, and it cannot be read from memory. It can only exist in memory. That's why MIR has a deinit statement and not an uninit value that is assigned to a place.

Another good description is given by Robbert Krebbers in the description of the CH2O memory model. Poison bits are part of a value representation in memory, but not part of the value itself.

The argument in favor of Poison as an expression

CBMC has this surprising behaviour of 1) explicitely having paddings as part of the type definitions and 2) explicitely writing nondet in the padding as part of Struct expressions. This conflicts with what I said above for C, and in that sense, CBMC is not C. I'm not sure of the theoretical and practical ramifications of being allowed to produce poison expressions, since the whole point of poison is that it cannot be read in the first place, but it seems that, to work properly in CBMC, you'd need a poison expression. That probably means that you need to add, every read, a check that the produce value isn't poison or something similar

giltho avatar Aug 08 '22 15:08 giltho

Note that this behaviour of expliciting the value of padding might create discrepancies between C and GotoC. Take the following example

struct S {
   u8 x;
   [24 bit padding]
   u32 y;
}
struct S s = { 0; Poison; 4 }
u8* y = ((u8*) s) + 1;
*y = 2;
y++;
*y = 2;
y++
*y = 3;

Variable s now contains the equivalent of the u8 array [0, 1, 2, 3, 4].

However, if you then write

struct S t = s

then actually, you shouldn't be allowed to read the padding bits and you'll only be writing poison in there. That means, t should be the equivalent of the "u8" array [0, P, P, P, 4]

The natural semantics for a language that explicits the value of padding would be to simply copy the value of s in t and therefore not have the poison in the right place.

giltho avatar Aug 08 '22 16:08 giltho

Ok fixed the various comments from @danielsn

That being said, after rereading my PR with a fresh post-weekend mind, I'm a bit unhappy with a Poison expression.

The argument against Poison as an expression

My opinion is that Poison should not be an expression, because it is not a valid value. Values are mathematical entities that result from evaluating an expression. A good definition is given by the UCG WG.

Poison is precisely not a value because it cannot result from an expression evaluation, and it cannot be read from memory. It can only exist in memory. That's why MIR has a deinit statement and not an uninit value that is assigned to a place.

Another good description is given by Robbert Krebbers in the description of the CH2O memory model. Poison bits are part of a value representation in memory, but not part of the value itself.

The argument in favor of Poison as an expression

CBMC has this surprising behaviour of 1) explicitely having paddings as part of the type definitions and 2) explicitely writing nondet in the padding as part of Struct expressions. This conflicts with what I said above for C, and in that sense, CBMC is not C. I'm not sure of the theoretical and practical ramifications of being allowed to produce poison expressions, since the whole point of poison is that it cannot be read in the first place, but it seems that, to work properly in CBMC, you'd need a poison expression. That probably means that you need to add, every read, a check that the produce value isn't poison or something similar

I totally agree with your point of view. What if you change poison to be a statement instead of an expression? Poison(place) would take an address as its only argument.

This matches the DeInit API as well as the initial API suggested here: https://github.com/diffblue/cbmc/issues/7014.

celinval avatar Aug 09 '22 23:08 celinval

Updated to a deinit statement. This is not ready to be merged for two reasons:

  • [x] First I need to sync my Kanillian branch with it to make sure it works as expected
  • [x] I need to understand when the NondetTransformer is used and why. Right now, even when the NondetTransformer is used, the final code will contain Nondet expressions because the lowering is done at the very last moment. Is it ok? Is it not?

giltho avatar Aug 17 '22 16:08 giltho

~I have one broken test in the regression test suite, I'll try to figure out what's wrong, but I'm not sure how to approach that~ That was a local nonsense

giltho avatar Aug 17 '22 18:08 giltho

I need to understand when the NondetTransformer is used and why. Right now, even when the NondetTransformer is used, the final code will contain Nondet expressions because the lowering is done at the very last moment. Is it ok? Is it not?

NondetTransformer isn't actively used, and was part of an earlier effort to make --gen-c emit more runnable/parsable C code. If it just rewrites deinit away, I think it'd be fine (just stick a comment in there). We might remove it later...

tedinski avatar Aug 18 '22 17:08 tedinski