l4v icon indicating copy to clipboard operation
l4v copied to clipboard

Enhance C Parser to allow named array bounds

Open Xaphiosis opened this issue 9 months ago • 3 comments

Problem Statement

When trying to generalise proofs across multiple platforms or configurations, we do OK on the abstract/design side, but encounter a significant issue with ArrayBounds during CRefine: they show up as numbers, then get mangled via simp, which makes it very difficult to write a generic proof.

Let's use this C code as an example:

enum {
    numDomains = 16
};

int domainMap[numDomains];

void updateDom(int i, int k) {
    domainMap[i] = k;
}

In the above, we clearly have an enum for numDomains. Normally when used in an expression, unlike preprocessor macros, an enum will show up wrapped in its name:

term numDomains (* "numDomains" :: "32 signed word" *)
thm numDomains_def (* numDomains ≡ 0x10 *)

This is useful for abstracting parameters, because we can bind a preprocessor macro to an enum and as long as it fits in an int, we can refer to it in proofs. However, we don't get to see the enum in the bounds checking. Here's updateDom_body_def:

test_global_addresses.updateDom_body ≡
TRY
  Guard ArrayBounds ⦃´i <s 0x10⦄
   (Guard ArrayBounds ⦃0 ≤s ´i⦄ (´domainMap :== Arrays.update ´domainMap (unat ´i) ´k))
CATCH SKIP
END

Note the Guard ArrayBounds using 0x10. There's knowledge lost between the expression used to define the array and the calculated number. There's no way to link the bound to some more abstract value/condition. Simplification rules can dig in and generate "interesting" numbers like 2147483632, or discover that i < 1 can be simplified to i = 0 making the proof specific to the number, instead of being generic in numDomains, meaning we now need to remove Word.word_less_1 and hope for the best, etc.

General idea and complication

Obviously the C parser needs to calculate the actual size of the array bounds in order to declare its type. No matter what we do, the above domainMap is going to end up a 32 signed word[16] (for 32-bit ints).

For verification and links to other abstraction levels, we need to get hold of the expression used to calculate the size of the array in some manner.

I'm pretty sure that the C parser generates the bounds check expression from the array type rather than its name, so we run into the issue that conceptually one 'a word[16] is not the same as another 'a word[16] when it comes to bounds calculations (which is plausible given that 16 could be anything really). This means we need to interfere with the ArrayBounds generation and make it somehow related to the specific variable name.

Idea 1: Heavy annotation

This was Raf's first idea in this direction. Add an AUX annotation of some kind that we put in the C code to specify what expression we want to use directly, something like:

/** AUXBOUND: "Kernel_Config.numDomains" */
int domainMap[numDomains];

PROS:

  • can use expression from higher level abstraction directly
  • high level of control

CONS:

  • have to annotate C code
  • need to add extra AUX annotation to C Parser
  • C Parser needs to generate proof that our expression is the same as the calculated array size
  • no idea what this would look like for a 2D array
  • expression can't use anything that's generated during the C parse, and Kernel_C doesn't see any abstract/design specs either, meaning duplication like with ctcb_size_bits

Idea 2: Automaic named annotation

After discussion with Raf, Gerwin proposed this: have the C parser generate a constant for that array only, and use that constant for bounds checking. E.g. for the above, something like domainMap_array_size ≡ 0x10. We can then separately create a ..._def' lemma that shows the link to abstract concepts, and unfold with that when we want to be generic.

PROS:

  • no need to annotate C code or addition of parsing extra annotations to C parser
  • no need for C parser to generate proof, as the new definition only wraps the constant

CONS:

  • this will change all bounds checks to need unfolding of these array constants; big intrusion into proof
  • need to resolve name shadowing: global arrays are OK, but local vars become a problem when they shadow global name or another local name
  • what to do with 2D arrays? (none of the other proposed solutions know what to do here)

Idea 3: Keep and throw in entire expression

This is like Idea 2 (and might need to be a transitional stage or prototype to idea 2), but instead of generating the name, we keep the original expression, and whenever we see a bounds check, we dump the entire expression in there. The downside is that instead of 0x10 or whatever, you'll get the whole calculation, even if it's ((1 << 16) - 1) && 0xF or something.

PROS:

  • easiest proof-of-concept prototype for keeping expressions of any kind with array variable names
  • no name shadowing problem, since no names generated

CONS:

  • were we to actually use it in proofs, this would cause problems because term-based calculation will resolve stuff that the simplifier will make a mess of at times, yielding even more bizarre conditions
  • still no idea how to handle 2D arrays

Xaphiosis avatar May 22 '24 02:05 Xaphiosis