FStar icon indicating copy to clipboard operation
FStar copied to clipboard

C data types (structs, unions, arrays) for Steel

Open john-ml opened this issue 4 years ago • 1 comments

This PR adds:

  • Steel.C.Connection: PCM morphisms and PCM connections, which are a generic way of relating a PCM for a part of a structure to a PCM for the whole structure.
  • Steel.C.Ref: definitions for fundamental operations (read, write, split, gather) on references that additionally support an operation for "focusing" on a part of a whole via a PCM connection.
  • Steel.C.Reference: a model of non-nullable C pointers for Steel, built atop Steel.C.Ref.
  • Steel.C.Struct, Steel.C.StructLiteral: a model of C structs for Steel, culminating in the definition of combinator addr_of_struct_field for taking a pointer to a field of a struct along with its inverse unaddr_of_struct_field
  • Steel.C.Union, Steel.C.UnionLiteral: a model of C unions for Steel, culminating in the definition of combinator addr_of_union_field for taking a pointer to a field of a union along with its inverse unaddr_of_union_field
  • Steel.C.Array: a model of C arrays for Steel. The interface is minimal but supports common array operations, and @tahina-pro is currently working on the model.

Along the way, it adds the following to ulib:

  • FStar.FSet: a library for finite sets.

It also defines the following helper modules:

  • Steel.C.PCM: an alternative formulation of PCMs where the composability predicate is expressed as a function returning GTot bool, allowing for values of type (pcm a) to remain in universe zero, along with adapter functions between these PCMs and the ones in FStar.PCM.
  • Steel.C.Opt, Steel.C.Frac, Steel.C.Uninit: PCMs for representing exclusive ownership, fractional permissions, and possibly-uninitialized data respectively.
  • Steel.C.Universe: combinators for raising and lowering the universe of PCMs.
  • Steel.C.Typenat, Steel.C.Typestring: an encoding of strings and natural numbers as F* types, allowing such values to be used in types without fear of erasure (for an explanation of why we need this, see Steel.C.Typestring.fsti).
  • Steel.C.Fields: an encoding of struct and union definitions as lists of (field name, typedef) pairs.
  • Steel.C.StdInt: a model of bounded integers, used to describe the size of arrays.
  • Steel.C.Ptr: a model of nullable C pointers for Steel.

Finally, it implements extraction for code using Steel.C.References to idiomatic C. examples/steel/arraystructs contains three examples:

  • PointStruct.fst defines a struct named point containing two u32s and extracts functions that swap a point's x and y coordinates.
  • ScalarUnion.fst demonstrates extraction of code that manipulates a union of two scalar values.
  • HaclExample.fst demonstrates extraction of code that manipulates a struct resembling the kinds of data structures used in the HaclStar project.

john-ml avatar Aug 27 '21 13:08 john-ml

We finally got rid of the base object type: now, a pointer type no longer needs to carry the type of the base allocation unit object from which it derives. We have long stumbled upon this problem, where we didn't know how to hide that type: storing it in the pointer itself would have raised its universe.

But finally, I managed to solve the issue:

  1. I allocate a ghost reference g to store the base type (say a). I allocate that ghost reference following the read-only PCM, so that I can indefinitely duplicate its vprop asserting ownership, without tracking it with any fractional permissions. While g stores values in universe 1, g itself is in universe 0.
  2. So, I initially duplicate that ownership share, and I use one of the two shares to create an invariant (see https://www.fstar-lang.org/papers/steelcore/, Sections 4.3 and 4.4; and https://www.fstar-lang.org/papers/steel/, Section 5.1), from which I get a token i in universe 0, and a pure prop, i >--> pts_to g a, saying that i witnesses the invariant that g points to a.
  3. That pure prop implies exists (t: Type0) . i >--> pts_to g t, from which I can derive a ghost function, get_base_type g i, to compute t thanks to indefinite description. Since this function computes a type, not a value, it is fine to turn it into a pure function.
  4. Indefinite description alone is not enough to prove that get_base_type g i == a. However, I still have the other share on g in hand. So, I can temporarily open the invariant to run a non-observable, proof-only, ghost code block where I have both that share pts_to g a and the share covered by the invariant, pts_to g (get_base_type g i), so by virtue of being governed by the read-only PCM, I can equate the values g points to.
  5. From there, I can use this equality to initialize the base PCM of the reference and the connection, so that a is no longer mentioned in the base type position, replaced with get_base_type g i. The equality get_base_type g i == a can serve as a pure predicate to characterize a freeable pointer.

tahina-pro avatar Sep 03 '22 04:09 tahina-pro

Problem: invariant trick no longer supported

We finally got rid of the base object type: now, a pointer type no longer needs to carry the type of the base allocation unit object from which it derives. We have long stumbled upon this problem, where we didn't know how to hide that type: storing it in the pointer itself would have raised its universe. [...] invariant (see https://www.fstar-lang.org/papers/steelcore/, Sections 4.3 and 4.4; and https://www.fstar-lang.org/papers/steel/, Section 5.1), from which I get a token i in universe 0, and a pure prop, i >--> pts_to g a, saying that i witnesses the invariant that g points to a. 3. That pure prop implies exists (t: Type0) . i >--> pts_to g t, from which I can derive a ghost function, get_base_type g i, to compute t thanks to indefinite description. Since this function computes a type, not a value, it is fine to turn it into a pure function.

Alas, since #2817, this "invariant" trick no longer works, since the invariant token i can no longer be erased and with_invariant can no longer be SteelGhost.

Solution: a new library, Steel.ST.C.Types, with pointer equivalence and relationships as vprop instead of prop

Thus, I had to revisit the hiding of the base type, with help from @nikswamy and @mtzguido: now, a pointer is modeled as a reference to a record value that contains the base type, the base pcm, and the connection-based reference. Thus, we lose prop pointer equality, replaced with a vprop to express pointer equivalence, ref_equiv. Similarly, relationships between pointers (e.g. a pointer is a pointer to a field of a struct value pointed to by another pointer) are now expressed as vprop instead of prop. In fact, every time a pointer shifting operation (taking a struct/union field pointer, an array cell pointer, etc.), internally, a new reference is allocated, pointing to the appropriately ref_focused connection-based reference. Thus, those vprop describing pointer equivalence and relationships are indefinitely duplicable, since the Steel permissions on those references are not tracked. Then, we provide SteelGhost lemmas to rewrite pointer equivalences into pts_to and relationships.

Values and carrier values

Our library provides a sanitized view of PCMs and their carrier values. While we hide the full-blown binary operator of the PCM, we support uninitialized values, fractional permissions (extensible from struct fields to whole structs, etc.) and absence of knowledge (unknown, corresponding to the unit of the PCM.) We provide a generic allocation function for all types, yielding a pointer to uninitialized data in case of success (allocation may fail and return a null pointer.)

Our new pts_to still operates on PCM carrier values, as before. However, instead of asking the user to provide a view type for extraction to C, we manage to extract the view type to its C contents only. More precisely, for C values of a given scalar type t, we represent it as scalar_t t, the PCM carrier types for fractional permissions with uninitialized values, which we extract as t. Thus, we only allow reading through a pointer to a scalar value if it points to an already initialized value. More generally, PCM combinators follow the same structure as their correspoding PCM carrier type combinators, so it is easy to extract those types as C types without the overhead support for permissions and uninitialized values.

Support for recursive data structures

We support void* by also storing the "destination" type and its PCM. This now enables support for recursive data structures such as linked lists. However, void* makes generated C code less readable. Thus, we provide Steel.ST.C.Types.UserStruct, a way to treat a F* record type as a C struct definition. So, contrary to other struct and union types, F* directly extracts the record type to C following the existing Low* extraction; as described above, the types of each field of such a F* record type still need to be PCM carrier types, but those carrier types are extracted to their C contents only. The metadata to treat F* record types as SteelC structs are not extracted. You can compare examples LList (with void*) and LList2 (with F* record types.)

The new library and examples

The library is now in Steel.ST.C.Types, and examples are HaclExample2, HaclExample3, PointStruct2, ScalarUnion2, LList and LList2.

Most of the library has been migrated to Steel.ST, but they can still be used in Steel by loading Steel.ST.Coercions. Now, the library is split into 4 parts:

  • Steel.ST.C.Model.*, the connection-based reference model and PCM instances that John Li worked on during his internship. Migrated to Steel.ST. Model only, do not extract.
  • Steel.C.Model.* : views for selector-based reasoning for those references. Model only, do not extract.
  • Steel.C.*, SteelC types for extraction as of John Li's internship. Supports Steel selectors. However, arrays can only store scalar values. Not everything verifies, due to unknown regressions in between.
  • Steel.ST.C.Types.*: the new user-facing libraries for C types, with the base type hidden appropriately. Now support arbitrary nesting of structs, unions and arrays. Points-to style only, can be used in both Steel.ST and Steel. This library uses Steel.ST.C.Model.* and Steel.C.Model.* but nothing else from Steel.C.*

Side problem: how to mitigate slow verification of the library? Towards splitting it away, what about extraction?

The .fst for the new library takes 30 minutes to verify, so I moved them to examples/steel/arraystructs/fstlib instead of ulib/experimental. In practice, only the .fsti interfaces are needed by users to write SteelC programs and extract them to C.

If this is still not enough (e.g. if we consider that the verification time of Steel.ST.C.Model.* is still too high), then we might envision pulling SteelC outside of the F* repository. But then, how to support extraction to C? We may not want to keep extraction bits in FStar.Extraction.Krml referring to libraries outside of the F* repository. To solve this issue, we propose in this PR a pedestrian solution to make FStar.Extraction.Krml "extensible":

  1. I replaced each function in FStar.Extraction.Krml with an updatable reference, initialized with the default ones. We compile F* as such.
  2. For SteelC, I define a separate F* module, typechecking against the F sources* with friend FStar.Extraction.Krml, extending the extraction functions by registering extensions into those function references.
  3. Then, I compile a new fstar.exe using that additional module, statically linking with fstar.lib (made possible thanks to #2815 )

At this point I am not happy with Step 3 above, which I believe should be replaced with some form of dynamic linking of a plugin (just like with F* tactics) This may require a cleaner interface for FStar.Extraction.Krml to avoid the use of friend.

tahina-pro avatar Mar 21 '23 23:03 tahina-pro

Hi everyone. Steel moved to its own repository, FStarLang/steel, where SteelC is now in the main branch: the SteelC data types are now in https://github.com/FStarLang/steel/tree/main/lib/steel/c , with the proofs in https://github.com/FStarLang/steel/tree/main/src/proofs/steelc . For reference, I saved the latest state of this PR into https://github.com/tahina-pro/FStar/tree/_john_ml_steel_c Thank you again @john-ml for your work! It is now in the main branch of FStarLang/steel as described above.

tahina-pro avatar May 03 '23 17:05 tahina-pro