everparse icon indicating copy to clipboard operation
everparse copied to clipboard

EverParse+Steel: Verified parsing and serialization with separation logic

Open tahina-pro opened this issue 3 years ago • 0 comments

Motivation

Coming up with a domain-specific language for formally verified parsing and serialization at the right level of abstraction with runtime efficiency in mind is much larger a research project than one may think, and we at Project Everest have been working on that for a very long time. Several challenges:

  • minimize memory copies for speed ==> things must be serialized in the right order
  • ease of user proof: how much layout details do we want to expose to the user
  • inplace mutations and their impact on dependently parsed formats (e.g. tagged unions)
  • must be extracted to (ideally auditable) C code

Our USENIX Security 2019 paper, particularly Sections 4.4 and 4.5, presents our first attempt in Low*, as done in miTLS, based on LowParse.Low.* validators and accessors. As described there, EverParse already produces such validators, accessors and serializer primitives for the user to write a Low* program reading and writing valid packets, and we extensively use them in miTLS and EverQuic, but layout details (e.g. offsets to the input/output buffers) are still very much exposed to the user.

This PR: Steel+EverParse

In this PR, I propose to use Steel, a separation logic framework for F*, to model resources for byte arrays containing byte representations valid with respect to a given parser specification: LowParse.Steel.*

If a is a byte array and p is a parser specification, then LowParse.Steel.VParse.vparse a p says that a exactly contains bytes valid with respect to p: p consumes all bytes of a and succeeds. Such a model seems to work with most parsers currently supported by LowParse, which either consume all their input or have the strong prefix property. The selector (ghostly) returns the high-level value resulting from parsing.

Then, validating an input buffer x against a parser p should return either a null pointer (if parsing fails), or a pointer to the remaining unparsed part of the input buffer, truncating x to the part actually consumed by the parser/combinator, thus establishing vparse p x as a post-condition. See LowParse.Steel.Validate.

By using Steel, the main goal is to hide most, if not all, offset reasoning away from the user.

Utilities

Byte arrays

This PR is based on FStarLang/FStar#2319, which models C pointers and provides an abstraction for byte arrays.

Based on Steel.Array, which models mutable C arrays, I am introducing LowParse.Steel.ArrayPtr, adopting the interface I first proposed on FStarLang/FStar#2309, but with an implementation now based on a more faithful model of C pointers:

  • Spatial split and join: if x is an array viewed as a "pointer", then, instead of returning a pair of arrays, split x i will return one "pointer" y to represent the elements from index i and at the same time "cut" x to have it represent only the elements up to index i-1. Similarly, merge x y (if adjacent) will "restore" x by extending it to cover all elements in y. These are made possible by storing the actual array in a ghost reference while retaining its base pointer as the concrete value.

  • Similarly, the temporal fractional permission also becomes a ghost reference, in such a way that share x returns a new "pointer" with half permission, with the permission on x also being cut by half at the same time.

Right-to-left output buffer

Based on Tao et al.'s USENIX 2021 paper, which introduced right-to-left low-level serialization with Low* and LowParse for a subset of the X.509 certificate format for IoT, I introduce a Steel data structure, LowParse.Steel.R2LOutput, to represent right-to-left output buffer as a pair of a byte array and its length, with operations to "allocate" chunks of bytes starting from the right end of that array.

tahina-pro avatar Jun 16 '21 03:06 tahina-pro