everparse
everparse copied to clipboard
EverParse+Steel: Verified parsing and serialization with separation logic
Motivation
Coming up with a domain-specific language for formally verified parsing and serialization at the right level of abstraction with runtime efficiency in mind is much larger a research project than one may think, and we at Project Everest have been working on that for a very long time. Several challenges:
- minimize memory copies for speed ==> things must be serialized in the right order
- ease of user proof: how much layout details do we want to expose to the user
- inplace mutations and their impact on dependently parsed formats (e.g. tagged unions)
- must be extracted to (ideally auditable) C code
Our USENIX Security 2019 paper, particularly Sections 4.4 and 4.5, presents our first attempt in Low*, as done in miTLS, based on LowParse.Low.*
validators and accessors. As described there, EverParse already produces such validators, accessors and serializer primitives for the user to write a Low* program reading and writing valid packets, and we extensively use them in miTLS and EverQuic, but layout details (e.g. offsets to the input/output buffers) are still very much exposed to the user.
This PR: Steel+EverParse
In this PR, I propose to use Steel, a separation logic framework for F*, to model resources for byte arrays containing byte representations valid with respect to a given parser specification: LowParse.Steel.*
If a
is a byte array and p
is a parser specification, then LowParse.Steel.VParse.vparse a p
says that a
exactly contains bytes valid with respect to p
: p
consumes all bytes of a
and succeeds. Such a model seems to work with most parsers currently supported by LowParse, which either consume all their input or have the strong prefix property. The selector (ghostly) returns the high-level value resulting from parsing.
Then, validating an input buffer x
against a parser p
should return either a null pointer (if parsing fails), or a pointer to the remaining unparsed part of the input buffer, truncating x
to the part actually consumed by the parser/combinator, thus establishing vparse p x
as a post-condition. See LowParse.Steel.Validate
.
By using Steel, the main goal is to hide most, if not all, offset reasoning away from the user.
Utilities
Byte arrays
This PR is based on FStarLang/FStar#2319, which models C pointers and provides an abstraction for byte arrays.
Based on Steel.Array
, which models mutable C arrays, I am introducing LowParse.Steel.ArrayPtr
, adopting the interface I first proposed on FStarLang/FStar#2309, but with an implementation now based on a more faithful model of C pointers:
-
Spatial split and join: if
x
is an array viewed as a "pointer", then, instead of returning a pair of arrays,split x i
will return one "pointer" y to represent the elements from indexi
and at the same time "cut"x
to have it represent only the elements up to indexi-1
. Similarly,merge x y
(if adjacent) will "restore"x
by extending it to cover all elements iny
. These are made possible by storing the actual array in a ghost reference while retaining its base pointer as the concrete value. -
Similarly, the temporal fractional permission also becomes a ghost reference, in such a way that
share x
returns a new "pointer" with half permission, with the permission on x also being cut by half at the same time.
Right-to-left output buffer
Based on Tao et al.'s USENIX 2021 paper, which introduced right-to-left low-level serialization with Low* and LowParse for a subset of the X.509 certificate format for IoT, I introduce a Steel data structure, LowParse.Steel.R2LOutput
, to represent right-to-left output buffer as a pair of a byte array and its length, with operations to "allocate" chunks of bytes starting from the right end of that array.