sexpr
sexpr copied to clipboard
An encoding, decoding & utility library for S-expressions in Rust
-
sexpr: A S-expression library for Rust sexpr strives to be the canonical library for reading, manipulating and writing S-expressions in Rust.
The parser is fully featured, performant and can be configured to read almost any s-expression variant including "/Standard/", "/Advanced/" and "/Canonical/" formats.
Predefined compatibility configurations exist for a dozen standards and protocols including:
- Standard
- RFC 2693-compatible SPKI
- SMTLIB & SMTLIBv2
- GPG / libgcrypt
- KiCad
- R5RS-compatible abstract-syntax
Individual format options can be enabled or disabled, allowing you to parse virtually any variant.
* Overview
S-expressions are data structures for representing complex data. They are
either primitives ("atoms") or lists of simpler S-expressions. Here is a
sample S-expression:
~(snicker "abc" (#03# |YWJj|))~
It is a list of length three:
- the octet-string "=snicker="
- the octet-string "=abc="
- a sub-list containing two elements:
- the hexadecimal constant =#03#=
- the base-64 constant =|YWJj|= (which is the same as "=abc=")
** Should I use S-expressions as my serialization format?
Despite rapidly shifting technological landscapes and even faster changing
attitudes about 'proper' programming. S-expressions, and their many variants,
remain ([[http://www-formal.stanford.edu/jmc/recursive/recursive.html][as one of the oldest general encoding formats still in use today]]).
In spite of numerous challengers like JSON and XML, S-expressions retain the
advantages laid out by early computing and internetworking pioneers:
- Generality :: S-expressions are good at representing arbitrary data.
- Readability :: it is easy for someone to examine and understand the structure of an S-expression.
- Economy :: S-expressions represent data compactly.
- Tranportability :: S-expressions are easy to transport over communication media (such as email) with unusual encoding rules.
- Flexibility :: S-expressions make it relatively simple to modify and extend data structures.
- Canonicalization :: They produce a unique "canonical" form of an S-expression, for digital signature purposes.
- Efficiency :: S-expressions should admit in-memory representations that allow efficient processing.
** Configuration sexpr accepts numerous configuration
*** Predefined Configurations | Name | ~square_brackets~ | ~semi_comment~ | ~colon_keywords~ | ~hex_escapes~ | ~pipe_action~ | Notes | |-------------+-------------------+----------------+------------------+---------------+----------------+---------------------------------------------------------------------| | =STANDARD= | ✓ | ✗ | ✓ | ✓ | Base64Interior | A generic 'standard' s-expression | | =SMTLIB= | ✗ | ✓ | ✓ | ✗ | QuoteInterior | A common interchange format for SAT and SMT solvers | | =KICAD= | ✓ | ✓ | ✓ | ✓ | None | A computer-aided design program | | =GUILE= | ✓ | ✓ | ✓ | ✓ | None | A scheme intended for embedding in existing C programs | | =CANONICAL= | ✗ | ✗ | ✗ | ✗ | None | A common, interchangable encoding for many cryptographic protocols. |
*** Configuration Variables
**** =semi_comment= Line comments can be enabled when parsing s-expressions by setting ~semi_comment = Some(&["#", ";"])~.
This ignores the rest of the stream until encountering a newline or EOF,
this does *not* comment out interior s-expressions like proposals like [[http://srfi.schemers.org/srfi-62/srfi-62.html][SRFI
62]].
***** =colon_keywords= Many Scheme implementations assign a special meaning to atoms beginning with =#:= or =:=, sexpr can parse these as 'keywords' or they can be treated as valid starting characters to an ordinary symbol. =(item :keyword value :keyword2 value)=
You can control this behaviour with ~ParseConfig.allow_keywords = Some(&["#", "#:"])~
***** =square_brackets= Some Lisp implementations and s-expressions allow square brackets (=[= and =]=) to be used as an alternative bracket character.
These brackets must still themselves remain matched.
***** =radix_escape= ****** ~#~ Libgcrypt, GPG and numerous Scheme implementations allow you to enclose a hexadecimal string in =#= characters: ~(q #61626364656667#)~
You can enable this with =hex_escapes=
****** ~#b~ and ~#x~ In a similar fashion, =#b= and =#x= are used to specify binary and hexadecimal encodings of the number that follows the trailing letter.
~((two-hundred-fifty-five #xff)~ would be encoded as =List(Symbol(two-hundred-fifty-five), U64(255))=
Similarly, ~(sixteen #b10000))~ would be encoded as =List(Symbol(sixteen), U64(16))=
You can control if both of these are accepted with the ~radix_escape~ option.
****** Both When both of these options are enabled in tandem, sexpr will use the following character to determine the variety of radix specification.
****** Neither If ~radix_escape~ is false, the initial ~#~ character will be treated as an atom.
***** =parse_pipe_behavior= Standard decoding treats the | character as a valid starting literal to any Atom, although two other options are permitted:
******* /Advanced/-style Rivest-style 'advanced' encodings dictate a string between two =|= characters be decoded as a stream of u8 (octets) in Base64.
Use ~ParseConfig.pipe_action = ParsePipeBehavior::Base64Interior~
******* SMTLIBv2 SMT and SAT solvers using this format use the =|= character to quote it's interior, preserving line breaks and other whitespace in a Symbol.
Use ~ParseConfig.pipe_action = ParsePipeBehavior::QuoteInterior~
***** =transport= Today, sexpr supports the most common form of S-expression transport encoding, [[https://tools.ietf.org/html/rfc4648][RFC 4648 Base64]]. To indicate that you'd like to encode or decode an S-expression as Base64, you can modify your configuration as following.
#+BEGIN_SRC rust
let mut config = STANDARD.copy()
mut.transport = TransportEncoding::Base64
#+END_SRC
If you'd like to add a new transport field, simple add to the
TransportEncoding enum, and create a new trait that implements
=SexpTransport=, the rest is handled for you.
*** Encoding In a 2012 Dr. Dobb's retrospective, Karl Eiger noted that S-expressions are have been in continuous use longer than any other formats that remain in widespread use today.
Despite this long history, there is no canonical way to encode a variety of
different abstract data structures.
**** Sequences
#+BEGIN_SRC rust
let vec: Vec
#+BEGIN_SRC rust
let hs: HashSet<i32> = vec!(1, 2, 3).into_iter().collect();
sexpr::encode(&hs)
#+END_SRC
Result:
: (1 2 3)
**** Hash Tables #+BEGIN_SRC rust let ht = HashMap::new(); ht.insert('a', 1); ht.insert('b', 2); ht.insert('c', 3); sexpr::encode(&ht); #+END_SRC Result: : ((a . 1) (b . 2) (c . 3))
**** Tuple ***** Struct #+BEGIN_SRC rust struct TupleStruct(i32, i32, i32); let ts = TupleStruct(1, 2, 3); sexpr::encode(&ts); #+END_SRC Result: : ((_field0 1) (_field1 2) (_field2 3))
**** Struct ***** Ordinary #+BEGIN_SRC rust struct Color { r: u8, g: u8, b: u8, } sexpr::encode(&Color {r: 1, g: 2, b: 3}); sexpr::encode(&Color {r: 1, g: 2, b: 3}, (true)); #+END_SRC Result: : ((variant Color) ((r 1) (g 2) (b 3))) : ((r 1) (g 2) (b 3))
***** Tuple Struct #+BEGIN_SRC rust struct Kangaroo(u32, String); sexpr::encode(&Kangaroo(34, &"William"); #+END_SRC Result: #+RESULTS: : (34 "William")
***** Newtype #+BEGIN_SRC rust struct Inches(u64) sexpr::encode(&Inches(128)); #+END_SRC Result : 128
***** Unit #+BEGIN_SRC rust struct Instance #+END_SRC Result: : nil
**** Enum #+BEGIN_SRC rust enum E { W { a: i32, b: i32 }, X(i32, i32), Y(i32), Z, }
E::W { a: 0, b: 0 };
E::X(0, 0);
E::Y(0);
E::Z;
#+END_SRC
Result:
: ((variant W) ((a 0) (b 0)))
: ((variant X) (0 0))
: ((variant Y) 0)
: ((variant Z))