sexpr icon indicating copy to clipboard operation
sexpr copied to clipboard

An encoding, decoding & utility library for S-expressions in Rust

  • sexpr: A S-expression library for Rust sexpr strives to be the canonical library for reading, manipulating and writing S-expressions in Rust.

    The parser is fully featured, performant and can be configured to read almost any s-expression variant including "/Standard/", "/Advanced/" and "/Canonical/" formats.

    Predefined compatibility configurations exist for a dozen standards and protocols including:

    • Standard
    • RFC 2693-compatible SPKI
    • SMTLIB & SMTLIBv2
    • GPG / libgcrypt
    • KiCad
    • R5RS-compatible abstract-syntax

    Individual format options can be enabled or disabled, allowing you to parse virtually any variant.

* Overview

S-expressions are data structures for representing complex data. They are

either primitives ("atoms") or lists of simpler S-expressions. Here is a

sample S-expression:

~(snicker "abc" (#03# |YWJj|))~

It is a list of length three:

- the octet-string "=snicker="

- the octet-string "=abc="

- a sub-list containing two elements:

- the hexadecimal constant =#03#=

- the base-64 constant =|YWJj|= (which is the same as "=abc=")

** Should I use S-expressions as my serialization format?

Despite rapidly shifting technological landscapes and even faster changing

attitudes about 'proper' programming. S-expressions, and their many variants,

remain ([[http://www-formal.stanford.edu/jmc/recursive/recursive.html][as one of the oldest general encoding formats still in use today]]).

In spite of numerous challengers like JSON and XML, S-expressions retain the

advantages laid out by early computing and internetworking pioneers:

- Generality :: S-expressions are good at representing arbitrary data.

- Readability :: it is easy for someone to examine and understand the structure of an S-expression.

- Economy :: S-expressions represent data compactly.

- Tranportability :: S-expressions are easy to transport over communication media (such as email) with unusual encoding rules.

- Flexibility :: S-expressions make it relatively simple to modify and extend data structures.

- Canonicalization :: They produce a unique "canonical" form of an S-expression, for digital signature purposes.

- Efficiency :: S-expressions should admit in-memory representations that allow efficient processing.

** Configuration sexpr accepts numerous configuration

*** Predefined Configurations | Name | ~square_brackets~ | ~semi_comment~ | ~colon_keywords~ | ~hex_escapes~ | ~pipe_action~ | Notes | |-------------+-------------------+----------------+------------------+---------------+----------------+---------------------------------------------------------------------| | =STANDARD= | ✓ | ✗ | ✓ | ✓ | Base64Interior | A generic 'standard' s-expression | | =SMTLIB= | ✗ | ✓ | ✓ | ✗ | QuoteInterior | A common interchange format for SAT and SMT solvers | | =KICAD= | ✓ | ✓ | ✓ | ✓ | None | A computer-aided design program | | =GUILE= | ✓ | ✓ | ✓ | ✓ | None | A scheme intended for embedding in existing C programs | | =CANONICAL= | ✗ | ✗ | ✗ | ✗ | None | A common, interchangable encoding for many cryptographic protocols. |

*** Configuration Variables

**** =semi_comment= Line comments can be enabled when parsing s-expressions by setting ~semi_comment = Some(&["#", ";"])~.

 This ignores the rest of the stream until encountering a newline or EOF,
 this does *not* comment out interior s-expressions like proposals like [[http://srfi.schemers.org/srfi-62/srfi-62.html][SRFI
 62]].

***** =colon_keywords= Many Scheme implementations assign a special meaning to atoms beginning with =#:= or =:=, sexpr can parse these as 'keywords' or they can be treated as valid starting characters to an ordinary symbol. =(item :keyword value :keyword2 value)=

  You can control this behaviour with ~ParseConfig.allow_keywords = Some(&["#", "#:"])~

***** =square_brackets= Some Lisp implementations and s-expressions allow square brackets (=[= and =]=) to be used as an alternative bracket character.

  These brackets must still themselves remain matched.

***** =radix_escape= ****** ~#~ Libgcrypt, GPG and numerous Scheme implementations allow you to enclose a hexadecimal string in =#= characters: ~(q #61626364656667#)~

   You can enable this with =hex_escapes=

****** ~#b~ and ~#x~ In a similar fashion, =#b= and =#x= are used to specify binary and hexadecimal encodings of the number that follows the trailing letter.

   ~((two-hundred-fifty-five #xff)~ would be encoded as =List(Symbol(two-hundred-fifty-five), U64(255))=
   Similarly, ~(sixteen #b10000))~ would be encoded as =List(Symbol(sixteen), U64(16))=

   You can control if both of these are accepted with the ~radix_escape~ option.

****** Both When both of these options are enabled in tandem, sexpr will use the following character to determine the variety of radix specification.

****** Neither If ~radix_escape~ is false, the initial ~#~ character will be treated as an atom.

***** =parse_pipe_behavior= Standard decoding treats the | character as a valid starting literal to any Atom, although two other options are permitted:

******* /Advanced/-style Rivest-style 'advanced' encodings dictate a string between two =|= characters be decoded as a stream of u8 (octets) in Base64.

    Use ~ParseConfig.pipe_action = ParsePipeBehavior::Base64Interior~

******* SMTLIBv2 SMT and SAT solvers using this format use the =|= character to quote it's interior, preserving line breaks and other whitespace in a Symbol.

    Use ~ParseConfig.pipe_action = ParsePipeBehavior::QuoteInterior~

***** =transport= Today, sexpr supports the most common form of S-expression transport encoding, [[https://tools.ietf.org/html/rfc4648][RFC 4648 Base64]]. To indicate that you'd like to encode or decode an S-expression as Base64, you can modify your configuration as following.

  #+BEGIN_SRC rust
  let mut config = STANDARD.copy()
  mut.transport = TransportEncoding::Base64
  #+END_SRC

  If you'd like to add a new transport field, simple add to the
  TransportEncoding enum, and create a new trait that implements
  =SexpTransport=, the rest is handled for you.

*** Encoding In a 2012 Dr. Dobb's retrospective, Karl Eiger noted that S-expressions are have been in continuous use longer than any other formats that remain in widespread use today.

Despite this long history, there is no canonical way to encode a variety of
different abstract data structures.

**** Sequences #+BEGIN_SRC rust let vec: Vec = vec![1,2,3]; sexpr::encode(&vec) #+END_SRC Result: : (1 2 3)

 #+BEGIN_SRC rust
 let hs: HashSet<i32> = vec!(1, 2, 3).into_iter().collect();
 sexpr::encode(&hs)
 #+END_SRC
 Result:
 : (1 2 3)

**** Hash Tables #+BEGIN_SRC rust let ht = HashMap::new(); ht.insert('a', 1); ht.insert('b', 2); ht.insert('c', 3); sexpr::encode(&ht); #+END_SRC Result: : ((a . 1) (b . 2) (c . 3))

**** Tuple ***** Struct #+BEGIN_SRC rust struct TupleStruct(i32, i32, i32); let ts = TupleStruct(1, 2, 3); sexpr::encode(&ts); #+END_SRC Result: : ((_field0 1) (_field1 2) (_field2 3))

**** Struct ***** Ordinary #+BEGIN_SRC rust struct Color { r: u8, g: u8, b: u8, } sexpr::encode(&Color {r: 1, g: 2, b: 3}); sexpr::encode(&Color {r: 1, g: 2, b: 3}, (true)); #+END_SRC Result: : ((variant Color) ((r 1) (g 2) (b 3))) : ((r 1) (g 2) (b 3))

***** Tuple Struct #+BEGIN_SRC rust struct Kangaroo(u32, String); sexpr::encode(&Kangaroo(34, &"William"); #+END_SRC Result: #+RESULTS: : (34 "William")

***** Newtype #+BEGIN_SRC rust struct Inches(u64) sexpr::encode(&Inches(128)); #+END_SRC Result : 128

***** Unit #+BEGIN_SRC rust struct Instance #+END_SRC Result: : nil

**** Enum #+BEGIN_SRC rust enum E { W { a: i32, b: i32 }, X(i32, i32), Y(i32), Z, }

  E::W { a: 0, b: 0 };
  E::X(0, 0);
  E::Y(0);
  E::Z;
  #+END_SRC
  Result:
  : ((variant W) ((a 0) (b 0)))
  : ((variant X) (0 0))
  : ((variant Y) 0)
  : ((variant Z))