
Move away from serde for Canonical JSON

Open kim opened this issue 3 years ago • 2 comments

serde requires canonicalisation to happen when domain types are already erased. This creates several footguns:

  • String types may not compare equal after a roundtrip
  • Set types have no counterpart in the serde data model: they are serialised as sequences, exactly like arrays (i.e. insertion-ordered lists). What we want is to always have ordered-set semantics.
  • Floating point numbers (which are illegal in Canonical JSON) cause a runtime error

To remedy this, we should precisely constrain what types can be used to compose a structure subject to canonicalisation.

The proposal is to always go through an intermediate representation (akin to enum Value), which is already guaranteed to hold the canonical form by construction. Type-directed conversion for std types may be provided. Literals may be supported via proc-macros.

Reasoning:

  • It is no less memory-efficient than what we have now, as we need to buffer map-shaped objects before canonicalisation
  • It is less surprising (e.g. plain String or &str would just not be representable)
  • The actual encoding / decoding could be made much more efficient in terms of code size, compilation cost, dependencies, and possibly even runtime performance.
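A minimal sketch of what such an intermediate representation with type-directed conversions could look like, using only std. All names here are assumptions made for illustration: `Cstring` is a stand-in that merely rejects non-ASCII input (a real implementation would presumably normalise instead), and `Value::Set` and the chosen `From` impls are not taken from the codebase.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a canonical string type.
/// Assumption: the real type would normalise its input; this sketch
/// only accepts ASCII so that no external crate is required.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Cstring(String);

impl Cstring {
    pub fn new(s: &str) -> Option<Self> {
        if s.is_ascii() {
            Some(Cstring(s.to_owned()))
        } else {
            None
        }
    }
}

/// Intermediate representation: every variant can only hold data that
/// is already in canonical form. There is no float variant at all.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Object(BTreeMap<Cstring, Value>),
    Set(BTreeSet<Value>),
    String(Cstring),
    Number(u64),
    Bool(bool),
    Null,
}

// Type-directed conversions for a few std types. Note the absence of
// `From<String>`, `From<&str>` and `From<f64>`: such values are simply
// not representable without an explicit, checked conversion.
impl From<bool> for Value {
    fn from(b: bool) -> Self {
        Value::Bool(b)
    }
}

impl From<u64> for Value {
    fn from(n: u64) -> Self {
        Value::Number(n)
    }
}

impl From<Cstring> for Value {
    fn from(s: Cstring) -> Self {
        Value::String(s)
    }
}

impl<T: Into<Value>> From<Option<T>> for Value {
    fn from(o: Option<T>) -> Self {
        o.map_or(Value::Null, Into::into)
    }
}

impl<T: Into<Value>> From<BTreeSet<T>> for Value {
    fn from(set: BTreeSet<T>) -> Self {
        // Collecting into a `BTreeSet<Value>` re-sorts by `Value`'s
        // ordering, so the result has ordered-set semantics.
        Value::Set(set.into_iter().map(Into::into).collect())
    }
}
```

With this shape, `Value::from(1.5_f64)` or `Value::from("plain str")` fails to compile rather than failing at runtime, and two sets built in different insertion orders convert to equal `Value`s.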

kim Dec 02 '20 14:12

Would this essentially look like:

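use crate::Cstring;
use std::collections::{BTreeMap, BTreeSet};

// `Ord` is required because `Value` is stored in a `BTreeSet`;
// the derives assume `Cstring` implements the same traits.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Object(BTreeMap<Cstring, Value>),
    Array(BTreeSet<Value>),
    String(Cstring),
    Number(Number),
    Bool(bool),
    Null,
}

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Number {
    U64(u64),
    I64(i64),
}

impl Canonical for Value {
    /* left as an exercise to the implementor ^_^ */
}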

Or is it more nuanced than that?
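For illustration, the "exercise" could be sketched roughly as follows. This is not the project's implementation: the `Canonical` trait signature is made up, plain `String` stands in for `Cstring` to keep the example self-contained, and the escaping rule (only `"` and `\` are escaped) follows my reading of the Canonical JSON rules.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Number {
    U64(u64),
    I64(i64),
}

// `String` stands in for `Cstring` here (assumption for the sketch).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Value {
    Object(BTreeMap<String, Value>),
    Array(BTreeSet<Value>),
    String(String),
    Number(Number),
    Bool(bool),
    Null,
}

/// Hypothetical trait: append the canonical encoding to `out`.
pub trait Canonical {
    fn canonical(&self, out: &mut String);
}

// Quote a string, escaping only `"` and `\`.
fn write_str(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out.push('"');
}

impl Canonical for Value {
    fn canonical(&self, out: &mut String) {
        match self {
            Value::Null => out.push_str("null"),
            Value::Bool(b) => out.push_str(if *b { "true" } else { "false" }),
            Value::Number(Number::U64(n)) => out.push_str(&n.to_string()),
            Value::Number(Number::I64(n)) => out.push_str(&n.to_string()),
            Value::String(s) => write_str(s, out),
            Value::Array(items) => {
                out.push('[');
                // `BTreeSet` iterates in sorted order.
                for (i, v) in items.iter().enumerate() {
                    if i > 0 {
                        out.push(',');
                    }
                    v.canonical(out);
                }
                out.push(']');
            }
            Value::Object(map) => {
                out.push('{');
                // `BTreeMap` iterates in key order (byte-wise for `String`).
                for (i, (k, v)) in map.iter().enumerate() {
                    if i > 0 {
                        out.push(',');
                    }
                    write_str(k, out);
                    out.push(':');
                    v.canonical(out);
                }
                out.push('}');
            }
        }
    }
}
```

One nuance this sketch exposes: `Number::U64(5)` and `Number::I64(5)` compare unequal but encode identically, so a set could hold both and emit `5` twice. A single normalised number representation would avoid that.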

FintanH Sep 23 '21 13:09

We’d probably want to provide derive macros for user-defined extension payloads.
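As a rough idea of what such a derive might generate, here is a hand-written equivalent. Everything in it is hypothetical: the `ToCanonicalValue` trait, the `Payload` struct, and the reduced `Value` enum exist only for this sketch, and `String` again stands in for `Cstring`.

```rust
use std::collections::BTreeMap;

// Reduced `Value`, just enough for this example.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Value {
    Object(BTreeMap<String, Value>),
    String(String),
    Number(u64),
}

/// Hypothetical trait that a derive macro could implement.
pub trait ToCanonicalValue {
    fn to_value(&self) -> Value;
}

/// Made-up user-defined extension payload.
pub struct Payload {
    pub name: String,
    pub version: u64,
}

// Roughly what a `#[derive(ToCanonicalValue)]` might expand to:
// one object entry per field, keyed by the field name.
impl ToCanonicalValue for Payload {
    fn to_value(&self) -> Value {
        let mut fields = BTreeMap::new();
        fields.insert("name".to_owned(), Value::String(self.name.clone()));
        fields.insert("version".to_owned(), Value::Number(self.version));
        Value::Object(fields)
    }
}
```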

kim Sep 23 '21 14:09