
Serde and Hugr serialisation compatibility

Open lmondada opened this issue 7 months ago • 3 comments

Now that we are moving to Hugr envelopes, support for serde is provided through serde_with's serde_as macro, as in

#[serde_as]
#[derive(Deserialize, Serialize)]
struct A {
    #[serde_as(as = "CustomAsEnvelope")]
    package: Package,
}

Unfortunately, this has some drawbacks, which made me wonder what the best approach might be. Below are the three options I see.

Note: My assumption throughout this discussion is that we want to maintain compatibility (or at least a bridge) between Hugr serialisation and serde. This is by far the simplest way to provide serialisation/deserialisation for most standard types, which is useful for testing, for serialising and compiling the "rewriters" in tket2/badger etc. For types that contain Hugrs (possibly among other data), such as SimpleReplacement and PersistentHugr, using serde requires our hugr serialisation and serde's serialisation capabilities to be compatible in some form.

Option 1: Use serde_with's SerializeAs and DeserializeAs traits (current option)

The main advantage is that it is clean and straightforward in user code (see the example above). However, I see the following issues:

  • serialisation/deserialisation can only be provided on types with Hugrs defined over a frozen extensions set (frozen at struct definition time!). That might cover tket2's use cases, but is problematic for e.g. SimpleReplacement that should work over any extension set.
  • the CustomAsEnvelope types passed to the proc-macro must be defined in user code as they are specific to the set of extensions supported. From hugr's perspective, they are foreign types that must implement the foreign traits of serde_with. Thus we cannot provide blanket implementations for these (orphan rule), and have to fall back to providing a macro that the user invokes (currently called impl_serde_as_string_envelope). The serialisation logic for every type in the hugr crate must be added in that macro, and must be written manually (no serde). Furthermore, this means we cannot implement serialisation on non-public types and serialisation may only use the public API. In my case, I would have to make a lot of implementation details public for this to work.

Option 2: Implement serde::Serialize and serde::DeserializeSeed

The problem we are trying to solve is to provide a "context" (the extensions set) for deserialisation. This is something serde itself supports in its DeserializeSeed trait. DeserializeSeed is like Deserialize, except that it is implemented on a "seed" type rather than the value to be deserialised. The seed is passed to and consumed by the fn deserialize method.

pub trait DeserializeSeed<'de>: Sized {
    type Value;

    // Required method
    fn deserialize<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
       where D: Deserializer<'de>;
}

There is no equivalent of serde_json::from_str for seeded deserialisation, but you can always use serde_json::Deserializer or any other deserializer.

This seems to be the crab-approved way of doing things and it seems to work well. Minor concerns:

  • We still cannot use serde's derive for deserialisation, but at least it works for serialisation
  • the DeserializeSeed trait must be implemented on a different struct for each type we want to offer deserialisation for. For now I've solved this with the following type, which is not the most ergonomic:
pub struct ExtensionsSeed<'a, V> {
    pub extensions: &'a ExtensionRegistry,
    _marker: std::marker::PhantomData<V>,
}
  • Hugr would implement serde::Serialize. Is this an issue? We could always have a wrapper type
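To make the seed idea concrete without depending on serde, here is a std-only sketch of the same pattern. Everything in it apart from the shape of ExtensionsSeed is an assumption for illustration: ParseWithSeed is a hypothetical stand-in for serde's DeserializeSeed that reads from a &str, ExtensionRegistry is reduced to a set of extension names, and OpName is an invented value type. It shows how a single generic seed struct can serve several value types by implementing the trait once per `V`.

```rust
use std::collections::HashSet;
use std::marker::PhantomData;

/// Stand-in for hugr's `ExtensionRegistry` (assumption: just a set of names).
pub struct ExtensionRegistry {
    names: HashSet<String>,
}

impl ExtensionRegistry {
    pub fn new(names: &[&str]) -> Self {
        Self { names: names.iter().map(|s| s.to_string()).collect() }
    }
}

/// One generic seed; the type parameter `V` selects the value it produces.
pub struct ExtensionsSeed<'a, V> {
    pub extensions: &'a ExtensionRegistry,
    _marker: PhantomData<V>,
}

impl<'a, V> ExtensionsSeed<'a, V> {
    pub fn new(extensions: &'a ExtensionRegistry) -> Self {
        Self { extensions, _marker: PhantomData }
    }
}

/// Hypothetical std-only analogue of `serde::de::DeserializeSeed`:
/// implemented on the seed, which is consumed when parsing.
pub trait ParseWithSeed: Sized {
    type Value;
    fn parse(self, input: &str) -> Result<Self::Value, String>;
}

/// Invented value type: an operation written as `extension.op`.
#[derive(Debug, PartialEq)]
pub struct OpName(pub String);

impl<'a> ParseWithSeed for ExtensionsSeed<'a, OpName> {
    type Value = OpName;

    fn parse(self, input: &str) -> Result<OpName, String> {
        let (ext, _op) = input
            .split_once('.')
            .ok_or_else(|| format!("expected `extension.op`, got `{}`", input))?;
        // The context carried by the seed decides whether the input is valid.
        if self.extensions.names.contains(ext) {
            Ok(OpName(input.to_string()))
        } else {
            Err(format!("unknown extension `{}`", ext))
        }
    }
}

impl<'a> ParseWithSeed for ExtensionsSeed<'a, Vec<OpName>> {
    type Value = Vec<OpName>;

    fn parse(self, input: &str) -> Result<Vec<OpName>, String> {
        // Composite values build a fresh seed for each element.
        input
            .split(',')
            .map(|s| ExtensionsSeed::<OpName>::new(self.extensions).parse(s.trim()))
            .collect()
    }
}
```

With serde, the trait impls would instead be for DeserializeSeed and would drive a Deserializer, but the ergonomic cost is the same: one hand-written impl per value type, and callers have to name the seed type explicitly, e.g. `ExtensionsSeed::<OpName>::new(&registry)`.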

Option 3: Create our own SerializeWithExtensions trait

This would be quite close to serde's DeserializeSeed trait, but we could customise it a bit to our scenario. The API would be slightly clearer to someone who is not familiar with serde's DeserializeSeed trait (and slightly less clear to someone who does know it). The main problems I see here:

  • fully manual serialise and deserialise implementations
  • we would want to support serde's format Deserializers (serde_json etc). For this we would need to implement the DeserializeSeed trait anyways. This is because the Deserializer uses a visitor pattern that expects these serde traits.

The second point makes this option a superset of Option 2.

Summary

All three options will be quite verbose, as serde's derive macros can't be used (at least for Deserialize). The options aren't mutually exclusive either. Option 1 is the most limiting, and I'd argue it does not cover all the use cases we (I) need. In the immediate term, option 2 seems to be the simplest solution to offering serialisation for SimpleReplacement and PersistentHugr.

In the longer term, the verbosity of either of these approaches could be resolved with custom proc-macros, e.g. forked from serde_derive. There's nothing stopping this in principle.

lmondada avatar May 26 '25 06:05 lmondada

Update: there is a LOT of boilerplate code that needs to be written for serialisation... I don't see a way around that other than with custom proc macros.

See https://github.com/CQCL/hugr/pull/2258

lmondada avatar May 27 '25 06:05 lmondada

We have landed on the following solution (thanks @zrho for the suggestion)

  • For every composed Hugr type in the hugr crate (e.g. SimpleReplacement, PersistentHugr), create an equivalent serial type, parametrised on the wrapped Hugr type, e.g. SerialSimpleReplacement<WrappedHugr>. This implements Serialize and Deserialize, given a serialisable wrapped type.
  • Assuming From/Into conversions between Hugr and the wrapped type, hugr can also define to_serialized and from_serialized for these types.
  • The user (e.g. tket2) defines the wrapped Hugr type (e.g. HugrWithTK2Extensions) and then, for each composed struct, uses a #[serde(with = "module")] attribute in which the Hugr type is converted to the equivalent serial type with the wrapped type, and then serialised.
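The conversion part of this scheme can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not the code from the hugr crate: Hugr and SimpleReplacement are reduced to stand-in structs with invented fields, the signatures of to_serialized and from_serialized are guesses, and the serde layer is left out (with serde, SerialSimpleReplacement<H> would derive Serialize and Deserialize whenever H implements them).

```rust
/// Stand-in for `hugr::Hugr` (assumption: the real type is far richer).
#[derive(Debug, Clone, PartialEq)]
pub struct Hugr {
    pub nodes: Vec<String>,
}

/// Stand-in for a composed type such as `SimpleReplacement`
/// (field names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleReplacement {
    pub replacement: Hugr,
    pub subgraph_nodes: Vec<usize>,
}

/// Serial mirror of `SimpleReplacement`, generic over the wrapped Hugr type.
#[derive(Debug, Clone, PartialEq)]
pub struct SerialSimpleReplacement<H> {
    pub replacement: H,
    pub subgraph_nodes: Vec<usize>,
}

impl SimpleReplacement {
    /// Convert to the serial form, wrapping the inner Hugr via `From`.
    pub fn to_serialized<H: From<Hugr>>(&self) -> SerialSimpleReplacement<H> {
        SerialSimpleReplacement {
            replacement: self.replacement.clone().into(),
            subgraph_nodes: self.subgraph_nodes.clone(),
        }
    }

    /// Convert back, unwrapping the inner Hugr via `Into`.
    pub fn from_serialized<H: Into<Hugr>>(serial: SerialSimpleReplacement<H>) -> Self {
        SimpleReplacement {
            replacement: serial.replacement.into(),
            subgraph_nodes: serial.subgraph_nodes,
        }
    }
}

/// User-side wrapper (e.g. defined in tket2) that pins the extension set.
/// In real code, this is the type whose serialisation knows the extensions.
#[derive(Debug, Clone, PartialEq)]
pub struct HugrWithTK2Extensions(pub Hugr);

impl From<Hugr> for HugrWithTK2Extensions {
    fn from(hugr: Hugr) -> Self {
        Self(hugr)
    }
}

impl From<HugrWithTK2Extensions> for Hugr {
    fn from(wrapped: HugrWithTK2Extensions) -> Self {
        wrapped.0
    }
}
```

A module named in #[serde(with = "module")] would then call `to_serialized::<HugrWithTK2Extensions>()` in its serialize function and `from_serialized` in its deserialize function, so the choice of extension set lives entirely in user code.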

In the future, it would be great if there were a SerialHugr type that can be deserialised without knowing the extension set. Given such a "SerialHugr", the extensions can then get "linked in" to convert it to a proper Hugr (presumably this linking would be a trait that SimpleReplacement and others could also provide recursively). At that point, users would not have to define custom Hugr wrappers for every set of extensions they want to use.

lmondada avatar May 29 '25 10:05 lmondada

See #2267

lmondada avatar May 29 '25 10:05 lmondada