How to handle partial/lazy deserialization of sequences?

Open juliajohannesen opened this issue 5 months ago • 0 comments

For some quick background: I'm implementing ActivityPub deserialization using serde. ActivityPub uses JSON-LD, which is a rather... interesting format (it is a JSON representation of a linked data graph), where all of the following show up in practice:

{} // no foo value
{ "foo": null } // not valid according to spec, but some implementations emit this as no foo value
{ "foo": 123 } // a single foo value
{ "foo": [123] } // not valid according to spec, but some implementations emit this as a single foo value
{ "foo": [123, 456] } // multiple foo values

For simplicity, an implementation that only knows how to handle a single foo value would want to parse the above as None for the first two cases and Some(123) for the others. To do this, I've been implementing #[serde(with = "...")] handlers that use a SeqTransformerVisitor: a generic visitor that takes a generic value implementing the following trait:

trait SeqTransformer<'de, T> {
    type Output;

    fn expecting(&self, formatter: &mut Formatter) -> fmt::Result;
    
    fn transform_none<E>(self) -> Result<Self::Output, E>
    where
        E: Error;

    fn transform_some<E>(self, value: T) -> Result<Self::Output, E>
    where
        E: Error;

    fn transform_seq<A>(self, seq: A) -> Result<Self::Output, A::Error>
    where
        A: SeqAccess<'de>;
}

The implementation of SeqTransformerVisitor is fairly simple: visit_none, visit_unit, visit_some, and visit_seq pass their values straight to the SeqTransformer, while every other visitor method passes its value down to one of four deserializers, which deserialize a T and hand the result to the SeqTransformer:

  • Value: Similar to serde_core::private::Content, but without any heap-allocated values: the Some, NewType, Seq, and Map variants are not included.
  • EnumDeserializer<E>: A generic deserializer that passes everything to deserialize_enum.
  • MapDeserializer<M>: A generic deserializer that passes everything to deserialize_map.
  • SeqDeserializer<S>: A generic deserializer that passes everything to deserialize_seq.

From here, we're able to define SeqTransformers that handle these odd maybe-array types in a variety of ways, such as the following:

pub fn deserialize<'de, D, T>(deserializer: D) -> Result<Option<T>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    struct FirstTransformer<'de, T> {
        _phantom: PhantomData<fn(&'de ()) -> Option<T>>,
    }

    impl<'de, T> SeqTransformer<'de, T> for FirstTransformer<'de, T>
    where
        T: Deserialize<'de>,
    {
        type Output = Option<T>;

        #[inline]
        fn expecting(&self, formatter: &mut Formatter) -> fmt::Result {
            write!(formatter, "an array or ")
        }

        #[inline]
        fn transform_none<E>(self) -> Result<Self::Output, E>
        where
            E: Error,
        { Ok(None) }

        #[inline]
        fn transform_some<E>(self, value: T) -> Result<Self::Output, E>
        where
            E: Error,
        { Ok(Some(value)) }

        #[inline]
        fn transform_seq<A>(self, mut seq: A) -> Result<Self::Output, A::Error>
        where
            A: SeqAccess<'de>,
        { Ok(seq.next_element::<T>()?) }
    }

    let visitor = SeqTransformerVisitor::new(FirstTransformer::<'_, T> {
        _phantom: PhantomData,
    });

    deserializer.deserialize_any(visitor)
}

This almost works: with serde_json we're able to deserialize every payload except the last, which fails with the following: Error("invalid length 2, expected fewer elements in array", line: 0, column: 0). Notably, the same payload works in simd-json, but I'm hesitant to call that the "correct" behavior.

From here, I'm kinda stumped. The only thing I can think of is to add a SeqAccessExt trait with a method that consumes all the remaining elements, deserializing each one into a type that calls deserialize_ignored_any with a visitor that returns a unit value, but that feels inefficient. I also tried to think of a way to do something with a Deserializer supertrait and RawValues/simd_json::value::lazy::Values, but I think you'd wind up back at this same problem.

What do you think is the best way to go about this?

Sidenote: Apologies for the long issue description, I wanted to make sure that I captured the full breadth of the question.

juliajohannesen avatar Oct 01 '25 01:10 juliajohannesen