How to handle partial/lazy deserialization of sequences?
For some quick background: I'm implementing ActivityPub deserialization using serde. ActivityPub uses JSON-LD, which is a rather... interesting format (due to being a JSON representation of a linked data graph), where a consumer has to accept all of the following:
{} // no foo value
{ "foo": null } // not valid according to spec, but some implementations emit this as no foo value
{ "foo": 123 } // a single foo value
{ "foo": [123] } // not valid according to spec, but some implementations emit this as a single foo value
{ "foo": [123, 456] } // multiple foo values
For simplicity, an implementation that only knows how to handle a single foo value would want to parse the first two cases as None and the others as Some(123). To do this, I've been implementing "with" handlers (as in #[serde(with = "...")]) that use a SeqTransformerVisitor, a generic visitor which takes a generic value implementing the following trait:
trait SeqTransformer<'de, T> {
    type Output;

    fn expecting(&self, formatter: &mut Formatter) -> fmt::Result;

    fn transform_none<E>(self) -> Result<Self::Output, E>
    where
        E: Error;

    fn transform_some<E>(self, value: T) -> Result<Self::Output, E>
    where
        E: Error;

    fn transform_seq<A>(self, seq: A) -> Result<Self::Output, A::Error>
    where
        A: SeqAccess<'de>;
}
The implementation of SeqTransformerVisitor is fairly simple: visit_none, visit_unit, visit_some, and visit_seq pass values to the SeqTransformer, while all other visitor methods are implemented by passing the value down to one of four deserializers, which then deserialize T and pass the value to the SeqTransformer:
- Value: Similar to serde_core::private::Content, but without any heap values; the Some, NewType, Seq, and Map variants are not included.
- EnumDeserializer<E>: A generic deserializer that passes everything to deserialize_enum.
- MapDeserializer<M>: A generic deserializer that passes everything to deserialize_map.
- SeqDeserializer<S>: A generic deserializer that passes everything to deserialize_seq.
From here, we're able to define SeqTransformers that handle these odd maybe-array types in a variety of ways, such as the following:
pub fn deserialize<'de, D, T>(deserializer: D) -> Result<Option<T>, D::Error>
where
    D: Deserializer<'de>,
    T: Deserialize<'de>,
{
    struct FirstTransformer<'de, T> {
        _phantom: PhantomData<fn(&'de ()) -> Option<T>>,
    }

    impl<'de, T> SeqTransformer<'de, T> for FirstTransformer<'de, T>
    where
        T: Deserialize<'de>,
    {
        type Output = Option<T>;

        #[inline]
        fn expecting(&self, formatter: &mut Formatter) -> fmt::Result {
            write!(formatter, "an array or ")
        }

        #[inline]
        fn transform_none<E>(self) -> Result<Self::Output, E>
        where
            E: Error,
        { Ok(None) }

        #[inline]
        fn transform_some<E>(self, value: T) -> Result<Self::Output, E>
        where
            E: Error,
        { Ok(Some(value)) }

        #[inline]
        fn transform_seq<A>(self, mut seq: A) -> Result<Self::Output, A::Error>
        where
            A: SeqAccess<'de>,
        { Ok(seq.next_element::<T>()?) }
    }

    let visitor = SeqTransformVisitor::new(FirstTransformer::<'_, T> {
        _phantom: PhantomData,
    });
    deserializer.deserialize_any(visitor)
}
This almost works: we're able to deserialize all but the last payload with serde_json, which fails with the following: Error("invalid length 2, expected fewer elements in array", line: 0, column: 0). Notably, this works in simd-json, but I'm hesitant to call that the "correct" behavior.
From here, I'm kinda stumped. The only thing I can think of would be to add a SeqAccessExt trait that defines a method which consumes all the remaining elements with a type that calls deserialize_ignored_any with a visitor that returns a unit value, but that feels inefficient. I also considered whether it might be feasible to do something with a Deserializer supertrait and RawValues/simd_json::value::lazy::Values, but I think you'd wind up back at this same problem.
What would your thoughts on the best way to go about this be?
Sidenote: Apologies for the long issue description, I wanted to make sure that I captured the full breadth of the question.