serde

Using de/serialize_with inside of an Option, Map, Vec

Open sfackler opened this issue 8 years ago • 40 comments

These would be equivalent to Jackson's @JsonSerialize(keyUsing = ...) etc. Now that we have stateful deserialization, this can be implemented in a pretty straightforward way.

I believe we'd want to support map keys and values as well as seq and option values.

(sfackler, Jan 26 '17)

Ideally I would like to find an approach that composes better than keyUsing.

#[derive(Deserialize)]
struct S {
    #[serde(deserialize_with = "my_key")]
    key: K,

    #[serde(deserialize_with = "my_value")]
    value: V,

    #[serde(???)]
    opt_map: Option<Map<K, V>>,
}

(dtolnay, Apr 07 '17)

#576 has an approach based on a helper for generating ordinary deserialize_with functions, rather than using a slate of new attributes.

(dtolnay, Apr 08 '17)

This could be neat:

#[derive(Deserialize)]
struct S {
    #[serde(deserialize_with = "my_key")]
    key: K,

    #[serde(deserialize_with = "my_value")]
    value: V,

    #[serde(deserialize_with = "my_opt_map")]
    opt_map: Option<Map<K, V>>,
}

fn my_map<'de, D>(deserializer: D) -> Result<Map<K, V>, D::Error>
    where D: Deserializer<'de>
{
    deserialize_map_with!(my_key, my_value)(deserializer)
}

fn my_opt_map<'de, D>(deserializer: D) -> Result<Option<Map<K, V>>, D::Error>
    where D: Deserializer<'de>
{
    deserialize_option_with!(my_map)(deserializer)
}

(dtolnay, Apr 08 '17)
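The `deserialize_map_with!` / `deserialize_option_with!` idea boils down to higher-order functions: each helper takes deserialize functions for the inner pieces and returns one for the whole. Below is a minimal std-only sketch of that composition. It is a toy model, not serde's API: a "deserializer" here is just a text fragment, `null` stands for an absent value, maps are written `k:v,k:v`, and `option_with`, `map_with`, `my_key` and `my_value` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Toy "deserialize function": parse a `T` out of a text fragment.
type DeFn<T> = Box<dyn Fn(&str) -> Result<T, String>>;

/// Hypothetical analogue of `deserialize_option_with!`: `null` becomes `None`,
/// anything else goes through the wrapped function.
fn option_with<T: 'static>(inner: DeFn<T>) -> DeFn<Option<T>> {
    Box::new(move |s: &str| {
        if s.trim() == "null" {
            Ok(None)
        } else {
            inner(s).map(Some)
        }
    })
}

/// Hypothetical analogue of `deserialize_map_with!(key, value)` for `k:v,k:v` text.
fn map_with<K: Ord + 'static, V: 'static>(key: DeFn<K>, value: DeFn<V>) -> DeFn<BTreeMap<K, V>> {
    Box::new(move |s: &str| -> Result<BTreeMap<K, V>, String> {
        let mut out = BTreeMap::new();
        for entry in s.split(',').filter(|e| !e.trim().is_empty()) {
            let (k, v) = entry
                .split_once(':')
                .ok_or_else(|| format!("expected `key:value`, got {:?}", entry))?;
            // Each half goes through its own custom function.
            out.insert(key(k.trim())?, value(v.trim())?);
        }
        Ok(out)
    })
}

/// Hypothetical custom key parser: keys arrive as hex, e.g. `ff`.
fn my_key() -> DeFn<u8> {
    Box::new(|s: &str| u8::from_str_radix(s, 16).map_err(|e| e.to_string()))
}

/// Hypothetical custom value parser: `yes`/`no` instead of `true`/`false`.
fn my_value() -> DeFn<bool> {
    Box::new(|s: &str| match s {
        "yes" => Ok(true),
        "no" => Ok(false),
        other => Err(format!("expected yes/no, got {:?}", other)),
    })
}
```

Usage: `option_with(map_with(my_key(), my_value()))` is the counterpart of `my_opt_map`, built without writing a new function per combination.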

Another possible composable approach:

#[derive(Deserialize)]
struct S {
    #[serde(deserialize_with = "my_k")]
    k: K,

    #[serde(deserialize_with = "option!(my_k)")]
    opt: Option<K>,

    #[serde(deserialize_with = "option!(map!(my_k, my_v))")]
    opt_map: Option<Map<K, V>>,

    #[serde(deserialize_with = "map!(_, my_v)")]
    map: Map<u8, V>,
}

(dtolnay, Apr 10 '17)

This needs to support all the wrapper types too: Rc, Arc, Cell, RefCell, Mutex, RwLock.

(dtolnay, Apr 20 '17)
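Every wrapper in that list is built from a single inner value by a constructor of shape `T -> W` (`Rc::new`, `RefCell::new`, `Mutex::new`, ...), so in a combinator-style design one generic helper could plausibly cover them all. A std-only sketch under the same toy convention as above (a deserialize function is `Fn(&str) -> Result<T, String>`); `wrap_with`, `parse_i32` and the `*_i32` helpers are hypothetical names.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::Mutex;

/// Toy "deserialize function": parse a `T` out of a text fragment.
type DeFn<T> = Box<dyn Fn(&str) -> Result<T, String>>;

/// Deserialize the inner value, then move it into the wrapper via its constructor.
fn wrap_with<T: 'static, W: 'static>(inner: DeFn<T>, wrap: impl Fn(T) -> W + 'static) -> DeFn<W> {
    Box::new(move |s: &str| inner(s).map(&wrap))
}

/// Hypothetical leaf parser used to exercise the combinator.
fn parse_i32() -> DeFn<i32> {
    Box::new(|s: &str| s.trim().parse::<i32>().map_err(|e| e.to_string()))
}

// One line per wrapper type; Arc, Cell and RwLock would follow the same pattern.
fn rc_i32() -> DeFn<Rc<i32>> {
    wrap_with(parse_i32(), Rc::new)
}

fn refcell_i32() -> DeFn<RefCell<i32>> {
    wrap_with(parse_i32(), RefCell::new)
}

fn mutex_i32() -> DeFn<Mutex<i32>> {
    wrap_with(parse_i32(), Mutex::new)
}
```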

Would a syntax like this also want to support custom key/values within a custom map deserializer?

Like

#[derive(Deserialize)]
struct S {
    #[serde(deserialize_with = "my_k")]
    k: K,

    #[serde(deserialize_with = "option!(my_k)")]
    opt: Option<K>,

    #[serde(deserialize_with = "my_map")]
    map: Map<u8, u8>,

    #[serde(deserialize_with = "option!(map!(my_k, my_v))")]
    opt_map: Option<Map<K, V>>,

    #[serde(deserialize_with = "map!(_, my_v)")]
    map_v: Map<u8, V>,

    #[serde(deserialize_with = "map::<my_map>!(my_k, my_v)")]
    map_kv: Map<K, V>,
}

or would this not be feasible at all with the current Deserializer architecture?

(daboross, May 16 '17)

Yes, probably by means of a trait that is implemented for all types that support map!, a different one that is implemented for types that support option!, etc.

(dtolnay, May 16 '17)
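One reading of that suggestion, sketched with std only: a `map!`-style helper stays generic over the container and relies on a trait that every map type implements. Here std's `FromIterator<(K, V)>` stands in for the "types that support `map!`" trait, so the same helper fills a `HashMap`, a `BTreeMap`, or any third-party map that implements it. The helper name `map_with` and the `k:v,k:v` text format are illustrative assumptions, not serde APIs.

```rust
/// Parse `k:v,k:v` text into any container buildable from key/value pairs.
/// `FromIterator<(K, V)>` plays the role of the "supports `map!`" trait.
pub fn map_with<M, K, V>(
    s: &str,
    key: impl Fn(&str) -> Result<K, String>,
    value: impl Fn(&str) -> Result<V, String>,
) -> Result<M, String>
where
    M: std::iter::FromIterator<(K, V)>,
{
    s.split(',')
        .filter(|entry| !entry.trim().is_empty())
        .map(|entry| -> Result<(K, V), String> {
            let (k, v) = entry
                .split_once(':')
                .ok_or_else(|| format!("expected `key:value`, got {:?}", entry))?;
            Ok((key(k.trim())?, value(v.trim())?))
        })
        // Collecting `Result`s stops at the first bad entry.
        .collect()
}
```

The container is chosen purely by the requested return type, which is the property a user-supplied map type would need.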

hm, ok. Would that mean the my_map deserializer would have to return a type which implements some trait then?

I mean deserialize_with is often used with types from other libraries, where implementing a trait on the type isn't possible - just trying to hash out the problem here.

For a syntax alternative, what would you think of something like an "inner" attribute, like this?

#[derive(Deserialize)]
struct S {
    #[serde(deserialize_with = "my_k")]
    k: K,

    #[serde(inner(K, deserialize_with = "my_k"))]
    opt: Option<K>,

    #[serde(deserialize_with = "my_map")]
    map: Map<u8, u8>,

    #[serde(inner(K, deserialize_with = "my_k"))]
    #[serde(inner(V, deserialize_with = "my_v"))]
    opt_map: Option<Map<K, V>>,

    #[serde(inner(V, deserialize_with = "my_v"))]
    map_v: Map<u8, V>,

    #[serde(deserialize_with = "my_map")]
    #[serde(inner(K, deserialize_with = "my_k"))]
    #[serde(inner(V, deserialize_with = "my_v"))]
    map_kv: Map<K, V>,
}

If this kind of syntax would be allowed in attributes, and if it relatively matched what we can feasibly make happen, it would also provide an intuitive way to include #914:

#[derive(Deserialize)]
struct S<'a> {
    #[serde(inner(Cow<'a, str>, borrow))]
    cow: Vec<Cow<'a, str>>,
}

Edit: again, thinking entirely about the ideal situation here, but could we have this attribute support literally all field attributes by having the Deserialize impl create a newtype for each #[inner] clause, carrying the inner attributes?

If I understand the situation correctly, creating a newtype would let any inner attributes apply to the newtype's single field, and the changes in deserialization could be fully conveyed statically at no runtime cost.

Thoughts?

Or... any obvious contradictions which I completely overlooked which would invalidate this?

(daboross, May 16 '17)

Sorry, I think I was just on the complete wrong track there. My apologies for not researching how all of this actually works and thinking about it before commenting!

I think I can agree now that a trait is probably the best way to do this.

(daboross, May 21 '17)

I'm pretty new to Rust and Serde, but since you asked (in e.g. #999 and #1005) for people's thoughts on a design, here's how I would like this to work as a user:

mod remote {   // the remote crate
  struct Foo {    // the remote struct I'm using
    ...
  }
}
////////////////
mod my {   // my crate
  #[derive(Serialize, Deserialize)]
  struct FooDef {  // a struct with identical fields to Foo
    ...
  }

  // A macro invocation that emits some declarations that make
  // it so that, within this module, _every_ use of Foo is serialized
  // as if it had been annotated with #[serde(with = "FooDef")]
  serde_serialize_as!(Foo, FooDef);

  #[derive(Serialize, Deserialize)]
  struct MyStruct {
    field1: Foo,   // works, no additional annotation necessary
    field2: Option<Foo>,   // also works
    field3: HashMap<Foo, String>   // also works
  }
}

I don't know enough about Rust macros and Serde to say whether this is actually implementable.

(HighCommander4, Oct 12 '17)

I guess no design has been decided yet? I have a struct with an Option<toml::value::Datetime> field, but I'd like to store it as a String instead, since I don't want to care about TOML once it's loaded.

fn from_toml_datetime<'de, D>(deserializer: D) -> StdResult<Option<String>, D::Error>
    where
        D: Deserializer<'de>,
{
    toml::value::Datetime::deserialize(deserializer)
        .map(|s| Some(s.to_string()))
}

was my attempt, used via #[serde(deserialize_with = "from_toml_datetime")].

The above was my intuition so I would vote for something like that, if it's doable.

Is there a way to do that at all currently?

(Keats, Feb 23 '18)

@Keats that looks like it should work correctly. Is a field marked with #[serde(default, deserialize_with = "from_toml_datetime")] not working?

Edit: just realized you included the attribute. I would recommend also adding #[serde(default)] to handle the case where the field is missing: for plain Option fields this is implicit, but it is not when there's a custom deserialize_with function like this.

(daboross, Feb 23 '18)
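The missing-field behaviour described above can be modelled in a few lines. A std-only sketch, assuming the parsed object is a `HashMap` of raw strings; `read_field` is a hypothetical stand-in for what a derived impl does per field, and `has_default` plays the role of `#[serde(default)]`. It is not serde_derive's generated code, only the decision it makes.

```rust
use std::collections::HashMap;

/// Hypothetical model of filling one field of a derived `Deserialize` impl.
fn read_field<T: Default>(
    object: &HashMap<&str, &str>,
    name: &str,
    deserialize_with: impl Fn(&str) -> Result<T, String>,
    has_default: bool,
) -> Result<T, String> {
    match object.get(name).copied() {
        // Field present: the custom function runs, as with `deserialize_with`.
        Some(raw) => deserialize_with(raw),
        // Field absent with `default`: `T::default()`, i.e. `None` for `Option<T>`.
        None if has_default => Ok(T::default()),
        // Field absent without `default`: the "missing field" error from this thread.
        None => Err(format!("missing field `{}`", name)),
    }
}
```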

I was missing the default, that did the trick. Thanks!

(Keats, Feb 24 '18)

What do people think of including non-(de)serialize_with attributes in this issue?

I'm currently running into a situation where it would make sense to use a HashMap<i64, Cow<'a, [u8]>> with all inner Cows borrowed. However, #[serde(borrow)] only affects the outermost type, just like (de)serialize_with.

(daboross, Mar 04 '18)

Is there any way to work around it? I have a struct like:

struct Snippet {
    annotations: Vec<Annotation>
}

#[derive(Deserialize)]
#[serde(remote = "Snippet")]
struct SnippetDef {
  #[serde(with = "AnnotationDef")]
  annotations: Vec<Annotation>
}

and that of course doesn't work. How should I solve it until this is fixed?

(zbraniecki, Apr 18 '18)

@zbraniecki

My understanding is that you'd work around it by making a method, roughly:

use serde::de::{Deserializer, SeqAccess, Visitor};
use serde::Deserialize;
use std::{cmp, fmt};

fn deserialize_annotation_vec<'de, D>(deserializer: D) -> Result<Vec<Annotation>, D::Error>
where
    D: Deserializer<'de>,
{
    // Newtype so each element is deserialized through the remote `AnnotationDef`.
    #[derive(Deserialize)]
    struct Wrapper(#[serde(with = "AnnotationDef")] Annotation);

    struct AnnotationVecVisitor;
    impl<'de> Visitor<'de> for AnnotationVecVisitor {
        type Value = Vec<Annotation>;
        fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
            write!(f, "a list of annotations")
        }
        fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<Vec<Annotation>, A::Error> {
            // Cap the preallocation so a bogus size hint cannot exhaust memory.
            let mut vec = Vec::with_capacity(cmp::min(seq.size_hint().unwrap_or(0), 4096));
            while let Some(Wrapper(v)) = seq.next_element()? {
                vec.push(v);
            }
            Ok(vec)
        }
    }

    deserializer.deserialize_seq(AnnotationVecVisitor)
}

Then you'd use #[serde(deserialize_with = "deserialize_annotation_vec")] on the field to use this implementation.

Sources:

  • deserialize method: https://serde.rs/custom-date-format.html
  • visitor: https://serde.rs/impl-deserialize.html
  • Vec visitor impl (for body of visit_seq): https://github.com/serde-rs/serde/blob/master/serde/src/de/impls.rs#L625

(daboross, Apr 18 '18)

I would implement it as:

use serde::{Deserialize, Deserializer};

#[derive(Deserialize)]
#[serde(remote = "Snippet")]
struct SnippetDef {
    #[serde(deserialize_with = "vec_annotation")]
    annotations: Vec<Annotation>,
}

fn vec_annotation<'de, D>(deserializer: D) -> Result<Vec<Annotation>, D::Error>
where
    D: Deserializer<'de>,
{
    #[derive(Deserialize)]
    struct Wrapper(#[serde(with = "AnnotationDef")] Annotation);

    let v = Vec::deserialize(deserializer)?;
    Ok(v.into_iter().map(|Wrapper(a)| a).collect())
}

(dtolnay, Apr 18 '18)
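The `Wrapper` trick works because a local newtype can carry an impl that the foreign `Annotation` cannot, and the `Vec` is then unwrapped element by element. A std-only model of the same shape follows; `FromText` stands in for `Deserialize`, the `label@line` element format is invented for the example, and `Annotation`, `Wrapper` and `parse_annotations` are illustrative names rather than the real types from the thread.

```rust
/// Stand-in for `Deserialize` in this toy model.
trait FromText: Sized {
    fn from_text(s: &str) -> Result<Self, String>;
}

/// Pretend this type comes from another crate and has no `FromText` impl.
#[derive(Debug, PartialEq)]
pub struct Annotation {
    pub label: String,
    pub line: u32,
}

/// Local newtype that knows how to build an `Annotation` (the role of `AnnotationDef`).
struct Wrapper(Annotation);

impl FromText for Wrapper {
    fn from_text(s: &str) -> Result<Self, String> {
        // Toy element format: `label@line`.
        let (label, line) = s
            .split_once('@')
            .ok_or_else(|| format!("bad annotation {:?}", s))?;
        let line = line.parse::<u32>().map_err(|e| e.to_string())?;
        Ok(Wrapper(Annotation { label: label.to_string(), line }))
    }
}

/// Sequences of anything parseable, like serde's `Deserialize` impl for `Vec<T>`.
impl<T: FromText> FromText for Vec<T> {
    fn from_text(s: &str) -> Result<Self, String> {
        s.split(',').filter(|p| !p.is_empty()).map(T::from_text).collect()
    }
}

/// Counterpart of `vec_annotation`: go through `Vec<Wrapper>`, then unwrap.
pub fn parse_annotations(s: &str) -> Result<Vec<Annotation>, String> {
    let v: Vec<Wrapper> = FromText::from_text(s)?;
    Ok(v.into_iter().map(|Wrapper(a)| a).collect())
}
```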

Thank you! That works great!

(zbraniecki, Apr 18 '18)

From upcoming Rust 1.28 release notes:

Attributes on generic parameters such as types and lifetimes are now stable. e.g. fn foo<#[lifetime_attr] 'a, #[type_attr] T: 'a>() {}

I suppose this should make it possible for serde_derive to fix this issue now?

(RReverser, Jul 31 '18)

I believe those were stabilized already in 1.27.0. I don't see how it would apply to this issue though.

(dtolnay, Jul 31 '18)

Hmm, you're right, I misunderstood what it does. Allowing them on generic params doesn't mean getting attributes for these params from instantiation sites in a generic one.

(RReverser, Jul 31 '18)

Still no solution or workaround for the Option<> case??

I'm doing this because I have a lot of Vec<u8>s that are encoded as hex, so I use deserialize_with = "deserialize_hex" with a custom hex function. However, now I have an Option<Vec<Vec<u8>>> and I lost it all. I made a new deserialize_hex_array which can decode Vec<Vec<u8>>, but with the Option it stops working. I tried having deserialize_hex_array return Option<Vec<Vec<u8>>>, but it still didn't seem to work: I still got "missing field".

(stevenroose, Sep 20 '18)

@stevenroose I know this is another workaround, but still: for missing fields like that, I've found that using #[serde(default)] along with the "option" deserialize_with function works well.

I have a ton of fields that are #[serde(default, deserialize_with = "option_timestamp")].

(daboross, Sep 20 '18)

I invested some time yesterday to tackle this and maybe I am onto something. Potentially, this is my first contribution to the Rust community... I am excited!

TL;DR

I have a proof-of-concept implementation here: Gist (do not be frightened by the amount of code). The procedural macro has not been altered to support as/serialize_as/deserialize_as attributes yet, but I think that is the easy part.

The idea I have may enable us to write:

  • #[derive(Serialize, Deserialize)]
    struct SomeTime {
        #[serde(as = "Option<chrono::DateTime<chrono::Utc>>")]
        stamp: Option<chrono::NaiveDateTime>,
    }
    

    The above example is useful when NaiveDateTime (UTC) is used internally and conversion to and from ISO 8601 is needed only at the API boundary.

  • #[derive(Serialize, Deserialize)]
    struct SomeTime {
        #[serde(as = "Vec<Hex>")]
        bytes: Vec<Vec<u8>>,
    }
    

    Note how type nesting plays nicely here.

  • Any container can be supported:

    #[derive(Serialize, Deserialize)]
    struct SomeTime {
        #[serde(as = "HashMap<i32, Option<chrono::DateTime<chrono::Local>>>")]
        map: HashMap<i32, Option<chrono::NaiveDateTime>>,
    }
    

As you can see:

  • the syntax is easy
  • the intent is very clear.
  • the composition is great. You can nest types however you want.

Details

Often one wants to serialize a struct field using a specific format. Changing the type of the field is often undesirable, because it requires more code to wrap the value in the new type and unwrap the original type from it. Conversions are exactly serde's job, yet serde doesn't help us with this particular one.

As many have already found, serialize_with and deserialize_with have shortcomings. They do not compose well, and we quickly end up with an explosion of conversion functions, one for each variant: Option<T>, Vec<Option<T>>, HashMap<K, Option<V>> and so on.

With the Deserialize and Serialize traits alone, we cannot express how nested types should be serialized. What I propose is to add new traits with that ability:

use serde::{Deserializer, Serializer};

/// Deserialize `T` as if it were the `Self` type.
pub trait DeserializeAs<'de, T>: Sized {
    fn deserialize_as<D>(deserializer: D) -> Result<T, D::Error>
    where
        D: Deserializer<'de>;

    // TODO: deserialize_as_into
}

/// Serialize `T` as if it were the `Self` type.
pub trait SerializeAs<T> {
    fn serialize_as<S>(source: &T, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer;
}

These traits can be implemented between any two types we want, similar to the From trait. The Gist contains some implementations, including ones for the standard collections.

The good part is any crate can implement this trait for:

  • non-standard collections/containers
  • usual conversions (e.g. chrono::NaiveDateTime -> chrono::DateTime, Vec<u8> -> Hex, Vec<u8> -> Base64)

By doing this we can share more code within the community.

The traits are implemented for the more specific (adapter) type, not the other way around. Otherwise one would not be able to implement a conversion for one's own type, due to coherence rules:

impl SerializeAs<Hex> for String { .. } // would not compile
impl SerializeAs<String> for Hex { .. }

This matches the direction of the From trait.
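To see why putting the impls on the adapter side composes, here is a std-only toy version. It keeps the trait shape (`SerializeAs<T>` implemented for the adapter type) but, as a simplifying assumption, renders into a JSON-like `String` instead of driving a serde `Serializer`. `Hex` and `As` mirror the names used in this thread but are local to this sketch.

```rust
use std::marker::PhantomData;

/// Toy `SerializeAs<T>`: `Self` is the adapter, `T` the field type.
/// Returns a JSON-like `String` instead of driving a serde `Serializer`.
pub trait SerializeAs<T> {
    fn serialize_as(source: &T) -> String;
}

/// Adapter: write a `Vec<u8>` as a lowercase hex string.
pub struct Hex;

impl SerializeAs<Vec<u8>> for Hex {
    fn serialize_as(source: &Vec<u8>) -> String {
        let hex: String = source.iter().map(|b| format!("{:02x}", b)).collect();
        format!("\"{}\"", hex)
    }
}

/// Containers delegate each element to the adapter named in their type argument.
impl<T, U: SerializeAs<T>> SerializeAs<Vec<T>> for Vec<U> {
    fn serialize_as(source: &Vec<T>) -> String {
        let items: Vec<String> = source.iter().map(U::serialize_as).collect();
        format!("[{}]", items.join(","))
    }
}

impl<T, U: SerializeAs<T>> SerializeAs<Option<T>> for Option<U> {
    fn serialize_as(source: &Option<T>) -> String {
        match source {
            Some(value) => U::serialize_as(value),
            None => "null".to_string(),
        }
    }
}

/// Entry point in the spirit of `#[serde(with = "As::<...>")]`.
pub struct As<U>(PhantomData<U>);

impl<U> As<U> {
    pub fn serialize<T>(source: &T) -> String
    where
        U: SerializeAs<T>,
    {
        U::serialize_as(source)
    }
}
```

The same three impls already cover `Vec<Option<Hex>>` or `Option<Option<Hex>>` without new code, which is the composition the proposal is after.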

This is a big change and I am not sure whether this is the preferred way of implementing this feature. If it is, I definitely want to take it forward, but I will need some help and guidance.

I thought about moving this change out into a separate serde-as crate, but I think that would reduce adoption.

SameAs

There is one caveat, though: if a leaf type does not change, the code does not compile. The example with HashMap is an instance of this problem, since the i32 key does not change.

The problem arises, because we can't implement:

impl<T: Serialize> SerializeAs<T> for T { .. }

We can't implement it, because the compiler cannot prove the implementations are not overlapping.

That's why I introduced SameAs structure, as a workaround.

There are several solutions:

  • resolve https://github.com/rust-lang/rfcs/issues/1834
  • implement the traits for all types manually
  • the most viable solution right now:
    • move SameAs to a private module
    • detect equivalence of leaf types in the procedural macro (is it even possible to parse generic types?). I know this will easily break, as we cannot assume String is equal to std::string::String, but the chance that someone spells the same type two different ways is low.
  • make SameAs truly public and require everyone to write serde(as = HashMap<SameAs<i32>, ..>)

(markazmierczak, Dec 07 '19)
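The `SameAs` escape hatch can be reproduced in a small std-only model. `Render` stands in for `Serialize`, `SerializeAs` renders to a `String` instead of using a `Serializer`, and `Same` and `Upper` are illustrative adapters. Because `Same` is its own marker type, its blanket impl over every `T: Render` lives on a different `Self` type than the `BTreeMap` container impl, so the two never compete.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for `Serialize`: a type's own default rendering.
pub trait Render {
    fn render(&self) -> String;
}

impl Render for i32 {
    fn render(&self) -> String {
        self.to_string()
    }
}

impl Render for String {
    fn render(&self) -> String {
        format!("{:?}", self)
    }
}

/// Toy `SerializeAs`: render `T` as if it were `Self`.
pub trait SerializeAs<T> {
    fn serialize_as(source: &T) -> String;
}

/// Identity marker: "use the leaf's own `Render` impl". In real serde a blanket
/// `impl<T: Serialize> SerializeAs<T> for T` would overlap with the container
/// impls, because containers are themselves `Serialize`; a marker type avoids that.
pub struct Same;

impl<T: Render> SerializeAs<T> for Same {
    fn serialize_as(source: &T) -> String {
        source.render()
    }
}

/// Example leaf adapter that actually changes something.
pub struct Upper;

impl SerializeAs<String> for Upper {
    fn serialize_as(source: &String) -> String {
        format!("{:?}", source.to_uppercase())
    }
}

/// Map container: keys and values each go through their own adapter.
impl<K, V, KA, VA> SerializeAs<BTreeMap<K, V>> for BTreeMap<KA, VA>
where
    KA: SerializeAs<K>,
    VA: SerializeAs<V>,
{
    fn serialize_as(source: &BTreeMap<K, V>) -> String {
        let entries: Vec<String> = source
            .iter()
            .map(|(k, v)| format!("{}:{}", KA::serialize_as(k), VA::serialize_as(v)))
            .collect();
        format!("{{{}}}", entries.join(","))
    }
}
```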

This looks promising! I like how well it composes for nested types.

I don't necessarily want to start by landing new traits and derive attributes in serde immediately. It would be better if all of this could be provided in a separate crate for now and we can iterate on the design.

For the attributes maybe you can rely on with for now:

use ???::As;

...

#[serde(with = "As<Vec<Hex>>")]

where As is a generic type that has serialize/deserialize methods as expected by with defined in terms of SerializeAs/DeserializeAs.

If someone already has a deserialize_with function (such as this / this, drawing arbitrarily from GitHub search results), could you show what they would need to write to make it work within this approach?

(dtolnay, Dec 14 '19)

The As struct is a great idea. I've tested it and there is only one little quirk: it has to be written as #[serde(with = "As::<Vec::<Hex>>")], because serde expects a path in this position. I am going to write a new crate, yes!

The referenced stringly_array_spaces function could be rewritten without a Visitor, so let me do that first:

pub fn stringly_array_spaces<'de, D>(deserializer: D) -> Result<Vec<String>, D::Error>
where
    D: Deserializer<'de>,
{
    let s: &str = Deserialize::deserialize(deserializer)?;
    Ok(s.split_whitespace().map(|x| x.to_owned()).collect())
}

Translating this directly as it is to DeserializeAs would give us:

struct SpaceSeparatedStrings;

impl<'de> DeserializeAs<'de, Vec<String>> for SpaceSeparatedStrings {
    fn deserialize_as<D>(deserializer: D) -> Result<Vec<String>, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s: &str = Deserialize::deserialize(deserializer)?;
        Ok(s.split_whitespace().map(|x| x.to_owned()).collect())
    }
}

This only gives us the ability to nest Vec<String> inside containers, e.g. Option.

I would like to generalize this to take any parsable type (a type that implements FromStr).

use serde::{Deserialize, Deserializer};
use std::{fmt, marker::PhantomData};

struct SpaceSeparated<T>(PhantomData<T>);

impl<'de, T> DeserializeAs<'de, Vec<T>> for SpaceSeparated<T>
where
    T: std::str::FromStr,
    T::Err: fmt::Display,
{
    fn deserialize_as<D>(deserializer: D) -> Result<Vec<T>, D::Error>
    where
        D: Deserializer<'de>,
    {
        let s: &str = Deserialize::deserialize(deserializer)?;
        s.split_whitespace()
            .map(|x| T::from_str(x).map_err(<D::Error as serde::de::Error>::custom))
            .collect()
    }
}

But what I'm really missing here is some kind of FromStrAs to fully utilize this idea. All we have now is outer nesting (inside other containers); there is no way to redefine the inner type, e.g.:

struct MyStruct {
    #[serde(as = "SpaceSeparated<Hex>")]
    member: Vec<Vec<u8>>,
}

As a consequence, SpaceSeparated as it stands now doesn't even need to be parametrized by T.

(markazmierczak, Dec 15 '19)
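The missing `FromStrAs` piece can be sketched with std alone: make the element parser a type parameter implementing a `FromStrAs<T>` trait, so `SpaceSeparated<Hex>` can yield `Vec<Vec<u8>>` while a `FromStr`-based adapter keeps the plain case working. `FromStrAs`, `Hex`, `Parse` and `SpaceSeparated::parse` are hypothetical names for this sketch, and errors are plain `String`s rather than `D::Error`.

```rust
use std::fmt::Display;
use std::marker::PhantomData;
use std::str::FromStr;

/// Hypothetical `FromStrAs<T>`: parse `T` from text "as if it were `Self`".
pub trait FromStrAs<T> {
    fn from_str_as(s: &str) -> Result<T, String>;
}

/// Adapter that defers to the target type's own `FromStr` impl.
pub struct Parse;

impl<T> FromStrAs<T> for Parse
where
    T: FromStr,
    T::Err: Display,
{
    fn from_str_as(s: &str) -> Result<T, String> {
        s.parse::<T>().map_err(|e| e.to_string())
    }
}

/// Adapter that decodes a hex word such as `dead` into bytes.
pub struct Hex;

impl FromStrAs<Vec<u8>> for Hex {
    fn from_str_as(s: &str) -> Result<Vec<u8>, String> {
        // Reject odd lengths and non-hex characters up front, so the
        // two-byte slices below always fall on ASCII boundaries.
        if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(format!("invalid hex {:?}", s));
        }
        (0..s.len())
            .step_by(2)
            .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
            .collect()
    }
}

/// `SpaceSeparated<U>`: split on whitespace, parse each word with adapter `U`.
pub struct SpaceSeparated<U>(PhantomData<U>);

impl<U> SpaceSeparated<U> {
    pub fn parse<T>(s: &str) -> Result<Vec<T>, String>
    where
        U: FromStrAs<T>,
    {
        s.split_whitespace().map(U::from_str_as).collect()
    }
}
```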

@markazmierczak please publish the crate!

(cedric-h, Jan 07 '20)

I've been working around this problem for a while by writing deserializer functions for every combination of container and foreign type that I use. The As idea looks very promising!

(dbeckwith, Jan 09 '20)

If I understand that right, that sounds pretty neat!

Would it allow me to do this?

// I have `Schedule` type that implements `Display` and `FromStr`.

#[derive(Serialize, Deserialize)]
struct Data {
    #[serde(with = "As<HashMap<Schedule, String>>")]
    cron_jobs: HashMap<String, String>
}

I would love to use that in my project right now. :slightly_smiling_face:

(zicklag, Feb 08 '20)
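What the key side of such an annotation would have to do can be sketched with std only: parse every string key through `FromStr` and keep the values as they are, so Rust code sees typed `Schedule` keys while the serialized form keeps plain strings, and go back through `Display` when writing. `Schedule` here is a hypothetical stand-in (it only checks for five whitespace-separated cron fields), and `parse_keys` / `display_keys` are illustrative helpers. serde_with later offers a `DisplayFromStr` adapter aimed at this kind of conversion.

```rust
use std::collections::HashMap;
use std::fmt;
use std::hash::Hash;
use std::str::FromStr;

/// Hypothetical stand-in for the `Schedule` type from the question.
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Schedule(String);

impl FromStr for Schedule {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, String> {
        // Toy validation only: a cron line has five whitespace-separated fields.
        if s.split_whitespace().count() == 5 {
            Ok(Schedule(s.to_string()))
        } else {
            Err(format!("not a 5-field cron expression: {:?}", s))
        }
    }
}

impl fmt::Display for Schedule {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Deserialize direction: parse every key with `FromStr`, keep values untouched.
pub fn parse_keys<K>(raw: HashMap<String, String>) -> Result<HashMap<K, String>, String>
where
    K: FromStr + Eq + Hash,
    K::Err: fmt::Display,
{
    raw.into_iter()
        .map(|(k, v)| k.parse::<K>().map(|key| (key, v)).map_err(|e| e.to_string()))
        .collect()
}

/// Serialize direction: turn typed keys back into strings via `Display`.
pub fn display_keys<K: fmt::Display>(typed: &HashMap<K, String>) -> HashMap<String, String> {
    typed.iter().map(|(k, v)| (k.to_string(), v.clone())).collect()
}
```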

I tested the code a bit and so far it felt quite nice to use. I'm considering making this the basis for my crate serde_with, in which I already have different serde helpers. I'm tracking the progress at https://github.com/jonasbb/serde_with/issues/87

A couple of notes though. The type SameAs<T> is more complicated than needed. A Same without explicit generics also works, but it might be nice to be more explicit.

While a generic implementation like

impl<T: Serialize> SerializeAs<T> for T { .. }

is not possible, it is possible to implement it for every T manually, thus maybe avoiding the need for SameAs<T>/Same in common cases like

impl SerializeAs<i32> for i32 { .. }

EDIT: serde_with v1.5.0 includes this now in a usable manner.

(jonasbb, Feb 29 '20)
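The per-type alternative can be stamped out with a declarative macro. In this std-only toy, `SerializeAs` renders to a `String` via `Debug` rather than driving a `Serializer`, and `identity_serialize_as!` is a hypothetical macro name. The point is only that concrete identity impls such as `impl SerializeAs<i32> for i32` coexist with the container impls, because no identity impl has `Vec<_>` or `Option<_>` as its `Self` type.

```rust
/// Toy `SerializeAs`, rendering into a `String` instead of driving a `Serializer`.
pub trait SerializeAs<T> {
    fn serialize_as(source: &T) -> String;
}

/// One identity impl per concrete leaf type, instead of the overlapping
/// blanket `impl<T> SerializeAs<T> for T`.
macro_rules! identity_serialize_as {
    ($($ty:ty),* $(,)?) => {
        $(
            impl SerializeAs<$ty> for $ty {
                fn serialize_as(source: &$ty) -> String {
                    format!("{:?}", source)
                }
            }
        )*
    };
}

identity_serialize_as!(i32, u64, bool, String);

impl<T, U: SerializeAs<T>> SerializeAs<Vec<T>> for Vec<U> {
    fn serialize_as(source: &Vec<T>) -> String {
        let items: Vec<String> = source.iter().map(U::serialize_as).collect();
        format!("[{}]", items.join(","))
    }
}

impl<T, U: SerializeAs<T>> SerializeAs<Option<T>> for Option<U> {
    fn serialize_as(source: &Option<T>) -> String {
        match source {
            Some(value) => U::serialize_as(value),
            None => "null".to_string(),
        }
    }
}
```

The trade-off is that every leaf type needs an explicit entry in the macro invocation.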