serde icon indicating copy to clipboard operation
serde copied to clipboard

Tagged enums should support #[serde(other)]

Open zimond opened this issue 7 years ago • 37 comments

I think tagged enums should also allow #[serde(other)] to mark a enum field as default. This will make the tagged enums much more useful.

Or is there already a way to do this that I don know?

zimond avatar May 02 '17 04:05 zimond

Could you give some example of how you would expect this to work? Please show the enum definition, the input data, and the output value you expect to get from deserializing.

dtolnay avatar May 09 '17 14:05 dtolnay

Say I have an enum

enum Target {
   A(TargetA),
   B(TargetB),
   Others(TargetOthers)
}

When deserializing, { "tag": "A" } will be deserialized to Target::A, { "tag": "blahblah" } should be deserialized to Target::Others

As the deserialization input could be indeed anything, it would be nice to have a default (or fallback) choice.

zimond avatar May 10 '17 02:05 zimond

Cool, I am on board with this. I would love a PR to implement it!

dtolnay avatar May 10 '17 02:05 dtolnay

This would be really great for reaching feature-parity with protocol buffers. In proto3 Java, unknown variants deserialize to a special UNRECOGNIZED variant. This allows new variants to be added backwards-compatibly. Without this feature, it's pretty dangerous to use enums at all in more complex distributed systems.

Example

enum TrafficClass {
    #[serde(other)]
    Unknown,
    Undesired,
    BestEffort,
    Sensitive,
}

In the future I might want to add TrafficClass::Critical, in which case I wouldn't want to break existing systems that haven't been redeployed. In this case those systems would see TrafficClass::Unknown, and they could then decide to handle that however they need to -- maybe defaulting to TrafficClass::BestEffort, or maybe replying with an application-level error that the new traffic class isn't supported yet.

Open Questions

I would assume the most sensible thing would be to only allow this annotation on data-less variants; can you think of a way it would work otherwise? I don't think it'd be wise to try to deserialize the unknown tag's data into some other variant's data.

tikue avatar May 11 '17 09:05 tikue

I would assume the most sensible thing would be to only allow this annotation on data-less variants

I don't know if it makes sense for all deserializers, but for JSON at least, it would be nice to be able to capture either the raw JSON (as a string), or better yet, the serde_json::value::Value not-strongly-typed but still deserialized data.

wfraser avatar Jul 25 '17 03:07 wfraser

My use case is something like this:

#[serde(rename_all = "snake_case")]
pub enum FieldType {
    Noop,
    Hash,
    MaskHost,
    MaskAgent,
    DropField,
    Plugin(String),
}

Where, when parsing JSON, the unrecognised variant would be parsed as Plugin, and the text would be the value of the field.

Also, now that I think about it, a stretch goal would be as parsing with many unrecognised variants, deciding between them using similar, structure and type-based logic as #[serde(untagged)] parsing.

golddranks avatar Sep 13 '17 09:09 golddranks

Any news on that? This would be extremely useful. I have needed that for such a long time already.

iddm avatar Feb 16 '18 08:02 iddm

We are using this workaround until it's supported


#[derive(Serialize, Deserialize, Debug)]
pub enum FieldKind {
    date,
    email,
    menu,

    #[serde(skip_deserializing)]
    unknown,
}

fn deserialize_field_kind<'de, D>(deserializer: D) -> Result<FieldKind, D::Error>
where
    D: Deserializer<'de>,
{
    Ok(FieldKind::deserialize(deserializer).unwrap_or(FieldKind::Unknown))
}

#[derive(Serialize, Deserialize, Debug)]
pub struct Field {
    #[serde(deserialize_with = "deserialize_field_kind")]
    kind: FieldKind,
}

hope that helps

galich avatar Apr 05 '18 18:04 galich

@galich thanks! I'd try implementing it in serde-aux crate.

iddm avatar Apr 05 '18 19:04 iddm

@dtolnay: I might have some bandwidth to implement this.

After a quick look through the serde source, some questions:

  1. There's already a serde(other) from cdfd4455288930ff5cc43033b9d8795758f49563 that's minimally documented. Could you explain what it's used for along with what field_identifier is for? https://github.com/serde-rs/serde-rs.github.io/issues/45

  2. Since serde(other) is taken, I propose using serde(open) to indicate an open union/enum:

enum E {
    A,
    B(u32),
    #[serde(open)]
    C,
}

Are you onboard with that name?

braincore avatar Aug 16 '18 20:08 braincore

  1. A field identifier is the enum Field shown in https://serde.rs/deserialize-struct.html i.e. some Deserialize impl for deserializing keys in a map when deserializing that map as a struct. When there is an other variant it matches any other map key that is not matched to a field of the struct being deserialized. This is the default behavior of derived Deserialize impls for structs with named fields when not using serde(deny_unknown_fields).

  2. Use serde(other) because there is no conflict, the attribute means the same thing in this context as it does in the existing implemented context.

dtolnay avatar Aug 17 '18 04:08 dtolnay

I am fighting this too.

I have:

#[derive(Debug, Eq, PartialEq, Clone, Deserialize, Serialize)]
#[serde(tag = "kind", rename_all="lowercase" )]
pub enum DocumentChangeOperation {
    Create(CreateFile),
    Rename(RenameFile),
    Delete(DeleteFile),

    #[serde(other)]
    Edit(TextDocumentEdit),
}

Where the Edit variant doesn't actually have a kind tag so it fails to deserialize properly.

kjeremy avatar Sep 21 '18 19:09 kjeremy

I am also interested in this feature. I would like this to choose a common default variant when no tag field is available, to only require users to specify a type when using a non-common variant.

I would want the following:

[[binds]]
address = "0.0.0.0:80"

[[binds]]
type = "tls"
address = "0.0.0.0:443"
cert = "/some/file/path"
#[derive(Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum Bind {
    #[serde(other)]
    Http {
        address: SocketAddr,
    },
    
    #[serde(alias = "tls")]
    HttpsTls {
        address: SocketAddr,
        cert: PathBuf,
    }
}

but with #[serde(other)] it doesn't currently compile with the error message: #[serde(other)] must be on a unit variant

Is there any progress on this issue?

timvisee avatar Feb 28 '19 13:02 timvisee

Hmm, maybe #[serde(tag = "type", rename_all = "snake_case", default = "http")] would be a better location for that?

mathstuf avatar Mar 01 '19 14:03 mathstuf

any suggestion how I would do workaround for a Vec<FieldKind>?

I tried:

fn deserialize_vec_field_kind<'de, D>(deserializer: D) -> Result<Vec<FieldKind>, D::Error>
    where
        D: Deserializer<'de>,
{
    struct VecFieldKind(PhantomData<fn() -> Vec<FieldKind>>);

    impl<'de> Visitor<'de> for VecFieldKind {
        type Value = Vec<FieldKind>;

        fn expecting(&self, formatter: &mut Formatter) -> fmt::Result {
            formatter.write_str("Array of field kind.")
        }

        fn visit_seq<S>(self, mut seq: S) -> Result<Vec<FieldKind>, S::Error>
            where
                S: SeqAccess<'de>,
        {
            let mut field_kinds: Vec<FieldKind> = Vec::new();

            while let element = seq.next_element().unwrap_or(Some(FieldKind::Unknown)) {
                field_kinds.push(element.unwrap())
            }

            Ok(field_kinds)
        }
    }

    deserializer.deserialize_seq(VecFieldKind(PhantomData))
}

but I end up in an infinite loop.

kunicmarko20 avatar Nov 10 '19 11:11 kunicmarko20

            while let element = seq.next_element().unwrap_or(Some(FieldKind::Unknown)) {
                field_kinds.push(element.unwrap())
            }

You should match on the error state, not just the Option-ness. Something like this structure is what you want:

loop {
    match seq.next_element() {
        Ok(Some(element)) => field_kinds.push(element),
        Ok(None) => break, // end of sequence
        Err(_) => field_kinds.push(FieldKind::Unknown),
    }
}

mathstuf avatar Nov 10 '19 14:11 mathstuf

My bad didn't read that documentation correctly, but even if I add a break on None I still have an infinite loop

kunicmarko20 avatar Nov 10 '19 14:11 kunicmarko20

Hmm. Well, something is making next_element not end on its own. I suggest debugging what all of the next_element returns are. Something has to change when it gets to the end of the element list (assuming there is one, but then, there's your problem :) ).

mathstuf avatar Nov 10 '19 15:11 mathstuf

What is this issue still waiting on? It appears that the original request works today:

#[derive(Deserialize, PartialEq)]
#[serde(tag = "tag")]
enum Target {
   A(()),
   B(()),
   #[serde(other)]
   Others
}

fn main() {
    assert_eq!(Target::Others, from_str::<Target>(r#"{ "tag": "blablah" }"#).unwrap());
}

While we could allow the Others variant to hold data in the future, I think this issue has been addressed.

Plecra avatar May 12 '20 13:05 Plecra

It might need an issue on its own, but this is still not working for externally tagged enums:

#[derive(Deserialize, PartialEq, Debug)]
enum Target {
   A(String),
   B(()),
   #[serde(other)]
   Others
}

fn main() {
    assert_eq!(Target::A(String::from("Somestring")), from_str::<Target>(r#"{ "A": "Somestring" }"#).unwrap());
    assert_eq!(Target::Others, from_str::<Target>(r#"{ "blablah": "blablah" }"#).unwrap());
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=68369e9f98d7f85961f5919dc4ff826e

MathiasKoch avatar May 13 '20 08:05 MathiasKoch

Is there any way to get this fixed for Externally tagged enums too?

arthurprs avatar Jun 18 '20 10:06 arthurprs

While we could allow the Others variant to hold data in the future

That's one of the most important points in this issue.

aldanor avatar Oct 25 '20 23:10 aldanor

Here's a workaround to capture unknown enum variants without needing serde(with) annotations

use std::str::FromStr;
use serde::Deserialize;
use serde::de::{value, Deserializer, IntoDeserializer};


#[derive(Deserialize, Debug)]
#[serde(rename_all = "kebab-case")]
#[serde(remote = "MaybeUnknown")]
enum MaybeUnknown {
    Foo,
    Bar,
    #[serde(skip_deserializing)]
    Unknown(String),
}

impl FromStr for MaybeUnknown {
    type Err = value::Error;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Self::deserialize(s.into_deserializer())
    }
}

impl<'de> Deserialize<'de> for MaybeUnknown {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: Deserializer<'de>
    {
        let s = String::deserialize(deserializer)?;
        let deserialized = Self::from_str(&s).unwrap_or_else(|_| {
            tracing::warn!(message = "unknown-variant", variant = ?s);
            Self::Unknown(s)
        });
        Ok(deserialized)
    }
}
#

Michael-J-Ward avatar Mar 19 '21 20:03 Michael-J-Ward

I'm just a newbie but it seems to me https://github.com/serde-rs/serde/issues/912#issuecomment-803097289 will use the default enum even when the input is clearly tagged for another branch of the enum, and just failed to parse, e.g. due to trying to deserialize a string into an integer.

tv42 avatar May 13 '21 22:05 tv42

Instead of talking about workarounds: what would be a viable path forward to get this feature into serde itself?

piegamesde avatar Aug 10 '21 11:08 piegamesde

I don't think there is one. If you have a look at unmerged PRs and activitiy on those and on recently-closed ones you will see that that serde is effectively feature-frozen.

jplatte avatar Aug 10 '21 11:08 jplatte

Another featureful workaround: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=aabb4e581775d2f3b82d4725ab85448f

TeXitoi avatar Nov 29 '21 13:11 TeXitoi

In my case, I'm hitting this same issue with an Option<SomeEnumType> where the deserialized value doesn't match an enum constant. It'd be nice to tell Serde just to default to None.

Qix- avatar Aug 29 '22 19:08 Qix-

I'm hitting up against this and have tried:

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum ValueEntry {
    A(Vec<String>),
    B(Vec<String>),

    // Provide fallback in case we're deserialising from a newer release
    // of a subcommand that include json that we don't recognise
    //
    // 'other' means that we use this variant if none of the others match.
    // 'ignore_contents' is used as 'other' only works if the entry deserializes to
    // a unit value so we override the deserialization to ignore everything and
    // return a unit value.
    #[serde(other, deserialize_with = "ignore_contents")]
    Unknown,
}

fn ignore_contents<'de, D>(deserializer: D) -> Result<(), D::Error>
where
    D: Deserializer<'de>,
{
    // Ignore any content at this part of the json structure
    deserializer.deserialize_ignored_any(serde::de::IgnoredAny);

    // Return unit as our 'Unknown' variant has no args
    Ok(())
}

I am new-ish to rust and serde so I've no reason to have confidence in this but it seems promising from some early tests so I thought I'd record it before I forget. I only have one kind of entry to test it on at the moment so it might fail for other variations.

IgnoredAny only works for structured input like JSON where it is possible to tell where entries end without needing to know the exact details and expected content of the values themselves.

michaeljones avatar Dec 19 '22 10:12 michaeljones

In my case, I'm getting this problem deserializing to HashMap<MyEnum, f64>. Ideally, if there's not a matching variant for MyEnum, it just wouldn't get loaded into the map.

clarkmcc avatar Jan 24 '23 00:01 clarkmcc