rust-csv icon indicating copy to clipboard operation
rust-csv copied to clipboard

Error Deserializing Struct With `flatten`-ed Tagged Enum

Open nlordell opened this issue 2 years ago • 3 comments

Version

Tested on both v1.1.6 and master branch.

Briefly describe the bug.

Deserializing a struct with a flattened tagged enum field does not seem to work.

Include a complete program demonstrating a problem.

Code:
#![allow(dead_code)]

use serde::de::DeserializeOwned;
use std::fmt::Debug;

mod tagged_enum {
    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    pub struct Record {
        common: u64,
        #[serde(flatten)]
        kind: Kind,
    }

    #[derive(Debug, Deserialize)]
    #[serde(tag = "kind", content = "parameter", rename_all = "lowercase")]
    enum Kind {
        Foo(u64),
        Bar(bool),
    }
}

mod flattened_struct {
    use serde::Deserialize;

    #[derive(Debug, Deserialize)]
    pub struct Record {
        common: u64,
        #[serde(flatten)]
        inner: Inner,
    }

    #[derive(Debug, Deserialize)]
    struct Inner {
        kind: Kind,
        parameter: Value,
    }

    #[derive(Debug, Deserialize)]
    #[serde(rename_all = "lowercase")]
    enum Kind {
        Foo,
        Bar,
    }

    #[derive(Debug, Deserialize)]
    #[serde(untagged)]
    enum Value {
        Num(u64),
        Bool(bool),
    }
}

const CSV: &[u8] = b"\
common,kind,parameter
1,foo,42
2,bar,true";

fn dbg_records<T>()
where
    T: Debug + DeserializeOwned,
{
    let mut reader = csv::Reader::from_reader(CSV);
    for record in reader.deserialize::<T>() {
        println!("{:#?}", record);
    }
}

fn main() {
    dbg_records::<tagged_enum::Record>(); // Error!
    dbg_records::<flattened_struct::Record>(); // OK.
}

(Playground link).

What is the observed behavior of the code above?

Fails to deserialize with invalid type: byte array, expected "kind", "parameter", or other ignored fields error:
Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 22,
                    line: 2,
                    record: 1,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: byte array, expected \"kind\", \"parameter\", or other ignored fields",
                ),
            },
        },
    ),
)
Err(
    Error(
        Deserialize {
            pos: Some(
                Position {
                    byte: 31,
                    line: 3,
                    record: 2,
                },
            ),
            err: DeserializeError {
                field: None,
                kind: Message(
                    "invalid type: byte array, expected \"kind\", \"parameter\", or other ignored fields",
                ),
            },
        },
    ),
)

What is the expected or desired behavior of the code above?

To correctly deserialize records as they are done in the case of flattened structs (also included in code example).

nlordell avatar May 03 '22 08:05 nlordell

I did a little more digging, using cargo +nightly rustc --profile=check -- -Zunpretty=expanded to check out the generated Deserialize implementation.

It looks like flattened structs generate a "field visitor" type that looks like this:

                    impl<'de> _serde::de::Visitor<'de> for __FieldVisitor {
                        type Value = __Field;
                        fn expecting(&self,
                            __formatter: &mut _serde::__private::Formatter)
                            -> _serde::__private::fmt::Result {
                            _serde::__private::Formatter::write_str(__formatter,
                                "variant identifier")
                        }
                        fn visit_u64<__E>(self, __value: u64)
                            -> _serde::__private::Result<Self::Value, __E> where
                            __E: _serde::de::Error {
                            match __value {
                                0u64 => _serde::__private::Ok(__Field::__field0),
                                1u64 => _serde::__private::Ok(__Field::__field1),
                                _ =>
                                    _serde::__private::Err(_serde::de::Error::invalid_value(_serde::de::Unexpected::Unsigned(__value),
                                            &"variant index 0 <= i < 2")),
                            }
                        }
                        fn visit_str<__E>(self, __value: &str)
                            -> _serde::__private::Result<Self::Value, __E> where
                            __E: _serde::de::Error {
                            match __value {
                                "foo" => _serde::__private::Ok(__Field::__field0),
                                "bar" => _serde::__private::Ok(__Field::__field1),
                                _ => {
                                    _serde::__private::Err(_serde::de::Error::unknown_variant(__value,
                                            VARIANTS))
                                }
                            }
                        }
                        fn visit_bytes<__E>(self, __value: &[u8])
                            -> _serde::__private::Result<Self::Value, __E> where
                            __E: _serde::de::Error {
                            match __value {
                                b"foo" => _serde::__private::Ok(__Field::__field0),
                                b"bar" => _serde::__private::Ok(__Field::__field1),
                                _ => {
                                    let __value = &_serde::__private::from_utf8_lossy(__value);
                                    _serde::__private::Err(_serde::de::Error::unknown_variant(__value,
                                            VARIANTS))
                                }
                            }
                        }
                    }

Meaning it accepts map keys as:

  • indices starting from 0
  • &str matching the names of the fields
  • &[u8] matching the UTF-8 bytes of the names of the fields

In contrast, tagged enums use a "tag-content-other field visitor":

    impl<'de> Visitor<'de> for TagContentOtherFieldVisitor {
        type Value = TagContentOtherField;

        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            write!(
                formatter,
                "{:?}, {:?}, or other ignored fields",
                self.tag, self.content
            )
        }

        fn visit_str<E>(self, field: &str) -> Result<Self::Value, E>
        where
            E: de::Error,
        {
            if field == self.tag {
                Ok(TagContentOtherField::Tag)
            } else if field == self.content {
                Ok(TagContentOtherField::Content)
            } else {
                Ok(TagContentOtherField::Other)
            }
        }
    }

Note that the tagged enum field visitor only accepts &str field names. This is a problem since the deserializer uses "borrowed bytes deserializer" which forwards all deserialization calls to visit_bytes which TagContentOtherFieldVisitor does not implement (or, to be more specific, implements the method by returning an error unconditionally):

https://github.com/BurntSushi/rust-csv/blob/41c71ed353a71526c52633d854466c1619dacae4/src/deserializer.rs#L647

So, I'm not sure how this should be fixed. Should

  • csv do a little UTF-8 decoding dance?
    if let Ok(field) = str::from_utf8(field) {
        seed.deserialize(BorrowedStrDeserializer::new(field)).map(Some)
    } else {
        seed.deserialize(BorrowedBytesDeserializer::new(field)).map(Some)
    }
    
  • serde::TagContentOtherFieldVisitor should also add implementations for visit_bytes?

Personally, the second option looks "cleaner" to me. Not sure what others think.

nlordell avatar May 03 '22 08:05 nlordell

I think I have much smaller example that might be related to this issue.

Nonius avatar May 24 '22 23:05 Nonius