avro-rs icon indicating copy to clipboard operation
avro-rs copied to clipboard

Record serialization is sensitive to order of fields in struct

Open collosi opened this issue 6 years ago • 8 comments

I'm not sure this is not intended behavior, but I just got tripped up by this so I thought I'd point it out. If you declare a schema like this:

{
  "type": "record",
  "name": "my_record",
  "fields": [
    {"name": "a", "type": "long"},
    {"name": "b", "type": "string"},
  ]
}

and then try to de/serialize it into a rust struct of this type:

#[derive(Deserialize, Serialize)]
pub struct MyRecord {
  b: String,
  a: i64,
}

You will get an error, "value does not match schema". I'm not sure that should be an error, seems like there's an obvious way to translate from one to another.

The operation that is erroring for me is:

to_avro_datum(&record_scheme, to_value(record)?)?

collosi avatar Jul 25 '18 15:07 collosi

Hmm good point, thanks for raising that. I feel like it should still be an error, as for these two different schemas:

"fields": [
  {"name": "a", "type": "long"},
  {"name": "b", "type": "string"},
]

and

"fields": [
  {"name": "b", "type": "string"},
  {"name": "a", "type": "long"},
]

Avro will encode them differently, and schema resolution will not succeed if writing from one and reading from the other. 🙂


Another limitation is that we currently do not rely on the schema to transform a Serialize-able struct into a Value. This could (probably) be changed if it is really needed, but would make the code a bit less straightforward.

flavray avatar Jul 26 '18 10:07 flavray

Yeah, I understand the point about two differing schema, but I think it may be confusing for people coming from the Rust community where I don't think order of fields on a struct ever matters. I don't know much about other implementations of Serde serializers, but if, for example, many other implementations where able to handle this case, it might be a sticking point.

Either way, I don't think its critical, once you understand what's going on, it's easy to handle. Maybe a note in the documentation somewhere?

collosi avatar Jul 26 '18 13:07 collosi

Avro fields are resolved by name not order, so reordering field order is completely legal according to the Schema Resolution rules. I haven't looked into avro-rs at all so I'm not sure how much work this would be, but this is how it's done in Java for reference. We generate a lot of our Avro schemas, so this is a crucial feature in practice.

EDIT: Also here are their parsing and resolution docs https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/io/parsing/doc-files/parsing.html

samschlegel avatar Oct 30 '18 06:10 samschlegel

I just ran into this - was pretty confusing, if the plan is to stick this - it would be helpful if it was mentioned somewhere.

Pritesh-Patel avatar Feb 23 '19 00:02 Pritesh-Patel

I've changed the record validation loop to this just to remove the ordering constraint, although this implementation isn't the cleanest. This issue makes it difficult to enforce the presence of fields with a common Event struct when pre-existing schemas may not be in the same order, which is something I've been dealing with

            use std::collections::HashMap;
            if fields.len() == record_fields.len() {
                let data_fields: HashMap<String, Value> = record_fields.iter().cloned().collect();

                fields
                        .iter()
                        .try_for_each(|field| {
                            if let Some(value) = data_fields.get(&field.name) {
                                validate(value, &field.schema)
                            } else {
                                Err(Error::from(format!(
                                    "Expected field named {} but could not find it",
                                    field.name
                                )))
                            }
                        })
// …

codehearts avatar Mar 03 '20 01:03 codehearts

It appears that this is also a problem when you want to use #[serde(flatten)] as the Struct that gets spliced will mess up the order of the parent struct, fascinating.

mirosval avatar Jul 20 '21 14:07 mirosval

A PR with a fix at https://github.com/apache/avro/pull/1650

martin-g avatar Apr 18 '22 22:04 martin-g

Looks like https://github.com/apache/avro/pull/1650 was merged - should this be closed?

ctb avatar Jan 01 '23 18:01 ctb