serde-yaml icon indicating copy to clipboard operation
serde-yaml copied to clipboard

wrong line numbers when using "flatten"

Open samuelcolvin opened this issue 6 years ago • 8 comments

When using #[serde(flatten)] the line numbers of errors are wrong.

In the scenario below I'm always getting line 2, I guess because this is the first line of the map where another lives.

I've tried this with serde_json and it appears to work correctly, so I think this is a problem specific to serde_yaml.

If this is not possible to fix, is there any work around to capture and parse all extra elements of a map without using flatten that yields correct line numbers in errors?

Example

#[macro_use]
extern crate serde_derive;
extern crate serde_yaml;

use std::collections::BTreeMap as Map;

#[derive(Debug, Deserialize)]
pub struct Spam {
    a: i32,
    b: i32,
}

#[derive(Debug, Deserialize)]
pub struct Config {
    #[serde(default)]
    foo: Map<String, String>,
    #[serde(default)]
    bar: Map<String, String>,

    #[serde(flatten)]
    spam: Map<String, Spam>,
}

fn main() {
    let input = r#"
whatever:
  a: 123
  b: 456

# this is not line 2!
another:
  c: 789"#;
    let config: Config = match serde_yaml::from_str(input) {
        Ok(c) => c,
        Err(e) => {
            eprintln!("Error parsing {}", e);
            return
        }
    };
    println!("config: {:?}", config);
}

Running cargo run gives:

cargo run
   Compiling proc-macro2 v0.4.27
   Compiling unicode-xid v0.1.0
   Compiling serde v1.0.90
   Compiling syn v0.15.31
   Compiling linked-hash-map v0.5.2
   Compiling dtoa v0.4.3
   Compiling yaml-rust v0.4.3
   Compiling quote v0.6.12
   Compiling serde_derive v1.0.90
   Compiling serde_yaml v0.8.8
   Compiling serde-line-numbers v0.1.0 (/home/samuel/code/serde-line-numbers)
    Finished dev [unoptimized + debuginfo] target(s) in 7.79s
     Running `target/debug/serde-line-numbers`
Error parsing missing field `a` at line 2 column 9

I'm using rustc version rustc 1.34.0 (91856ed52 2019-04-10).

samuelcolvin avatar Apr 16 '19 10:04 samuelcolvin

any news on this?

samuelcolvin avatar Apr 19 '19 15:04 samuelcolvin

This looks like a bug. I don't necessarily know how to fix it but I would accept a PR!

dtolnay avatar Apr 19 '19 17:04 dtolnay

Thanks for the response.

I don't really know where to start, but I'll do some digging if I get the time.

samuelcolvin avatar Apr 19 '19 18:04 samuelcolvin

Ok, I've spent some time investigating, but being new to rust I'm aware I'm in over my head.

The problem is at

https://github.com/dtolnay/serde-yaml/blob/f9509c81c5596f0f2592b8e0c4cdc09c33df3c23/src/de.rs#L898-L909

The marker used in private::fix_marker(err, marker, self.path) is from the beginning of the mapping not where the error happened.

I tried using self.peak() to get the marker, but that just gives you the marker of the end of the mapping which is no better.

Any suggestion on where to go would be great? I have a failing unit test but I'm not that close to a fix.

samuelcolvin avatar May 03 '19 12:05 samuelcolvin

That code doesn't seem like it would be the problem, or else lines would always be wrong when deserializing a map, not just when using flatten.

dtolnay avatar May 03 '19 17:05 dtolnay

Suspect the issue is a particular interaction of serde, serde_yaml, and yaml-rust. Theory:

  1. Serde sees flatten, and asks serde_yaml to deserialize the nested struct in a new session.
  2. serde_yaml's yaml events and markers re-begin, losing the enclosing struct's position offset.

https://github.com/chyh1990/yaml-rust/pull/125 may address this; I haven't yet checked yaml-rust for how Markers are calculated / instantiated.

azriel91 avatar Aug 21 '19 00:08 azriel91

I was trying to find some issues on similar behavior I'm seeing with serde_json (see https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8724346d517e89cb98a054c6cfffd1cb) and came across this one. It looks like the issue is indeed with #[serde(flatten)].

gferon avatar Sep 09 '19 09:09 gferon

It looks like the base problem is with serde's ContentDeserializer. This is used for a few things, including #[serde(flatten)] and untagged enums. Basically, it uses deserialize_any on the original deserializer to generate a format-independent structure, and then drives Deserialize with a new Deserializer.

To work around this in full generality, serde would need some way to allow Deserializers to store state, or would need to forgo ContentDeserializer and ContentRefDeserializer (probably by requiring Deserializers to implement Clone, allowing serde to save their state while backtracking).

For more information, check out serde::private::de::content, as well as how it is used to implement struct flattening and untagged enums.

hexane360 avatar Oct 28 '19 13:10 hexane360