toml icon indicating copy to clipboard operation
toml copied to clipboard

Support semantic error messages by supporting deserializing spans

Open epage opened this issue 2 years ago • 3 comments

See toml::Spanned for inspiration

epage avatar Jan 24 '22 15:01 epage

(I rolled by this issue because my project uses toml-edit for it's higher-quality serialization, but its low-quality spans for deserialization means I've been experimenting with preferring toml-rs for the other side of the equation, even though toml-edit/easy is explicitly designed to preserve more info, alas!)

I've been using this API in toml-rs in anger a lot this week, so I figured I'd chime in with some details on how it works, and some issues it has:

Spanned defines four magic strings:

pub(crate) const NAME: &str = "$__toml_private_Spanned";
pub(crate) const START: &str = "$__toml_private_start";
pub(crate) const END: &str = "$__toml_private_end";
pub(crate) const VALUE: &str = "$__toml_private_value";

And then uses a custom visitor and does the following:

        let visitor = SpannedVisitor(::std::marker::PhantomData);

        static FIELDS: [&str; 3] = [START, END, VALUE];
        deserializer.deserialize_struct(NAME, &FIELDS, visitor)

The deserializer then, at various points in the code, checks for that exact state and wraps itself in a SpannedDeserializer that emits the 3 magic fields, the first 2 being the span info:

        if name == spanned::NAME && fields == [spanned::START, spanned::END, spanned::VALUE] {
            let start = 0;
            let end = self.input.len();

            let res = visitor.visit_map(SpannedDeserializer {
                phantom_data: PhantomData,
                start: Some(start),
                value: Some(self),
                end: Some(end),
            });
            return res;
        }

That the SpannedVisitor expects:

            fn visit_map<V>(self, mut visitor: V) -> Result<Spanned<T>, V::Error>
            where
                V: de::MapAccess<'de>,
            {
                if visitor.next_key()? != Some(START) {
                    return Err(de::Error::custom("spanned start key not found"));
                }

                let start: usize = visitor.next_value()?;
                ...

I've been trying to use this in anger a fair bit this week and it seems to largely work for "normal" derived types but is more of a mess with custom deserialization. Because it relies on random places in the code detecting the magic fields, if you ever invoke it in a way it wasn't expecting (like variants on the "string or struct" pattern in serde's docs) then you get nasty "expected spanned" from it not creating the SpannedDeserializer and emitting the magic fields.

I'm not sure if those problems are "I'm bad at serde" or "this design is buggy" or "serde makes this impossible to do robustly".

The actual Spanned type also has a pretty suboptimal API. It's very opaque, and you have to explicitly call a method to get the wrapped value, so retroactively adding spans to your existing types is extremely disruptive to code. Thankfully because it communicates with magic fields, you're able to "fork" just Spanned. I did this to make it more like Box<T>, in that it does its best to forward everything from T via Deref and trait impls. All that actual span methods I made into associated functions so you have to do things like Spanned::start(&spanned_val), which is fine since you don't actually care about the span until you want to emit an error.

I am... incredibly bad at serde stuff so I'm not sure if I could help with bringing this stuff up, but here's a brain dump.

Gankra avatar Jun 26 '22 19:06 Gankra

Thanks for the brain dump! These details will be important when implementing this!

epage avatar Jun 28 '22 01:06 epage

Some extra notes from landing this:

  • serde(from = "toml::Spanned<T>") seems to work well as an alternative to get a better API without importing the private details
  • It seemed that my woes were rooted more in #[serde(flatten)] than anything else. #[serde(untagged)] did also seem messed up. basically anything that requires the input to be "extra" buffered makes Spanned break. I worked around this by replacing these kinds of things with a #[serde(try_from = "OuterTypeWithEveryPossibleField")] that manually implements flatten/untagged logic after deserializing all the Spanned more naturally. Not sure if the problem is in serde or Spanned, and whether it's fundamental to the design.

Gankra avatar Jun 28 '22 01:06 Gankra

I'm pretty sure the answer is "serde makes this impossible to do robustly". I looked into span support for serde things a bit ago in https://github.com/serde-rs/serde/issues/1811#issuecomment-1282900892, and as far as I can tell there's no way to propagate span info through serde without resorting to wild hacks like what toml::Spanned was doing previously.

It's possible to extend serde's API to better support spans. Unfortunately, with the designs I've found so far, you generally have two choices:

  • Support propagating spans in a way that allows the deserializer to choose an appropriate span representation, which requires a breaking change to serde's API
  • Bake a particular representation choice into serde's API, which will not be appropriate for all formats

Benjamin-L avatar Nov 09 '22 19:11 Benjamin-L

I wonder if it'd make sense to check that the magic strings are the same instead of checking if they are equal, that means comparing the address rather than contents. Unless I'm missing something this should make it more robust (if anyone for whatever reason uses these strings in the input it shouldn't confuse the parser). If there was some way to prevent the linker from merging same symbols they could all be empty strings which would be even nicer but I didn't find how to do that.

Kixunil avatar Dec 27 '22 14:12 Kixunil

Note that I'm keeping the scope of this issue fairly limited to "do what toml did" so we can make progress towards merging the crates. I'd welcome people to open a new issue with their feedback for further improvement.

epage avatar Jan 13 '23 17:01 epage