serde_with Consider adding helpers for better deserialize error messages for untagged enums

As described in https://github.com/jonasbb/serde_with/pull/586, serde outputs unsatisfactory messages when an error occurs during untagged enum deserialization.

Recently, the serde project explicitly rejected improving error messages for this situation by collecting deserialization errors, suggesting instead that applications use custom visitors to achieve better diagnostics. Incidentally, serde_with has been identified as a candidate crate to provide reusable functionality to make it easier to create these kinds of custom visitors.

As a result, I'd love to hear if such a feature would be in scope for serde_with. In my view, user-facing applications that parse configuration files will benefit most from this feature, as good error messages enable users to more easily troubleshoot and fix their configuration file syntax.

Aug 04 '23 18:08 AlexTMjugador

I would love to have support for that, but for now, I do not see a way how I can support that. The problem is that the error messages are generated by the Deserialize implementations for each type. That means the only place where improvements can be made is in the derive macro that generates them. Creating a converter type/wrapper, like what most of this crate does, will not work here.

This means, fixing the issue requires forking serde_derive. I would love to help with something like that, but that task of creating that and keeping it close to the serde upstream is too much for me to maintain. There are a bunch of issues with the serde crate which can be addressed by changing/adding to the derive macros. serde_with could benefit from it too, for example by eliminating the need for the serde_as macro, instead the derive macros could directly support the new attributes. Maybe such a fork will manifest at some point. There are some contributors to the serde repo and I saw some talk about the idea on Discord too. Or maybe serde will implement enough features. It was more activity there in the last weeks.

Aug 04 '23 22:08 jonasbb

Thanks for your prompt reply!

This means, fixing the issue requires forking serde_derive. I would love to help with something like that, but that task of creating that and keeping it close to the serde upstream is too much for me to maintain.

I have been maintaining such a fork for over a year now, and I would gladly welcome help with that. However, I don't think that forking is a tenable long-term solution.

Right now, my fork does not build when third-party crates patch the serde_derive dependency in their Cargo.toml to use it, due to recent changes to use precompiled serde_derive binaries, and those changes are so large and young that, quite frankly, I'm on the verge of running out of patience to bother anyone with this tedious work.

So far, serde has not been particularly communicative, supportive or thoughtful about the ease of maintaining this fork, and I don't expect this to improve, especially given their now explicit unwillingness to support collecting error messages when deserializing untagged enums out of the box.

The problem is that the error messages are generated by the Deserialize implementations for each type. That means the only place where improvements can be made is in the derive macro that generates them. Creating a converter type/wrapper, like what most of this crate does, will not work here.

For the reasons stated above, I want to focus my efforts on ditch a fork and build a less brittle, long-term solution instead. This means that we have to play by the rules of serde's API and somehow come up with clever ways to wrap the opaque Deserialize implementation for an untagged enum type in a way that shows better error messages.

I gave this angle a try on this Rust Playground snippet, where I managed to replace the default data did not match any variant of untagged enum error message with a custom one. With some improvements (gathering type info to build error messages, applying or creating the wrapper type when and only when the Deserialize implementation is automatically derived on an untagged enum...), I can see a somewhat clear path to improve error messages with the current serde API.

If we were to implement this functionality, what small, likely acceptable by upstream things would we need serde to add to achieve it? I think that answering this question would lead to a more productive dialogue to achieve the goal of better error messages.

Aug 05 '23 17:08 AlexTMjugador

I have been maintaining such a fork for over a year now, and I would gladly welcome help with that. However, I don't think that forking is a tenable long-term solution.

Are you trying to fork serde+serde_derive? That would be more powerful, but I am uncertain if requiring patching the dependency for every user is worth it. I was more thinking or providing a separate crate with new proc-macros, similar to how serde_repr, serde_tuple, serde-indexed, serde_query, or serde_with all provide proc-macros for implementing Serialize or Deserialize.

I gave this angle a try on this Rust Playground snippet, where I managed to replace the default data did not match any variant of untagged enum error message with a custom one. With some improvements (gathering type info to build error messages, applying or creating the wrapper type when and only when the Deserialize implementation is automatically derived on an untagged enum...), I can see a somewhat clear path to improve error messages with the current serde API.

I am not sure what you are trying to do in the snippet. Your goal is to replace data did not match any variant of untagged enum UntaggedEnum with cannot parse either `A` or `B` for UntaggedEnum? But that still doesn't tell you why deserialization has failed. Overall, PrettyErrorWrapper looks like a complicated variant of serde(expecting = "..."):

#[derive(Deserialize, Debug)]
#[serde(untagged, expecting = "cannot parse either `A` or `B` for UntaggedEnum")]
enum UntaggedEnum {
    A(u64),
    B(String),
}

If a different (static) error message is all you want, you can create a proc-macro, that runs before the derive(Deserialize) and adds the serde(expecting = "...") attribute. All without changes to serde.

Aug 05 '23 18:08 jonasbb

Are you trying to fork serde+serde_derive? That would be more powerful, but I am uncertain if requiring patching the dependency for every user is worth it. I was more thinking or providing a separate crate with new proc-macros, similar to how serde_repr, serde_tuple, serde-indexed, serde_query, or serde_with all provide proc-macros for implementing Serialize or Deserialize.

Yes, that's exactly what I did. Your idea of providing custom proc-macros for deriving Deserialize in separate crates sounds much better, though!

I am not sure what you are trying to do in the snippet. Your goal is to replace data did not match any variant of untagged enum UntaggedEnum with cannot parse either `A` or `B` for UntaggedEnum? But that still doesn't tell you why deserialization has failed. Overall, PrettyErrorWrapper looks like a complicated variant of serde(expecting = "..."): [...]

With that snippet I was trying to bring forward a general way to improve error messages. The fact that it replaces the error message with a constant string is incidental: the code could easily be modified to display a non-constant string, unlike with the expecting attribute (which, by the way, I didn't know about because it's undocumented, so thank you a bunch for bringing it to my attention!).

Ideally, I would like serde-with (or some other crate) to provide an ergonomic way for applications to generate more helpful, context-aware error messages when data can't be deserialized into any variant of an untagged enum. The way I see it, this can be achieved by:

Collecting the deserialization errors for each variant into a single error for the entire enum, like several PRs for serde attempted. (This does not seem possible to do with the current serde API without custom Deserialize implementations, though. Custom proc-macro derived Deserialize implementations may also leverage custom attributes to offer the customization described in the next point.)
Introducing builders or macros to create custom Deserialize wrappers to customize error messages with better context information (e.g., the possible variants, choosing whether to show the error for the variant where most fields were deserialized...). In other words, "productionizing" the snippet I shared.

As a result, I suppose this issue boils down to whether it would be nice for serde-with to either offer a new proc-macro to derive Deserialize or new wrapper types so that it shows better errors for untagged enums. For either of these approaches (or other approaches you might come up with!), what's the missing piece to get this done? I would love to help in some way with this.

Aug 05 '23 20:08 AlexTMjugador

I do not see a way forward with wrappers that goes beyond a static message. There is no way for wrappers to get any information about why deserialization fails. Not without changing the Deserialize implementation of the untagged enum. You can already get perfect error messages with a handwritten Deserialize implementation. Writing a proc-macro to do the same is possible, but will either be limited (no support for many of the serde attributes) or it ends up a fork of serde_derive. As already explained, I am unable to maintain such a derive macro.

If you find a way to make the wrappers emit a dynamic message explaining what failed, then I would very much like to see that.

serde attributes are indeed not always documented, but expecting is listed under container attributes.

Aug 05 '23 22:08 jonasbb

If you find a way to make the wrappers emit a dynamic message explaining what failed, then I would very much like to see that.

I understand. I will let you know if I find some way to do it!

serde attributes are indeed not always documented, but expecting is listed under container attributes.

That attribute has just been documented moments ago thanks to a PR I submitted :wink:

Aug 05 '23 23:08 AlexTMjugador

serde_with serde_with copied to clipboard

Consider adding helpers for better deserialize error messages for untagged enums

serde_with
serde_with copied to clipboard