[Proposal] Progressively replace `anyhow` use in library crates with `thiserror`
💥 Proposal
While beginning to address #2941 and investigating the feasibility of introducing Result in some locations, I've noticed that anyhow::Result is exposed from the snarkvm_console_network_environment::prelude, and is used throughout a lot of the library crates.
While anyhow can be convenient for not having to think about creating new error types, it is not designed for use within libraries or to be exposed in public APIs.
This is a proposal to remove anyhow::Result from the prelude, and to progressively replace the use of anyhow for error handling throughout the VM libraries with thiserror. I've noticed there is already some use of thiserror which is great!
Motivation
The author of thiserror and anyhow distinguishes between the use-cases for the two crates briefly in the thiserror README:
Use thiserror if you care about designing your own dedicated error type(s) so that the caller receives exactly the information that you choose in the event of failure. This most often applies to library-like code. Use Anyhow if you don't care what error type your functions return, you just want it to be easy. This is common in application-like code.
Problems with anyhow in libraries:
- Loss of type information: Downstream code cannot match on specific error variants or exhaustively handle different failure modes. Accessing and handling underlying errors requires downcasting which can fail (e.g. can be a problem when refactoring).
- Poor API boundaries:
anyhow::Erroris essentiallyBox<dyn Error>- it's meant for applications that just propagate errors to users, not libraries that need to expose errors as part of their public API. - Documentation gaps: Error variants aren't visible in generated docs, leaving users guessing about the set of possible failure modes. Ideally, a user can read the error type returned within the
Resultand see all possible failure modes for the API their calling.
Benefits of thiserror in libraries:
- Concrete error enums that can be matched and handled exhaustively in a type-safe manner (i.e. no downcasting required).
- Library users can see the full set of possible failure modes just by looking at the error type.
- Proper error composition through transparent wrapping of underlying errors.
Having nice error types would be great for consumers of the snarkvm libs (like leo tooling) so that we can handle different error variants accordingly (e.g. handling a VM halt vs some deployment failure).
Implementation
This might look something like:
- [ ] Remove
anyhow::Resultfromsnarkvm_console_network_environment::preludeso that it no longer overrides theResulttype from the std prelude. - [ ] When introducing new errors, use
thiserrorto declare concrete types. - [ ] Temporarily provide a
Anyhowvariant in concrete error types where necessary as a temporarily workaround? - [ ] Open a tracking issue for replacing all
anyhowuse within libraries with dedicated error types/variants (could be a good issue for juniors or newcomers).
I have also been slowing doing this, for example, in #3050. For unrecoverable errors, I don't think it matters as much, though, as those will simply be logged or passed to the user.
There are some differing opinions in the Rust community if the use anyhow in libraries is always bad. Generally, I more subscribe to the idea that if you want your error to be machine-parsable, you should use thiserror (or another custom error), and anyhow is fine in other cases. Otherwise, you create a lot of enum variants for something that we will eventually just call to_string() on.
What I find a good approach is to gradually move error types over where it helps downstream projects. For example, #3050 introduced CheckBlockError with concrete errors for the ones that snarkOS may want to handle differently, but there is still a catch-all Other(anyhow::Error) because a lot of the underlying functions return that.
It is also problematic to change function signatures that are already used in other projects, so we have to think about how these two types of errors can co-exist.
Yeah that sounds totally reasonable, agreed! The only thing I'd push back on is:
Otherwise, you create a lot of enum variants for something that we will eventually just call
to_string()on.
As a library author, you can't always anticipate what your users will want to handle gracefully or not. Even just being able to see the full set of failure modes clearly in the generated docs for a library API can be useful for consumers of the library who care about what can go wrong (a type is much clearer than some doc comments that can fall out of date). This is why I think thiserror-by-default is great to have as a policy for libraries.
All that said, I can appreciate there is a balance to be had w.r.t. time spent on errors and the ease of use of anyhow. I think your approach of "gradually move error types over where it helps downstream projects" makes sense at this point, seeing as anyhow is already spread through most of the VM.
Good points. We should definitely aim to use thiserror for new API calls. I am also happy to review any changes you need to make to existing code.
Victor probably has a better intuition if changing existing function signatures will create pain downstream. Worst case, we could add a feature flag that forces the old behavior by converting custom errors into anyhow errors.
Two more thoughts that might be useful in relation to this issue:
- There are some places where we use
std::io::Error::other, e.g., during deserialization. These should really use a custom error type. - Certain parts of
snarkvmare part of the public API solely because they are used in differentsnarkvmcrates, but they may not necessarily be intended for use by external crates. It would help a lot to reduce the size of the public API, so we do not have to worry about breaking third-party crates as much.