iroh icon indicating copy to clipboard operation
iroh copied to clipboard

Replace anyhow with proper error types

Open matthiasbeyer opened this issue 1 year ago • 7 comments

anyhow is for application error handling, please replace it with dedicated error types in your library crates, so users of these crates can do sane error handling.

matthiasbeyer avatar Sep 22 '24 08:09 matthiasbeyer

Could you be more specific to which error return values you would like to match on for different failure cases?

In general you are right, and as iroh moves towards 1.0 we probably also should migrate to more specific error types. On the other hand we really need this to be guided by application needs rather than enumerate all the current failure types, as that would lead to breaking the API on every release. If we were to use a custom error type right now things would probably be a single opaque error type, which is just like anyhow, but worse.

So far in our own uses we have not needed to distinguish error types, but this doesn't mean there isn't a need! This is why to move this forward we really need specific APIs with specific usecases called out for needing a custom error type. We'd really appreciate your input on this.

flub avatar Sep 23 '24 07:09 flub

So I am just starting out using iroh, and only because IPFS is not usable in Rust, because there is no machine-readable API specification that I can program against.

So all of what the following refers to is very much hacked together in a short time, so take everything with a grain of salt please.

My error type in my application (or rather, in the library that becomes my backend implementation), looks like this:

#[derive(Debug, thiserror::Error)]
pub enum Error {
    #[error("Creating Node")]
    CreatingNode(#[source] anyhow::Error),

    #[error("Spawning Node")]
    SpawningNode(#[source] anyhow::Error),

    #[error("Failed to get reader")]
    GettingReader(#[source] anyhow::Error),

    #[error("Failed to put blob")]
    PutBlobError(#[source] anyhow::Error),

    #[error("Shutdown failed")]
    Shutdown(#[from] anyhow::Error),

    #[error("Error while serializing metadata to JSON")]
    SerdeJson(#[from] serde_json::Error),

    #[error("Error while adding bytes")]
    AddingBytes(#[source] anyhow::Error),

    #[error("Error while reading bytes")]
    ReadingBytes(#[source] anyhow::Error),

    #[error("Error while putting content object")]
    PuttingContent(#[source] Box<Error>),

    #[error("Error while putting metadata object")]
    PuttingMetadata(#[source] Box<Error>),
}

Maybe that helps a bit. As you can probably see from the variant names, half of the cases above are just wrapping errors your API returns. The rest is my application logic errors, that again wrap anyhow::Error because the underlying APIs of yours.

Remember, I am pretty much in the "sketch things up" phase.

matthiasbeyer avatar Sep 23 '24 08:09 matthiasbeyer

Hey @matthiasbeyer, great to see you back after our last encounter,

Last we spoke, the team came to the working conclusion that we'd stick with anyhow pre-1.0, and transition to thiserror as part of a 1.0 release (reference), thankfully, we've made a bunch of progress on iroh in the last year, and are now starting to think seriously about a 1.0 roadmap. From our latest blog post:

We're aiming to publish a 1.0 roadmap in 2025.

Which means (unless others, especially @dignifiedquire disagree): we need to figure out when to transition over to thiserror, and at least have a plan in place in 2025, so maybe we use this issue & your baseline set of errors as a starting point for planning?

b5 avatar Sep 24 '24 23:09 b5

If the rewrite can happen in a day, would it happen earlier? What's blocking it from happening now?

Just looked at the source code. There are 146 invocations of anyhow!. I think this is a translation (i18n) problem, where we need to translate the 146 strings to types, and then group them into bigger error sets like described above. I think I can do this.

Are there any other ways where one can craft a unique anyhow::Error? It's like i18n, where I need to track all strings.

Then it goes on Crowdin, or some other place for translation.

iacore avatar Oct 06 '24 12:10 iacore

Are there any other ways where one can craft a unique anyhow::Error?

Yes. There is this implementation: impl<E> From<E> for Error where E: StdError + Send + Sync + 'static

So everywhere ? in any function that returns anyhow::Result, there might be a conversion from something that impls StdError (that includes anyhow::Error itself) where we may need to add either

  • an additional impl From<...> for IrohError
  • a new error enum variant with a #[from] annotation or
  • a call to .map_err before the ?.

There might be even more issues where code has to be re-achitected when something was assuming just anyhow::Error before, but now needs to handle multiple different error types, thus might need to be abstracted.

Unfortunately it's a little more involved than replacing anyhow! expressions with enum values.

matheus23 avatar Oct 06 '24 12:10 matheus23

  • a new error enum variant with a #[from] annotation or

This is the way to go.

So the immediate plan looks like this:

#[derive(Debug, thiserror::Error)]
pub enum Error {
... 146 variants with existing error strings
... #[from] errors from other libraries
}

Then, we split the 146 variants into different sub errors, or have err.coarse_cause() return a enum that corresponds to the variants in https://github.com/n0-computer/iroh/issues/2741#issuecomment-2367552311.

There might be even more issues where code has to be re-achitected when something was assuming just anyhow::Error before, but now needs to handle multiple different error types, thus might need to be abstracted.

The above design is one error. Breakage is unavoidable.

Any other concerns?

One thing I can think of is that different libraries, e.g. iroh-net might use its own crate error type. So essentially we are doing this for each procedure for each of the crate. We will need to make a flow diagram of this to visualize how different errors and 3rdparty error types coalesce into anyhow::Error.

iacore avatar Oct 06 '24 12:10 iacore

I was pointed here from https://github.com/n0-computer/iroh/discussions/2860 as a user who complained about anyhow::Error usage in the library. One thing I notice from the discussion here seems to be a baseline assumption that there will be one iroh_base::Error object. This is a mistake, it would mean that anyone who calls iroh_net::endpoint::Builder::bind() needs to be prepared to handle iroh_base::Error::MetricsAreDisabled.

A potentially useful approach is to start from public APIs which return anyhow::Error, and convert them into specific error types. For example, iroh_net::endpoint::Builder::bind() can return an iroh_net::endpoint::BindError, which could look like this pre-1.0:

#[derive(Debug, thiserror::Error)]
pub enum BindError {
    #[error("failed to create magic socket: {0}")]
    MagicSocket(anyhow::Error),
    #[error("failed to create crypto config: {0}")]
    CryptoConfig(anyhow::Error),
}

This will need to be refined later on (obviously, since it still contains a bunch of anyhow::Error), but will result in narrow error types covering the specific case that developers may need to handle. Maybe it turns out that all MagicSock errors are actually only std::io::Error (it is; I checked), so that variant can be replaced with an Io variant. Maybe CryptoConfig errors are also only std::io::Error (didn't check), and so the whole enum can be replaced with just std::io::Error.

Also, it's worth considering that some Results are actually just Options (for example, the "metrics are disabled" error I mentioned earlier should just return an Option), and any functions that move self need to return a custom Result that gives the object back to the caller in order to be able to do anything useful with the Result (iroh_net::endpoint::Endpoint::close is an example).

[append]

As a quick bonus, I recall this article from Eric Lippert, a designer of C#, which talks about various kinds of errors. Might be useful when considering what should be an error, a panic, or an Option.

CGamesPlay avatar Oct 29 '24 23:10 CGamesPlay

Hey there, I just raised https://github.com/n0-computer/iroh-gossip/issues/33, but figured I'd add my 0.0124 AUD to the issue.

Normally when I'm writing cli / tui apps, I use color-eyre rather than anyhow (I'm a maintainer of ratatui, so this is pretty often). Color-eyre gives a nicer output, adds more info while handling errors and has a useful panic hook in addition to just providing errors. I noticed that iroh-gossip uses anyhow because the ? operator would not convert the anyhow error to a color-eyre error automatically. This is one of the pain points which would be avoided by moving to custom errors.

In general, I've often used thiserror for lib type stuff, but recently have had a chance to look at Snafu. There seems like there's a nice-ish migration pattern from the anything goes error types (Whatever in snafu), to proper errors. It could be worth considering as an option.

joshka avatar Jan 29 '25 00:01 joshka

Thanks for bringing up snafu, i looked at it a while back, always wanted to look more.

dignifiedquire avatar Jan 31 '25 17:01 dignifiedquire

collecting some thoughts as I work through error handling

  • https://greptime.com/blogs/2024-05-07-error-rust very interesting approach
  • snafu is much more flexible, providing Location and Backtrace options on stable rust, sth that thiserror does not want to provide
  • we already use tracing spans, could look into using https://github.com/tokio-rs/tracing/tree/master/tracing-error

dignifiedquire avatar Feb 17 '25 12:02 dignifiedquire

Error handling has many aspects, so I updated the title to reflect the original request better.

Unfortunately we will not be able to do this in a reasonable manner on stable rust, until someone can figure out proper backtrace propagation when doing custom error types. For more details see https://github.com/n0-computer/iroh/pull/3161#issuecomment-2665946654

dignifiedquire avatar Feb 18 '25 14:02 dignifiedquire

I noticed that iroh-gossip uses anyhow because the ? operator would not convert the anyhow error to a color-eyre error automatically. This is one of the pain points which would be avoided by moving to custom errors.

I would love to help fix this, but unfortunately the current state is that custom error types all loose proper backtrace information, both snafu and thiserror fail to provide backtraces to libraries like eyre, for the underlying error types

dignifiedquire avatar Feb 18 '25 14:02 dignifiedquire

Given that backtrace propagation is being considered a blocking issue (a trade-off for sure, not necessarily the right/wrong choice), I would request that the documentation for fallible methods should be extra careful to document the failure conditions that can happen. As discussed in this task, the key problem with anyhow errors is the lack of ~~ability~~ clarity on how to handle the error programatically. Knowing the failure modes of the fallible methods would at least give the library consumer an idea of how they can handle the error (retry, abort, ignore).

CGamesPlay avatar Feb 18 '25 15:02 CGamesPlay