databend icon indicating copy to clipboard operation
databend copied to clipboard

RFC: Enhancing Error Handling in Rust with `error-stack`

Open andylokandy opened this issue 1 year ago โ€ข 5 comments

RFC: Enhancing Error Handling in Rust with error-stack

Motivation

Inspired by a post from GreptimeDB [ไธญๆ–‡], we propose a new approach to error handling in Rust to improve user experience. The current popular error handling practices have several drawbacks:

  1. Errors are typically represented as an enum tree and implemented with the From trait. This makes it difficult to track different errors originating from the same source. For instance, distinguishing between ReadIndexError and WriteIndexError from a single io::Error is not straightforward.
  2. Errors lack trace information. While Backtrace can trace errors, it cannot do so across different threads.
  3. Error types are usually added at the crate level, which is not flexible enough. Errors can be smaller than a crate or span multiple crates. For example, Databend only has one layer of error type, which complicates high-level problem reasoning.

Goals

We aim to replace the current error handling practice with error-stack. This crate tracks errors in detail, from the higher levels down to the root cause. An example of an error message generated by error-stack reveals the root cause and the error's progression, including source code locations for each layer:

Error: a fatal error has occurred in the main loop
โ”œโ•ดat src/main.rs:11:51
โ”‚
โ”œโ”€โ–ถ failed to read index file: index.txt
โ”‚   โ•ฐโ•ดat src/main.rs:29:35
โ”‚
โ•ฐโ”€โ–ถ No such file or directory (os error 2)
    โ•ฐโ•ดat src/main.rs:29:3

Due to its mechanism, error-stack can trace errors across threads or async contexts, eliminating the need for async-backtrace.

Best Practices

error-stack introduces a new pattern for error handling. It might seem unfamiliar initially, but it offers significant advantages.

1. Use Structs for Errors

Instead of big-enum error types, define errors using structs:

#[derive(Debug, thiserror::Error)]
#[error("{0}")]
pub struct ExecutorError(pub(crate) String);
  • The error does not contain any information about the source error.
  • The error message is a human-readable string, which is often more informative than an enum variant.

2. Use error_stack::Result

Replace type Result<T> = std::result::Result<T, Error> with error_stack::Result:

use error_stack::Result;

pub fn read_index(path: &str) -> Result<String, ExecutorError> {
    todo!()
}

Explicitly specify the error type to accommodate the next change.

3. Explicit Error Conversion

? no longer implicitly converts error types. Use change_context(error) to explicitly convert result errors:

pub fn read_index(path: &str) -> Result<String, ExecutorError> {
    let file = fs::read(path).change_context(ExecutorError(format!("failed to read index file: {}", path)))?;
    Ok(content)
}

change_context adds a new error layer, including the source code location, while converting the error type from std::io::Error to ExecutorError.

4. Describe Current Function Actions

Focus on what the current function is doing rather than the callee function. Here's an anti-pattern to avoid:

pub fn read_index(path: &str) -> Result<String, ExecutorError> {
    let file = fs::read(path).change_context(ExecutorError(format!("failed to open file: {}", path)))?;
    let content = String::from_utf8(file).change_context(ExecutorError(format!("failed to read utf8")))?;

    Ok(content)
}

// Error: a fatal error has occurred in main loop
// โ”œโ•ดat src/main.rs:11:51
// โ”‚
// โ”œโ”€โ–ถ failed to read utf8      <------ Bad! Duplicated with the souce error
// โ”‚   โ•ฐโ•ดat src/main.rs:28:43
// โ”‚
// โ•ฐโ”€โ–ถ invalid utf-8 sequence of 1 bytes from index 0
//     โ•ฐโ•ดat src/main.rs:28:43

In this anti-pattern, an error in the utf8 conversion results in a message that loses the context of the original operation (reading the index file). Instead, describe the current function's action:

pub fn read_index(path: &str) -> Result<String, ExecutorError> {
    let make_error = || ExecutorError(format!("failed to read index file: {}", path));

    let file = fs::read(path).change_context_lazy(make_error)?;
    let content = String::from_utf8(file).change_context_lazy(make_error)?;

    Ok(content)
}

// Error: a fatal error has occurred in main loop
// โ”œโ•ดat src/main.rs:11:53
// โ”‚
// โ”œโ”€โ–ถ failed to read index file: index.txt     <------ Good! Brings useful information
// โ”‚   โ•ฐโ•ดat src/main.rs:29:35
// โ”‚
// โ•ฐโ”€โ–ถ invalid utf-8 sequence of 1 bytes from index 0
//     โ•ฐโ•ดat src/main.rs:29:35

This approach maintains meaningful error contexts.

5. Consistent Error Messages

Follow these conventions for error messages to maintain uniformity:

  1. Start with "failed to ..." in lowercase, without trailing punctuation.
  2. Place the variable at the end of the sentence after a colon, e.g., failed to parse expression: 1 + 1.
  3. When showing multiple variables, use a complete sentence with variables quoted, e.g., failed to cast expression '1 + 1' to type 'String'.

Migration Plan

Databend currently uses a unified ErrorCode across the workspace. The migration to error-stack will involve:

  1. Defining ErrorCode using error-stack.
  2. Introducing more layered error types in intermediate crates.
  3. Replacing Result with error_stack::Result across the entire workspace.

FAQ

  1. How to report error codes?

    Use Report::attach to attach any type to an error, allowing later retrieval.

  2. Is error-stack expensive?

    error-stack collects error information only when the error is created and when it propagates. Under normal circumstances, its performance is similar to plain Result and is lighter than Backtrace.

  3. How to recover from errors?

    Use Report::downcast_ref to find specific error types within the error stack, enabling recovery from specific errors.

andylokandy avatar Jun 05 '24 16:06 andylokandy

Disclaimer: I invented error-stack and am one of the main authors.

I stumbled across this RFC and I really like the adaption. I'm happy to answer questions to help migrate to error-stack and use it. Also, if you have specific feature requests feel free to reach out and we can see if this is achievable ๐Ÿš€

Btw, on a nightly channel, error-stack uses Error::provide. It's how information can be retrieved without downcasting the Report. Every context (passed to change_context) can provide values and attachments provide themselves. Example usage

TimDiekmann avatar Jul 08 '24 09:07 TimDiekmann

@TimDiekmann Thank you tim! Your error-stack is awesome and helps a lot! But currently we have been stuck by some problems:

  1. We can not implement our own fmt for the Report because some of the information is not public, e.g. the location of a frame.
  2. We can not serialize and deserialize a Report. This prevents us from propagating an error through the network between nodes. I know the limitation here, it's because we can not serialize an opaque error. But at least we should be able to serialize the frames into strings (while being unable to be provide() again) and at the remote node display the frame to user as if there are not serialized at all. This require us the customize and take full control of the formatting process, which is blocked by the first problem.

andylokandy avatar Jul 09 '24 06:07 andylokandy

We can not implement our own fmt for the Report because some of the information is not public, e.g. the location of a frame.

The location is not stored directly on the Report but as an attachment. This means, it's possible to read it directly through a Frame, i.e.:

for frame in report.frames() {
    if let Some(location) = frame.downcast_ref::<Location>() {
        println!("{location}");
    }
}

On a nightly channel, you could also directly iterate all provided values (attachments provide themselves):

for location in report.request_ref::<Location>() {
    println!("{location}");
}

We can not serialize and deserialize a Report

Serialization is possible using the serde feature. Only deserialization is not possible because the types cannot be known.

TimDiekmann avatar Jul 09 '24 07:07 TimDiekmann

@TimDiekmann Thank you. I got it. Our goals is simple: provide user a consistent rendered report, where the report can be sent between nodes, where frames are generated by different nodes.

This can be achieved by using a specialized error struct which materialize the error frame structure, with all frame/attachment/context being display or rendered into strings, and can be sent to other nodes. Also, when the other node receive this specialized error, it pops it up as its source error. And finally, we can render the stack structure using default fmt as if no such materialized error exists.

I believe this could be a convenient util for error-stack, do you think it's possible to implement and provide it in error-stack? Otherwise, I can add it in databend and copy the fmt mod for the same thing.

andylokandy avatar Jul 09 '24 09:07 andylokandy

Yes, I could imagine a kind of anonymized structure being helpful. Of course, this would be lossy, but it would at least be possible to deserialize it. I can also imagine that this would be a helpful utility for error-stack, as you could also implement the standard FMT on it.

Are you happy to file an Issue or discussion? Do you have a specific idea what the struct could look like?

Generally, I think there would be some useful utils in error-stack such as iterating over all frames that belong to a context.

TimDiekmann avatar Jul 09 '24 11:07 TimDiekmann