RFC: Enhancing Error Handling in Rust with `error-stack`
RFC: Enhancing Error Handling in Rust with error-stack
Motivation
Inspired by a post from GreptimeDB [ไธญๆ], we propose a new approach to error handling in Rust to improve user experience. The current popular error handling practices have several drawbacks:
- Errors are typically represented as an enum tree and implemented with the
Fromtrait. This makes it difficult to track different errors originating from the same source. For instance, distinguishing betweenReadIndexErrorandWriteIndexErrorfrom a singleio::Erroris not straightforward. - Errors lack trace information. While
Backtracecan trace errors, it cannot do so across different threads. - Error types are usually added at the crate level, which is not flexible enough. Errors can be smaller than a crate or span multiple crates. For example, Databend only has one layer of error type, which complicates high-level problem reasoning.
Goals
We aim to replace the current error handling practice with error-stack. This crate tracks errors in detail, from the higher levels down to the root cause. An example of an error message generated by error-stack reveals the root cause and the error's progression, including source code locations for each layer:
Error: a fatal error has occurred in the main loop
โโดat src/main.rs:11:51
โ
โโโถ failed to read index file: index.txt
โ โฐโดat src/main.rs:29:35
โ
โฐโโถ No such file or directory (os error 2)
โฐโดat src/main.rs:29:3
Due to its mechanism, error-stack can trace errors across threads or async contexts, eliminating the need for async-backtrace.
Best Practices
error-stack introduces a new pattern for error handling. It might seem unfamiliar initially, but it offers significant advantages.
1. Use Structs for Errors
Instead of big-enum error types, define errors using structs:
#[derive(Debug, thiserror::Error)]
#[error("{0}")]
pub struct ExecutorError(pub(crate) String);
- The error does not contain any information about the source error.
- The error message is a human-readable string, which is often more informative than an enum variant.
2. Use error_stack::Result
Replace type Result<T> = std::result::Result<T, Error> with error_stack::Result:
use error_stack::Result;
pub fn read_index(path: &str) -> Result<String, ExecutorError> {
todo!()
}
Explicitly specify the error type to accommodate the next change.
3. Explicit Error Conversion
? no longer implicitly converts error types. Use change_context(error) to explicitly convert result errors:
pub fn read_index(path: &str) -> Result<String, ExecutorError> {
let file = fs::read(path).change_context(ExecutorError(format!("failed to read index file: {}", path)))?;
Ok(content)
}
change_context adds a new error layer, including the source code location, while converting the error type from std::io::Error to ExecutorError.
4. Describe Current Function Actions
Focus on what the current function is doing rather than the callee function. Here's an anti-pattern to avoid:
pub fn read_index(path: &str) -> Result<String, ExecutorError> {
let file = fs::read(path).change_context(ExecutorError(format!("failed to open file: {}", path)))?;
let content = String::from_utf8(file).change_context(ExecutorError(format!("failed to read utf8")))?;
Ok(content)
}
// Error: a fatal error has occurred in main loop
// โโดat src/main.rs:11:51
// โ
// โโโถ failed to read utf8 <------ Bad! Duplicated with the souce error
// โ โฐโดat src/main.rs:28:43
// โ
// โฐโโถ invalid utf-8 sequence of 1 bytes from index 0
// โฐโดat src/main.rs:28:43
In this anti-pattern, an error in the utf8 conversion results in a message that loses the context of the original operation (reading the index file). Instead, describe the current function's action:
pub fn read_index(path: &str) -> Result<String, ExecutorError> {
let make_error = || ExecutorError(format!("failed to read index file: {}", path));
let file = fs::read(path).change_context_lazy(make_error)?;
let content = String::from_utf8(file).change_context_lazy(make_error)?;
Ok(content)
}
// Error: a fatal error has occurred in main loop
// โโดat src/main.rs:11:53
// โ
// โโโถ failed to read index file: index.txt <------ Good! Brings useful information
// โ โฐโดat src/main.rs:29:35
// โ
// โฐโโถ invalid utf-8 sequence of 1 bytes from index 0
// โฐโดat src/main.rs:29:35
This approach maintains meaningful error contexts.
5. Consistent Error Messages
Follow these conventions for error messages to maintain uniformity:
- Start with "failed to ..." in lowercase, without trailing punctuation.
- Place the variable at the end of the sentence after a colon, e.g.,
failed to parse expression: 1 + 1. - When showing multiple variables, use a complete sentence with variables quoted, e.g.,
failed to cast expression '1 + 1' to type 'String'.
Migration Plan
Databend currently uses a unified ErrorCode across the workspace. The migration to error-stack will involve:
- Defining
ErrorCodeusingerror-stack. - Introducing more layered error types in intermediate crates.
- Replacing
Resultwitherror_stack::Resultacross the entire workspace.
FAQ
-
How to report error codes?
Use
Report::attachto attach any type to an error, allowing later retrieval. -
Is
error-stackexpensive?error-stackcollects error information only when the error is created and when it propagates. Under normal circumstances, its performance is similar to plainResultand is lighter thanBacktrace. -
How to recover from errors?
Use
Report::downcast_refto find specific error types within the error stack, enabling recovery from specific errors.
Disclaimer: I invented error-stack and am one of the main authors.
I stumbled across this RFC and I really like the adaption. I'm happy to answer questions to help migrate to error-stack and use it. Also, if you have specific feature requests feel free to reach out and we can see if this is achievable ๐
Btw, on a nightly channel, error-stack uses Error::provide. It's how information can be retrieved without downcasting the Report. Every context (passed to change_context) can provide values and attachments provide themselves.
Example usage
@TimDiekmann Thank you tim! Your error-stack is awesome and helps a lot! But currently we have been stuck by some problems:
- We can not implement our own fmt for the
Reportbecause some of the information is not public, e.g. the location of a frame. - We can not serialize and deserialize a
Report. This prevents us from propagating an error through the network between nodes. I know the limitation here, it's because we can not serialize an opaque error. But at least we should be able to serialize the frames into strings (while being unable to beprovide()again) and at the remote node display the frame to user as if there are not serialized at all. This require us the customize and take full control of the formatting process, which is blocked by the first problem.
We can not implement our own fmt for the
Reportbecause some of the information is not public, e.g. the location of a frame.
The location is not stored directly on the Report but as an attachment. This means, it's possible to read it directly through a Frame, i.e.:
for frame in report.frames() {
if let Some(location) = frame.downcast_ref::<Location>() {
println!("{location}");
}
}
On a nightly channel, you could also directly iterate all provided values (attachments provide themselves):
for location in report.request_ref::<Location>() {
println!("{location}");
}
We can not serialize and deserialize a Report
Serialization is possible using the serde feature. Only deserialization is not possible because the types cannot be known.
@TimDiekmann Thank you. I got it. Our goals is simple: provide user a consistent rendered report, where the report can be sent between nodes, where frames are generated by different nodes.
This can be achieved by using a specialized error struct which materialize the error frame structure, with all frame/attachment/context being display or rendered into strings, and can be sent to other nodes. Also, when the other node receive this specialized error, it pops it up as its source error. And finally, we can render the stack structure using default fmt as if no such materialized error exists.
I believe this could be a convenient util for error-stack, do you think it's possible to implement and provide it in error-stack? Otherwise, I can add it in databend and copy the fmt mod for the same thing.
Yes, I could imagine a kind of anonymized structure being helpful. Of course, this would be lossy, but it would at least be possible to deserialize it. I can also imagine that this would be a helpful utility for error-stack, as you could also implement the standard FMT on it.
Are you happy to file an Issue or discussion? Do you have a specific idea what the struct could look like?
Generally, I think there would be some useful utils in error-stack such as iterating over all frames that belong to a context.