ibc-rs
ibc-rs copied to clipboard
Make core `Error` enums less specific
Follow-up to #255.
Our Error
enums have too many variants. We should drastically condense them following the ideas described here.
### Tasks
- [ ] Remove all host error variants from ibc-rs error enums
- [ ] https://github.com/cosmos/ibc-rs/issues/1249
- [ ] https://github.com/cosmos/ibc-rs/issues/950
A thought: ideally, internal functions that do use an Error
enum would return an Error
enum where the variants capture only the possible errors for that function. For example, verify_delay_passed()
's Error
would only have the TimestampOverflow
,NotEnoughTimeElapsed
, and NotEnoughBlocksElapsed
variants (or less if we don't care to be that specific). This provides better documentation of what can go wrong, and callers won't write code for errors that cannot happen.
In #164, there was some push-back against using anyhow
for our errors. However, looking at projects that use ibc-rs, none of them actually make use of our Error
enum:
-
Nomic's errors are modeled as
String
-
Namada make heavy use of the
Other(String)
error variant, -
Penumbra use
anyhow
in their own implementation of IBC handlers
In other words, programmatic logic based on which error occurred is non-existent in all these projects. This tells me that the most important use case for our errors is aiding in debugging; either host chain implementations, or relayers such as Hermes.
Therefore, I feel like our error system should focus on producing great human-readable error diagnostics, which is exactly what libraries such as anyhow
are designed to do. It seems to me like the best way forward is to ignore the conventional wisdom that libraries shouldn't use anyhow
, and use anyhow::Error
as our main Error
type.
I would love the hear people's thoughts on this!
cc @Farhad-Shabani @romac @yito88 @keppel
My take, Anyhow
has a concise syntax for propagating errors, cleaning up the code look, and is easier to implement since it doesn't care what error types your functions return. Though, this approach (freeform as mentioned) comes at a cost of not having a finely tuned error-handling system. We would be limited in the places where we might need to perform specific actions based on the type of error. It can sometimes even make debugging harder by providing unnecessary info/backtrace and representing all kinds of errors making it tricky to identify the exact causes.
Maybe you have a way of implementing anyhow
in your mind? Perhaps not a bad idea to demo a part of our codebase if we think it works.
And, It would be great if you elaborate more on the limitation we face with the current error enum. Couldn’t we refactor it for a better one?
I see the fact you mentioned that IBC-rs errors are not used well in projects, but I think the main reason is they are still in development (e.g. bunch of unwrap()
in those codebases) and don't have a refined error-handling system yet. Though, that's true for us too - we're still figuring out some of the designs, so our error enum isn't perfect either, but I don't think it necessarily is the reason why our current errors are not widely used.
Actually, I am a bit concerned, as IIRC we just revamped our error system. If we switch to Anyhow
, that would be the second change. For me, It sounds like a more detailed study/proposal is required to make sure! Maybe other folks have excellent insights and ideas to make it an easy choice.
I agree with Fahrad, and believe he has a good point wrt codebases not handling errors properly when under heavy development.
Moreover, I too am not sure that I see how anyhow can improve the quality of the errors reported over custom error enums. Perhaps you could into more details about the problems you are seeing?
I believe I found a good strategy for a better error systems (discussion with @romac heavily influenced the design).
The new system recognizes 2 "error sources":
-
Host errors: the errors controlled by the host, coming from methods
Validation/ExecutionContext
- Protocol errors: the errors that come form IBC handlers' validation (e.g. channel not in the expected state)
Host errors
The host error type will be defined by the host. Specifically, we'd make it an associated type of ValidationContext
:
trait ValidationContext {
type Error;
fn host_timestamp(&self) -> Result<Time, Self::Error>;
}
Protocol errors
The type for protocol errors would roughly be a cleaned up version of our current ContextError
. The crucial differences for ContextError
in this system compared to the current one are:
- we no longer need error variants for host errors in
ContextError
- its purpose is solely to generate clear debugging error messages
- that is, we don't expect users to handle these errors
Top-level error type
Our top-level error type, returned by dispatch()
, would be:
enum Error<E> {
Host(E),
Protocol(CleanedUpContextError),
}
As an example, I would expect the user code to look like:
match dispatch(...) {
Host(err) => /* optionally handle my errors */,
Protocol(err) => /* log errors */
}
Internal logic based on errors from the host
This section borrows the great idea presented in this sled.rs's blog post.
In a few instances we need to perform some logic based on the specific error returned from the host. For example, when the packet commitment is not present during acknowledgement processing, we want to return early without an error. We currently return early for all errors, but ideally we'd distinguish the "packet commitment not present" error from all other errors, and only return early for the "packet commitment not present" error. Here's where we'd use sled's idea. get_packet_commitment
would look like:
fn get_packet_commitment(
&self,
commitment_path: &CommitmentPath,
) -> Result<Result<PacketCommitment, CommitmentError>, Self::Error>;
enum CommitmentError {
NotFound,
}
The benefit of this signature is that our internal code would look like:
// note how the outer `Self::Error` gets propagated out, and we
// match on the inner error that we care about
if matches!(ctx_a.get_packet_commitment(&commitment_path_on_a)?, CommitmentError::NotFound) {
return Ok(());
};
And that's it! I also expect cleaning up ContextError
to solve #342.
I liked the idea of separating the error from the sources. So, we only have to take care of ours without assuming anything about the hosts! That’s perfect in my view.
There are just a few questions it may help to clarify some of the details:
-
Given that hosts will be free to introduce their desired errors through the associated
Error
type under theValidationContext
, why is it needed to include theHost(E)
variant, and basically having a top-level error type? -
Regarding the "not present" state, I have a different perspective. In similar cases, unavailability shouldn't be treated as an error. Instead, it should be represented by
None
as a legitimate response from a storage call. Accordingly, users won’t have to deal with the additional overhead of introducing errors like theCommitmentError
. Since there is a single acceptable error variant, we can automate this process in our boundary. I believeResult<Option<Output>, Error>
is the correct signature, and I've expanded my view on this in #607
And a suggestion:
Given that hosts will be free to introduce their desired errors through the associated Error type under the ValidationContext, why is it needed to include the Host(E) variant, and basically having a top-level error type?
The "top-level" type is just the type that validate()
, execute()
and dispatch()
return. We have 2 variants because 2 separates types of errors can occur (ValidationContext::Error
and CleanedUpContextError
), and both can be bubbled up and returned from one of the 3 calls. If you don't have a Host(E)
variant, then how do you return Host
errors?
Accordingly, users won’t have to deal with the additional overhead of introducing errors like the
CommitmentError
. Since there is a single acceptable error variant, we can automate this process in our boundary.
Errors such as CommitmentError
are defined by us, not the user. The user only defines ValidationContext::Error
; we define all the others.
In similar cases, unavailability shouldn't be treated as an error. Instead, it should be represented by
None
as a legitimate response from a storage call.
Returning an error doesn't mean it's "illegitimate". Result<Option<Output>, Error>
is mathematically equivalent to Result<Result<PacketCommitment, CommitmentError>, Self::Error>
(where CommitmentError
is defined with just one variant); they're isomorphic. Choosing one over the other is a subjective decision; a matter of taste. I prefer Result<Result<PacketCommitment, CommitmentError>, Self::Error>
, since it effectively avoids form of boolean blindness (which we could call "Option
blindness"). Basically, reading
if matches!(ctx_a.get_packet_commitment(&commitment_path_on_a)?, CommitmentError::NotFound) {
return Ok(());
};
tells me a ton more about what's going on; I see that the commitment was not found directly at the call site. Compare to your suggested signature:
if ctx_a.get_packet_commitment(&commitment_path_on_a)?.is_none() {
return Ok(());
};
is_none()
doesn't tell me the semantics of None
in this case; I need to go read the docstring of the method to remember what None
means in this case.
So using a Result
doesn't mean the error is "illegitimate"; it's just an equivalent but more readable way of encoding the scenario where the packet commitment is not found.