`refactor`: introduce structured error system w/ host exception tunneling
Addresses #1377 by introducing a structured error system that supports host exception tunneling. The core CError enum has four variants: Context for wrapping errors with messages (replacing anyhow::Context), HostLang for tunneling host exceptions via Box<dyn HostError>, Client for validation/4xx errors, and Internal for 5xx errors; both of which I believe, capture the backtraces via/ std::backtrace::Backtrace.
On the Python side, PyErrWrapper implements HostError and cerror_to_pyerr() handles the reverse conversion, traversing the context chain and downcasting via Any to extract the original PyErr if present.
Following your suggestion about incremental/iteratively working within the same PR, I'm pushing this first for your review. If this looks good, the next step is migrating the a lot of anyhow::Result usages across the codebase to use CResult. The existing anyhow-based types should/will be preserved for backwards compatibility during that migration.
Edit 1: Tended the changes requested here and here; can proceed with the migration of anyhow::Result usages if it looks good!
Edit 2: Completed the migration of error macros from anyhow::bail!/anyhow::anyhow! to the new structured client_bail!/client_error! and internal_bail!/internal_error! macros across ~35 files.
The new macros create CError with Client/Internal variants that capture backtraces, then convert to anyhow::Error for compatibility with existing return types. This preserves all the structured error semantics (including host exception tunneling for Python) while avoiding a massive signature refactor. The anyhow::Result type remains as-is...
PS: I took, what I believe is a pragmatic approach here - rather than replacing every anyhow::Result<T> with CResult<T> (which would've required touching hundreds of serde impls and function signatures), I wrapped CError inside anyhow::Error. Is this acceptable, or would you prefer a full CResult<T> migration in a follow-up?
Full disclosure: Claude (via Cursor) helped me grep through the (relevant files in the) codebase to identify all the error macro usages. Turns out clicking through search results in an IDE beats wrestling w/ git grepping 🙃
Edit 3: Per this comment, I did what was stated, but also ended up fixing/addding a bunch of other stuff along the way:
What I also ended up doing (because the build kept yelling at me...):
- Fixed Python interop in
py_factory.rs- allPyErrnow tunneled viaError::host(),pythonize/depythonizeerrors converted with.internal(),from_py_futureresults properly handled - Migrated
schema.rs,analyzer.rs,live_updater.rs,fingerprint.rsto use (existing ) structuredErrortype - swappedTryInto<Error = anyhow::Error>bounds, removedanyhow::bail!usages, etc. - Fixed
SharedErrorto wrap our structuredErrorinstead ofanyhow::Error - Fixed some
retryable::runclosures inneo4j.rsthat were trying to use a non-existent error variant... 🤔
What's intentionally left as-is (for future) follow-up PRs)
Some files still use anyhow and that's... intentional? Why I feel:
retryable.rs - The retryable::Error struct seems to wrap around anyhow::Error by design. The loop gathers (appends) context on each attempt and the trait bounds alos expect it. Reworking this would mean redesigning how the whole retry mechanism handles errors.
http.rs - Built around anyhow again. Could migrate but it touches retry logic too? // TODO: Follow-up?
py_utils/src/error.rs (ToResultWithPyTrace) - Call sites immediately convert via .map_err(Error::from) so it works, just... not pretty. Could eventually replace with something that returns our Result<T> and wraps formatted traces in Error::host(). Gradual phase-out?
utils/src/error.rs (the bridge stuff) - these are conversion bridges that let the migration happen incrementally. From<anyhow::Error> is actually required right now since retryable/http still produce anyhow errors. Can remove these once those deps migrate?
TLDR (for edit 3 mainly)
- w/ host tunneling (
Error::host()forPyErr) - Migrated relevant files from
anyhow::bail!to structured macros Error/Resultas primary types (renamed fromCError/CResult)retryable.rsandhttp.rsleft as-is - designed around anyhow, needs separate refactor- Conversion bridges kept for gradual migration
@Haleshot Really fast! Looks very reasonable! Thanks!
Updated the PR description above with "Edits" to reflect the changes made incrementally; awaiting your review :)
About
PS: I took, what I believe is a pragmatic approach here - rather than replacing every
anyhow::Result<T>withCResult<T>(which would've required touching hundreds of serde impls and function signatures), I wrappedCErrorinsideanyhow::Error. Is this acceptable, or would you prefer a fullCResult<T>migration in a follow-up?
There're two parts:
- Error creations. e.g. serde related stuffs. Calling on the existing
bail!andapi_bail!macros also belong to this. - Error propagation. e.g. popping up by
?, with additional contexts, etc.
For 1, I think it's it's OK to handle different parts separately. The final goal is not to use anyhow, but it can happen in multiple steps.
For 2, the problem of using anyhow::Error as wrapper is: the structured information will be lost during propagation (it allows downcast to restore the original structured information, but after calling context(), downcast can no longer restore to the original structured error).
To make the migration easier, I have another idea. A lot of existing crates just use Error as name of their error type, and they also define a specialized Result type. For example, for sqlx (Error, Result). We can do the similar thing:
- Rename
CErrortoError, and add a specializedResulttype. - Update this line in prelude to use our new types instead.
- One minor thing to keep in mind is that we use
anyhow::Ok(...)sometimes (which is a syntax sugar, equivalent toOk::<_, anyhow::Error>(...). They can be avoided by explicitly annotate return type values (example).
In this way, most modules won't need to be touched, if they're just propagating errors rather than creating errors.
What do you think?
On the approach suggested above: I'll go about doing the following:
- Rename
CError→Error,CResult→Resultincocoindex_utils - Update
prelude.rsto use these instead ofanyhow
Also, hope it's fine for putting this (& you) through multiple review rounds 😅 - definitely not intentional (hope you don't mind)! You're also more than welcome to push directly to this branch if you feel I'm not getting it (still making my way about Rust and the codebase.
On the approach suggested above: I'll go about doing the following:
- Rename
CError→Error,CResult→Resultincocoindex_utils- Update
prelude.rsto use these instead ofanyhowAlso, hope it's fine for putting this (& you) through multiple review rounds 😅 - definitely not intentional (hope you don't mind)! You're also more than welcome to push directly to this branch if you feel I'm not getting it (still making my way about Rust and the codebase.
Looks good! No worries, this is a large refactor and it's well expected to have multiple rounds of review :) Thanks a lot for pushing it through!
I will be tending to the CI fails soon. 😓 (even w/ all the local building done)... Edit: Let me know if I went outside the scope of the issue at hand (or if the changes made are not relevant). I made these changes incrementally, locally (after having read the issue outline a few times and the build errors that followed on making the changes as I deemed appropriate to "The Proposed Design").
Would appreciate any help in fixing changes that aren't relevant. I have also made the edit to the PR description here (with Edit 3).
Thanks @Haleshot love your PR!