cocoindex icon indicating copy to clipboard operation
cocoindex copied to clipboard

`refactor`: introduce structured error system w/ host exception tunneling

Open Haleshot opened this issue 2 weeks ago • 6 comments

Addresses #1377 by introducing a structured error system that supports host exception tunneling. The core CError enum has four variants: Context for wrapping errors with messages (replacing anyhow::Context), HostLang for tunneling host exceptions via Box<dyn HostError>, Client for validation/4xx errors, and Internal for 5xx errors; both of which I believe, capture the backtraces via/ std::backtrace::Backtrace.

On the Python side, PyErrWrapper implements HostError and cerror_to_pyerr() handles the reverse conversion, traversing the context chain and downcasting via Any to extract the original PyErr if present.

Following your suggestion about incremental/iteratively working within the same PR, I'm pushing this first for your review. If this looks good, the next step is migrating the a lot of anyhow::Result usages across the codebase to use CResult. The existing anyhow-based types should/will be preserved for backwards compatibility during that migration.

Edit 1: Tended the changes requested here and here; can proceed with the migration of anyhow::Result usages if it looks good!

Edit 2: Completed the migration of error macros from anyhow::bail!/anyhow::anyhow! to the new structured client_bail!/client_error! and internal_bail!/internal_error! macros across ~35 files.

The new macros create CError with Client/Internal variants that capture backtraces, then convert to anyhow::Error for compatibility with existing return types. This preserves all the structured error semantics (including host exception tunneling for Python) while avoiding a massive signature refactor. The anyhow::Result type remains as-is...

PS: I took, what I believe is a pragmatic approach here - rather than replacing every anyhow::Result<T> with CResult<T> (which would've required touching hundreds of serde impls and function signatures), I wrapped CError inside anyhow::Error. Is this acceptable, or would you prefer a full CResult<T> migration in a follow-up?

Full disclosure: Claude (via Cursor) helped me grep through the (relevant files in the) codebase to identify all the error macro usages. Turns out clicking through search results in an IDE beats wrestling w/ git grepping 🙃

Edit 3: Per this comment, I did what was stated, but also ended up fixing/addding a bunch of other stuff along the way:

What I also ended up doing (because the build kept yelling at me...):

  • Fixed Python interop in py_factory.rs - all PyErr now tunneled via Error::host(), pythonize/depythonize errors converted with .internal(), from_py_future results properly handled
  • Migrated schema.rs, analyzer.rs, live_updater.rs, fingerprint.rs to use (existing ) structured Error type - swapped TryInto<Error = anyhow::Error> bounds, removed anyhow::bail! usages, etc.
  • Fixed SharedError to wrap our structured Error instead of anyhow::Error
  • Fixed some retryable::run closures in neo4j.rs that were trying to use a non-existent error variant... 🤔
What's intentionally left as-is (for future) follow-up PRs)

Some files still use anyhow and that's... intentional? Why I feel:

retryable.rs - The retryable::Error struct seems to wrap around anyhow::Error by design. The loop gathers (appends) context on each attempt and the trait bounds alos expect it. Reworking this would mean redesigning how the whole retry mechanism handles errors.

http.rs - Built around anyhow again. Could migrate but it touches retry logic too? // TODO: Follow-up?

py_utils/src/error.rs (ToResultWithPyTrace) - Call sites immediately convert via .map_err(Error::from) so it works, just... not pretty. Could eventually replace with something that returns our Result<T> and wraps formatted traces in Error::host(). Gradual phase-out?

utils/src/error.rs (the bridge stuff) - these are conversion bridges that let the migration happen incrementally. From<anyhow::Error> is actually required right now since retryable/http still produce anyhow errors. Can remove these once those deps migrate?

TLDR (for edit 3 mainly)
  • w/ host tunneling (Error::host() for PyErr)
  • Migrated relevant files from anyhow::bail! to structured macros
  • Error/Result as primary types (renamed from CError/CResult)
  • retryable.rs and http.rs left as-is - designed around anyhow, needs separate refactor
  • Conversion bridges kept for gradual migration

Haleshot avatar Dec 13 '25 18:12 Haleshot

@Haleshot Really fast! Looks very reasonable! Thanks!

georgeh0 avatar Dec 13 '25 20:12 georgeh0

Updated the PR description above with "Edits" to reflect the changes made incrementally; awaiting your review :)

Haleshot avatar Dec 14 '25 18:12 Haleshot

About

PS: I took, what I believe is a pragmatic approach here - rather than replacing every anyhow::Result<T> with CResult<T> (which would've required touching hundreds of serde impls and function signatures), I wrapped CError inside anyhow::Error. Is this acceptable, or would you prefer a full CResult<T> migration in a follow-up?

There're two parts:

  1. Error creations. e.g. serde related stuffs. Calling on the existing bail! and api_bail! macros also belong to this.
  2. Error propagation. e.g. popping up by ?, with additional contexts, etc.

For 1, I think it's it's OK to handle different parts separately. The final goal is not to use anyhow, but it can happen in multiple steps.

For 2, the problem of using anyhow::Error as wrapper is: the structured information will be lost during propagation (it allows downcast to restore the original structured information, but after calling context(), downcast can no longer restore to the original structured error).

To make the migration easier, I have another idea. A lot of existing crates just use Error as name of their error type, and they also define a specialized Result type. For example, for sqlx (Error, Result). We can do the similar thing:

  • Rename CError toError, and add a specialized Result type.
  • Update this line in prelude to use our new types instead.
  • One minor thing to keep in mind is that we use anyhow::Ok(...) sometimes (which is a syntax sugar, equivalent to Ok::<_, anyhow::Error>(...). They can be avoided by explicitly annotate return type values (example).

In this way, most modules won't need to be touched, if they're just propagating errors rather than creating errors.

What do you think?

georgeh0 avatar Dec 15 '25 21:12 georgeh0

On the approach suggested above: I'll go about doing the following:

  • Rename CError → Error, CResult → Result in cocoindex_utils
  • Update prelude.rs to use these instead of anyhow

Also, hope it's fine for putting this (& you) through multiple review rounds 😅 - definitely not intentional (hope you don't mind)! You're also more than welcome to push directly to this branch if you feel I'm not getting it (still making my way about Rust and the codebase.

Haleshot avatar Dec 17 '25 17:12 Haleshot

On the approach suggested above: I'll go about doing the following:

  • Rename CError → Error, CResult → Result in cocoindex_utils
  • Update prelude.rs to use these instead of anyhow

Also, hope it's fine for putting this (& you) through multiple review rounds 😅 - definitely not intentional (hope you don't mind)! You're also more than welcome to push directly to this branch if you feel I'm not getting it (still making my way about Rust and the codebase.

Looks good! No worries, this is a large refactor and it's well expected to have multiple rounds of review :) Thanks a lot for pushing it through!

georgeh0 avatar Dec 17 '25 17:12 georgeh0

I will be tending to the CI fails soon. 😓 (even w/ all the local building done)... Edit: Let me know if I went outside the scope of the issue at hand (or if the changes made are not relevant). I made these changes incrementally, locally (after having read the issue outline a few times and the build errors that followed on making the changes as I deemed appropriate to "The Proposed Design").

Would appreciate any help in fixing changes that aren't relevant. I have also made the edit to the PR description here (with Edit 3).

Haleshot avatar Dec 22 '25 05:12 Haleshot

Thanks @Haleshot love your PR!

badmonster0 avatar Dec 25 '25 00:12 badmonster0