rustic_core icon indicating copy to clipboard operation
rustic_core copied to clipboard

[DRAFT] refactor(commands/errors): improve error handling in check command

Open simonsan opened this issue 1 year ago • 1 comments

As an idea to iterate on for the error handling in the check command. Essentially, we use an error handling thread that prints the errors and gets the CheckErrorKinds from a channel from the main thread.

Based on #317

simonsan avatar Oct 07 '24 21:10 simonsan

Codecov Report

Attention: Patch coverage is 28.12500% with 46 lines in your changes missing coverage. Please review.

Project coverage is 46.2%. Comparing base (6e989b0) to head (163458b). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
crates/core/src/commands/check.rs 25.0% 30 Missing :warning:
crates/core/src/repository.rs 33.3% 16 Missing :warning:
Additional details and impacted files
Files with missing lines Coverage Δ
crates/core/src/error.rs 57.1% <ø> (ø)
crates/core/src/repository.rs 47.5% <33.3%> (-1.1%) :arrow_down:
crates/core/src/commands/check.rs 63.8% <25.0%> (-2.5%) :arrow_down:

... and 15 files with indirect coverage changes

codecov[bot] avatar Oct 07 '24 21:10 codecov[bot]

I think this is mixing two things which should be handled completely independent from each other: Channels and error propagation.

Actually, we don't need a channel if we want to collect any kind of information (be it an array of errors, a bool isErr or whatever else) from a called function/method. For this the return value (or maybe a &mut argument) suffices, is simpler and more performant. We only need a channel to communicate (i.e. send information and this can be an array of errors, a bool or any other result) between threads, i.e. if we spawn threads or use already-spawned threads to do some work.

To be more precise, in this example we could change fn xy(...,err_send: Sender<CheckErrorKind>) -> Result<T> into fn xy(..., is_err: &AtomicBool) -> Result<T> and directly modify is_err or even use fn xy(...) -> Result<(T,bool)> and propagate is_err. This gives exactly the same functionality without using a channel.

IMO there is just one fundamental decision to make: Do we want to return and propagate an error flag or a list of errors?

aawsome avatar Oct 12 '24 05:10 aawsome

And the same applies to warning, where we also have to decide:

  • do we want to just log them and not give the information to the caller?
  • do we want to give a is_warn bool information to the caller?
  • do we want to give a list of warnings to the caller?

aawsome avatar Oct 12 '24 06:10 aawsome

I think this is mixing two things which should be handled completely independent from each other: Channels and error propagation.

Actually, we don't need a channel if we want to collect any kind of information (be it an array of errors, a bool isErr or whatever else) from a called function/method. For this the return value (or maybe a &mut argument) suffices, is simpler and more performant. We only need a channel to communicate (i.e. send information and this can be an array of errors, a bool or any other result) between threads, i.e. if we spawn threads or use already-spawned threads to do some work.

To be more precise, in this example we could change fn xy(...,err_send: Sender<CheckErrorKind>) -> Result<T> into fn xy(..., is_err: &AtomicBool) -> Result<T> and directly modify is_err or even use fn xy(...) -> Result<(T,bool)> and propagate is_err. This gives exactly the same functionality without using a channel.

IMO there is just one fundamental decision to make: Do we want to return and propagate an error flag or a list of errors?

I think collecting all errors and returning them from the check_repository function makes sense, as we are processing a collection of data with multiple ways to fail. Repository::check() itself though, should only return a single error stating that the check failed imho.

I agree, I used the channel here, because I got confused by the parallel iterator. But instead of a for_each, we can just use map and collect the results. Then we can e.g. partition the results and return a RusticResult<Result<() | Vec<RusticWarning>, Vec<RusticError>>>. Where the inner Result is for soft-errors/warnings (that would result in a warning or just a logged error, depending on its severity that we would determine in the callers match). And the outer result a hard-error.

The RusticWarning here should stay internal use only, and never propagate to the user of the lib, it's only for logging purposes and further handling - I would even think, we don't even need it and could log a warning directly where it is important. So we can easily find the spot where the warning was emitted. The Vec<RusticError> should also not become part of the Repository API as the users should not need to handle multiple errors and decide on the severity on their own - for them the RusticResult<()> with a single error should be the contract. So they can rely on either the check being ok, or the check having failed.

That being said, https://github.com/rustic-rs/rustic_core/pull/224#issuecomment-2412143876 my opinion is still varying. The tendency goes to the nested result type though, I think that can be a solution.

In general, I think, that we want to build some kind of generator that produces Result<T, E> values that we can evaluate in the caller. I think that was where the thought of a channel came from for the Errors. But I imagine it more to be a thread running check and putting the results into a channel to be evaluated by the main thread.

simonsan avatar Oct 14 '24 17:10 simonsan