bevy icon indicating copy to clipboard operation
bevy copied to clipboard

Fallible systems

Open NthTensor opened this issue 1 year ago • 12 comments

Objective

Error handling in bevy is hard. See for reference https://github.com/bevyengine/bevy/issues/11562, https://github.com/bevyengine/bevy/issues/10874 and https://github.com/bevyengine/bevy/issues/12660. The goal of this PR is to make it better, by allowing users to optionally return Result from systems as outlined by Cart in https://github.com/bevyengine/bevy/issues/14275#issuecomment-2223708314.

Solution

This PR introduces a new ScheuleSystem type to represent systems that can be added to schedules. Instances of this type contain either an infallible BoxedSystem<(), ()> or a fallible BoxedSystem<(), Result>. ScheuleSystem implements System<In = (), Out = Result> and replaces all uses of BoxedSystem in schedules. The async executor now receives a result after executing a system, which for infallible systems is always Ok(()). Currently it ignores this result, but more useful error handling could also be implemented.

Aliases for Error and Result have been added to the bevy_ecs prelude, as well as const OK which new users may find more friendly than Ok(()).

Testing

  • I have done next to no testing.

Showcase

The following minimal example prints "hello world" once, then completes.

use bevy::prelude::*;

fn main() {
    App::new().add_systems(Update, hello_world_system).run();
}

fn hello_world_system() -> Result {
    println!("hello world");
    Err("string")?;
    println!("goodbye world");
    OK
}

Migration Guide

This change should be pretty much non-breaking, except for users who have implemented their own custom executors. Those users should use ScheduleSystem in place of BoxedSystem<(), ()> and import the System trait where needed. They can choose to do whatever they wish with the result.

Current Work

  • [x] Fix tests & doc comments
  • [ ] Write more tests
  • [x] Add examples
  • [X] Draft release notes

Draft Release Notes

As of this release, systems can now return results.

First a bit of background: Bevy has hisotrically expected systems to return the empty type (). While this makes sense in the context of the ecs, it's at odds with how error handling is typically done in rust: returning Result::Error to indicate failure, and using the short-circuiting ? operator to propagate that error up the call stack to where it can be properly handled. Users of functional languages will tell you this is called "monadic error handling".

Not being able to return Results from systems left bevy users with a quandry. They could add custom error handling logic to every system, or manually pipe every system into an error handler, or perhaps sidestep the issue with some combination of fallible assignents, logging, macros, and early returns. Often, users would just litter their systems with unwraps and possible panics.

While any one of these approaches might be fine for a particular user, each of them has their own drawbacks, and none makes good use of the language. Serious issues could also arrise when two different crates used by the same project made different choices about error handling.

Now, by returning results, systems can defer error handling to the application itself. It looks like this:

// Previous, handling internally
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let Ok(window) = query.get_single() else {
       return;
   };
   // ... do something to the window here
}

// Previous, handling externally
app.add_systems(my_system.pipe(my_error_handler))
fn my_system(window: Query<&Window>) -> Result<(), impl Error> {
   let window = query.get_single()?;
   // ... do something to the window here
   Ok(())
}

// Previous, panicking
app.add_systems(my_system)
fn my_system(window: Query<&Window>) {
   let window = query.single();
   // ... do something to the window here
}

// Now 
app.add_systems(my_system)
fn my_system(window: Query<&Window>) -> Result {
    let window = query.get_single()?;
    // ... do something to the window here
    Ok(())
}

There are currently some limitations. Systems must either return () or Result<(), Box<dyn Error + Send + Sync + 'static>>, with no in-between. Results are also ignored by default, and though implementing a custom handler is possible, it involves writing your own custom ecs executor (which is not recomended).

Systems should return errors when they cannot perform their normal behavior. In turn, errors returned to the executor while running the schedule will (eventually) be treated as unexpected. Users and library authors should prefer to return errors for anything that disrupts the normal expected behavior of a system, and should only handle expected cases internally.

We have big plans for improving error handling further:

  • Allowing users to change the error handling logic of the default executors.
  • Adding source tracking and optional backtraces to errors.
  • Possibly adding tracing-levels (Error/Warn/Info/Debug/Trace) to errors.
  • Generally making the default error logging more helpful and inteligent.
  • Adding monadic system combininators for fallible systems.
  • Possibly removing all panicking variants from our api.

NthTensor avatar Dec 01 '24 22:12 NthTensor

Even without the actual error handling benefits this provides, just having a more blessed way to use ? in systems will be really nice. I know we can just pipe the Result with current systems, but this will hide that bit of extra boilerplate. This also pairs nicely with making more APIs return Result instead of Option, and also makes panicking variants less important (possibly even removable TBH).

bushrat011899 avatar Dec 01 '24 22:12 bushrat011899

It was mentioned in Discord, but I'll include it here for posterity: with fallible systems getting first-class treatment, there may be room to consider removing the panicking variants of certain functions (e.g., Query::get_entity and Query::entity), since the choice of behaviour could be controlled by a system error handler. This would be a large DX win, since the "proper" methods would get the shorter names, and it'd reduce the API surface area.

bushrat011899 avatar Dec 02 '24 03:12 bushrat011899

there may be room to consider removing the panicking variants of certain functions

That's in line with the third point Cart proposed in https://github.com/bevyengine/bevy/issues/14275#issuecomment-2223708314. He indicated then that it was important to land all the related changes in a single release cycle, and I agree. This PR provides his (1), what (2) and (3) look like is up to @alice-i-cecile and the other designated ecs experts.

NthTensor avatar Dec 02 '24 04:12 NthTensor

as well as const OK which new users may find more friendly than Ok(()).

Why? This just makes the code more jarring compared to the rest of the Rust ecosystem, and more cognitive load to switch between returning nothing vs. returning some value.

It'd make sense if it's something useful like https://docs.rs/anyhow/latest/anyhow/fn.Ok.html

teohhanhui avatar Dec 02 '24 06:12 teohhanhui

This just makes the code more jarring compared to the rest of the Rust ecosystem

In this I am trying to defer to my understanding of Cart's preferences. He uses a const in the linked issue, and I believe has expressed that Ok(()) is sort of confusing and cumbersome. No strong preference here from me really.

NthTensor avatar Dec 02 '24 06:12 NthTensor

It'd make sense if it's something useful like https://docs.rs/anyhow/latest/anyhow/fn.Ok.html

Functionally this is identical to anyhow's Ok function. The difference is since in Bevy a ScheduleSystem can only output an (), the function can instead just be a constant.

It's a problem with Rust's syntax at the end of the day. With the addition of the Try trait, perhaps we'll get an automatic wrapping as a bit of syntactic sugar, but that's up to the language team at the end of the day.

bushrat011899 avatar Dec 02 '24 07:12 bushrat011899

A previous version of this feature (https://github.com/bevyengine/bevy/pull/8705) was closed due to incoherent stack traces. This issue was also brought up in discord. It seems similar to https://github.com/bevyengine/bevy/issues/8638 with logs from systems not including the system name properly.

I think these issues can be resolved (just as they were in anyhow) by defining a new error type and hiding instrumentation/stack-tracing/call-tracking in the From implementation.

This isn't an issue for this pr, since we don't do anything with the returned results; it's just control flow. But it would be good to tackle in follow up.

NthTensor avatar Dec 02 '24 11:12 NthTensor

@NthTensor let me know when you're done the outstanding tasks and take this out of draft. I'll do a final pass and then merge + create follow-up issues for this.

alice-i-cecile avatar Dec 02 '24 15:12 alice-i-cecile

as well as const OK which new users may find more friendly than Ok(()).

Can you please cut this from this PR? I kinda like it, but it's pointlessly controversial here :)

alice-i-cecile avatar Dec 02 '24 15:12 alice-i-cecile

@alice-i-cecile requesting removal of the M-Needs-Release-Note label (added in pr description because we don't have a good place for it yet).

NthTensor avatar Dec 02 '24 21:12 NthTensor

@alice-i-cecile requesting removal of the M-Needs-Release-Note label (added in pr description because we don't have a good place for it yet).

Just like migration guides, we should keep this label around even after they're written for searchability and tooling. We might wan to rename that to be more clear though 🤔

alice-i-cecile avatar Dec 03 '24 15:12 alice-i-cecile

@alice-i-cecile requesting removal of the M-Needs-Release-Note label (added in pr description because we don't have a good place for it yet).

Just like migration guides, we should keep this label around even after they're written for searchability and tooling. We might wan to rename that to be more clear though 🤔

M-#[require(ReleaseNote)]

BenjaminBrienen avatar Dec 04 '24 01:12 BenjaminBrienen

Alright, I added the most basic tests and examples in the world. There will be more to do there when handlers are hooked up. Ready for review.

NthTensor avatar Dec 04 '24 02:12 NthTensor

Might be a bit late on this discussion, but this particular implementation feels a bit intrusive to me. Can we instead implement IntoSystemConfigs on systems that returns Result so that those systems can be added to the schedule like a normal system?

PixelDust22 avatar Dec 27 '24 20:12 PixelDust22

Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to https://github.com/bevyengine/bevy-website/issues/1967 if you'd like to help out.

alice-i-cecile avatar Mar 25 '25 20:03 alice-i-cecile