bevy Our API suggests that panicking should be the default

In spirit of https://github.com/bevyengine/bevy/issues/12660 Adapted from https://github.com/bevyengine/bevy/pull/14268

What problem does this solve or what need does it fill?

I believe it is not controversial to say that when an API offers two similar functions with similar names, the shorter will seem like the default. As such, I believe many people will instinctively gravitate to single and single_mut over get_single and get_single_mut. This means we are subtly pushing users to prefer the implicitly panicking version over the fallible one. This is bad, as it leads to games that may run well on the developers machine but then panic in an edge case. It is currently understood in the wider Rust community that panics should be an exceptional case, a nuclear option for when recovery is impossible. We should do our best to make it as easy to do the right thing by default (See Falling Into The Pit of Success). This means that it should be easier to handle an error by propagating it with ? to a logger or similar than to panic. Our current API does the opposite.

This is an issue for the following APIs:

Query::single
Query::get
World::resource
World::non_send_resource (although eventually this will likely be on App)
All of the other non-deferred entity-component-manipulating APIs listed in https://github.com/bevyengine/bevy/issues/14231

What solution would you like?

Deprecate and remove panicking variants. Improve ergonomics by adding macros for early returns. Using a macro for get_single as an example:

fn do_stuff(primary_window: Query<&Window, With<PrimaryWindow>>) {
    let window = get_single!(primary_window);
    // Do stuff with the window. If there's not exactly one, we have already returned.
}

Which expands to:

fn do_stuff(primary_window: Query<&Window, With<PrimaryWindow>>) {
    match primary_window.get_single() {
        Ok(item) => item,
        Err => return default(),
    };
    // Do stuff with the window. If there's not exactly one, we have already returned.
}

Note that returning default() will allow the macro to work with systems that return an Option and get piped into error handling methods.

Similar macros are already being used by some users, as indicated by https://github.com/tbillington/bevy_best_practices?tab=readme-ov-file#getter-macros

Paraphrasing @alice-i-cecile:

I'm also coming around on the macro approach: I'd rather this be a Rust feature where we can just ? in function that return (), but failing that this is better than the endless let else return pattern.

As the engine matures, I'm increasingly against panicking APIs, especially in seemingly innocuous cases like this. While it's nice for prototyping, it's a serious hazard for refactors and production-grade apps.

We should decide on what to do for all of these areas at the same time and make a consistent decision across the board, pulling in both SME-ECS and @cart. I think they're better implemented as separate PRs to avoid extreme size, but they should be shipped in the same cycle if we make the change.

What alternative(s) have you considered?

Do nothing
Leave the API as-is and heavily recommend the get_ variants in documentation
Print a warning when calling panicking variants
Keep the panicking variants and rename them to something clunky like _unchecked
Make the panicking API opt-in via a feature
Leave the API as-is and let the schedule handle panics
Use a #[system] macro that modifies the code to allow us to use ? in it, like in bevy_mod_sysfail

Open Questions

Naming

There is a sentiment that get_ is used by the standard library in general when returning an Option. Quoting @benfrankel:

This isn't true for the standard library: Vec::first, Iterator::next, Path::parent, Error::source, etc.

The get_ prefix is only used to differentiate from a panicking version. If there's no panicking version, there's no need for the get_ prefix. For some reason Bevy has latched onto foo vs get_foo as the only API choice when foo may fail, but it doesn't have to be this way.

In a world where Bevy is not panicking by default and always hands out Options and Results, I believe there is little reason to stick to a get_ prefix. It is unnecessary noise on otherwise very concise functions. As such, I would advise to change Bevy's naming convention and drop it as part of this initiative. This may sound like something for another issue, but I'll argue that it should be discussed at least in the same release cycle. Users will already have to go and touch every instance of mut, single, etc. so I don't want to further annoy them by having to change them yet a second time when we drop the get_ prefix.

I realize that this is more controversial however and I'm fine with leaving it. I just want to point out that this is probably the best chance we will get to change it.

Macro parameters

It would be nice to have some influence over what should be done in the unhappy path. For example, we could have a parameter that controls an error! log:

get_single!(foo, "Oh heck");

which expands to:

match foo.get_single() {
    Ok(item) => item,
    Err => {
        error!("Oh heck");
        return default();
    }
};

Of course, the use of error! is arbitrary here and the user may want to use warn! instead or something else altogether. To accommodate for this, we could pass a closure:

get_single!(foo, || error!("Oh heck"));

Which expands to the same as above. Note that this closure could even dictate what we return. Since error! returns (), this works out for the common case, and a user is free to build their own Error variant. However at that point, I don't know if a macro call is saving much typing work anymore.

One could also combine these variants by using labeled parameters:

get_single!(foo, warn: "Oh heck");
get_single!(bar, error: "Oh heck");
get_single!(baz, returns: || error!("Oh heck"));

This would allow the user to have a terse version for the common case usage and a more verbose one for custom error handling.

`continue`

This discussion has focused on the case where we want an early return. I think that is the common case, at least in my experience. What about continue however? Do we want a _continue variant for all macros? An extra parameter? Is there some kind of Rust hack that turns into either continue or return depending on context? Would that even be what the user expects to happen?

Jul 11 '24 06:07 janhohenheim

Pinging SME-ECS: @JoJoJet @maniwani

Jul 11 '24 06:07 janhohenheim

I agree with this I think, however it would be unfortunate for singleton-style components to become more verbose. single is one of those panicking methods that kind of makes sense, since if you use it with a component type that you only spawn one of it's always correct. Maybe we could remove panicking-by-default and still avoid the verbosity by having a marker trait for singleton components, which would unlock the infallible accessor for that type

Jul 11 '24 06:07 joseph-gio

@JoJoJet you could also do the following, assuming we remove the get_ suffix and allow passing a function to the macro as a second param:

let singleton = single!(singleton); // early return
let singleton = single!(singleton, panic); // panic if there's not exactly one

This would require some panic util fn just like we have for warn, error, etc.

Jul 11 '24 07:07 janhohenheim

The RFC for postfix macros would make this pattern so much nicer. Waiting for a language feature isn't practical, though.

let window = window_query.single().or_return!("optional warning msg");

Jul 11 '24 07:07 benfrankel

At work we started using a slightly different pattern. We use the early return with the fallible functions, but we use a debug_assert so it still panics in debug to help us catch those issues. That way it doesn't panic for our users if something unexpected happens.

Jul 11 '24 17:07 IceSentry

My take is generally still what I discussed here: https://github.com/bevyengine/bevy/issues/12660#issuecomment-2111371565

Specifically, I think we should investigate the viability of the following plan:

Make Result systems feel more automatic / first class (id like to stress that result systems are already possible, they just currently require an addition function call at registration). Ideally it is just as simple as returning Result in an existing system. If this isn't possible (while retaining support for non-result systems) for type-system reasons, investigate further.
```
app.add_systems(Update, window_thing)

fn window_thing(windows: Query<&Window>) -> Result {
  let window = windows.get_single()?;
  OK // this is a const export of Ok(())
}
```
Replace the majority of Option-returning functions with Results, which then enables ? to work seamlessly for pretty much everything:
```
fn world_thing(world: &mut World) -> Result {
  let entity = world.get_entity(SOME_ENTITY)?;
  OK
}
```
Replace panicking nice-name api names (ex: entity()) with the Result-returning variant (and remove the old get_entity() function):
```
fn world_thing(world: &mut World) -> Result {
  let entity = world.entity(SOME_ENTITY)?;
  OK
}
```

Imo it is critical to prove steps (2) and (3) out first and do it "as a single transaction" from a user perspective. Skipping straight to step (3) (while easy) puts the cart before the horse and imo makes Bevy UX worse for a period of time (at best) and at worst, commits us to a path that may not be viable or ideal (due to unknowns not yet discovered). Skipping (2) would mean that ? isn't the "ergonomic one size fits all" solution it needs to be.

I do think the macro approach is an interesting "middle ground", but it feels like we're just re-implementing the ? behavior in a clunkier / less idiomatic / less flexible way. Rather than create a zoo of ways to solve this problem (and encourage people to migrate once to a middle ground solution, then again to the "final" solution), I think I'd prefer to just invest in a single canonical ideal solution.

That being said, if it doesn't already exist, I think it makes a lot of sense to build the macro approach as a 3rd party crate. That seems like an appropriate "quick no bureaucracy" middle ground solution while we sort out the details.

Jul 11 '24 19:07 cart

As a supplement, I think it would also be interesting to investigate an Option extension trait that lets you do this:

fn system() -> Result {
  let list = vec![1, 2];
  let x = list.get(0).unpack()?;
  let y = list.get(1).assume("index 1 should exist")?;
  OK
}

Where unpack() (non panicking unwrap variant) maps Option<T> to Result<T, ContextlessError>. And assume() (non panicking expect variant) maps to Result<T, MessageError>.

Jul 11 '24 19:07 cart

unpack and assume seem very similar to Option::ok_or.

Jul 11 '24 19:07 benfrankel

ok_or is different in that it forces you to select and provide an error type. ok_or() is not a replacement for the "screw it I just want a value" mindset of unwrap()

Jul 11 '24 19:07 cart

Ditto for y = list.ok_or("index 1 should exist")?;. If you omit the ? it will compile, but because &str does not directly implement Error, adding ? would cause a failure to compile in the context of an anyhow-style catch-all error. It would need to be something like list.ok_or(MessageError("index 1 should exist"))?, which defeats the "screw it" mindset.

Jul 11 '24 19:07 cart

IMO better error handling shouldn't block a rename from e.g. single, get_single to single.unwrap, single. It's a footgun today, and it won't require any design work to solve.

Jul 11 '24 19:07 benfrankel

Copying my sentiment from Discord here. The proposed additions to Option are great, but there is already anyhow::Context for exactly this. anyhow is supremely mature and one of the most widely crates in Rust in general, so I think we should take advantage of this and just re-export it. We can also not re-export it and just tell users to use it. But reinventing the wheel here seems rather pointless imo

This is probably food for thought for another issue however

Jul 11 '24 19:07 janhohenheim

(Also just mirroring discord) I do suspect that we may want our own top level general-purpose (anyhow-style catch-all) Bevy error type to allow us to better encapsulate Bevy context (such as system context). However should should definitely consider using anyhow given its maturity and featureset. Worth weighing both options. I agree that Context::context fills the same role as assume() (but not unpack()).

Jul 11 '24 20:07 cart

Replace the majority of Option-returning functions with Results

IMO we can and should start on this today: I've been frustrated by this in the past.

Jul 11 '24 20:07 alice-i-cecile

IMO we can and should start on this today: I've been frustrated by this in the past.

I still think we might want to land this all as a whole, as some people might be relying on the proliferation of Option to use ? in the context of code like fn something(world: &World) -> Option<Something> { world.get_entity(SOME_ENTITY)?; } . We risk breaking people multiple times / forcing them to stop using ? if we do this piecemeal.

Jul 11 '24 20:07 cart

(if you're saying we should start merging these changes now as they come in) If you're just saying "lets break ground now", I'm fully on board :)

Jul 11 '24 20:07 cart

Make Result systems feel more automatic / first class ... Ideally it is just as simple as returning Result in an existing system

Just noting that I previously explored this approach in https://github.com/bevyengine/bevy/pull/8705, but I decided to abandon it becuase it made the stack traces for errors useless, which was worse than just calling unwrap() where the error originates IMO

Jul 11 '24 21:07 joseph-gio

Yep, fine to wait on merging. We should at least get a list for now.

Jul 11 '24 21:07 alice-i-cecile

@JoJoJet its worth calling out that the anyhow impl supports this reasonably (using built in rust features), provided you enable the "backtrace" anyhow cargo feature (and run with the RUST_BACKTRACE=1 environment variable):

use anyhow::Result;
use thiserror::Error;

fn main() {
    let result = foo();
    println!("{:?}", result);
}

fn foo() -> Result<()> {
    bar()?;
    Ok(())
}

fn bar() -> Result<()> {
    let x = vec![1, 2];
    let y = x.get(2).ok_or(E1)?; // backtrace correctly points here
    Ok(())
}

#[derive(Error, Debug)]
#[error("hello")]
struct E1;

Prints:

Err(hello

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at /home/cart/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.86/src/backtrace.rs:27:14
   1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1964:27
   2: rust_playground::x
             at ./src/main.rs:11:13
   3: rust_playground::main
             at ./src/main.rs:5:18
   4: core::ops::function::FnOnce::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:155:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:159:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
   9: std::panicking::try
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
  10: std::panic::catch_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:48
  12: std::panicking::try::do_call
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
  13: std::panicking::try
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
  14: std::panic::catch_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
  15: std::rt::lang_start_internal
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:20
  16: std::rt::lang_start
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:158:17
  17: main
  18: <unknown>
  19: __libc_start_main
  20: _start)

Jul 11 '24 22:07 cart

We could also consider adding a bevy feature that sets the environment variable for people:

#[cfg(feature = "backtrace")]
std::env::set_var("RUST_BACKTRACE", "1")

Just verified that this works. Then if/when we add a "dev" meta-cargo-feature, we could add "backtrace" to it. Then when people do cargo run --features bevy/dev during development (which I think we'll want to recommend ultimately), it will "just work".

Jul 11 '24 22:07 cart

Ah good point. We could provide a dedicated SystemError type that captures a backtrace when you convert to it via ?. If this type is used for nothing other than system return types, it wouldn't need to implement std::error::Error, so we could provide a blanket From impl for all error types without breaking coherence

Jul 11 '24 23:07 joseph-gio

@JoJoJet I want to point out that that would again increase the amount of code we duplicate from anyhow.
Is there something Bevy-specific we need in our error type that would justify creating it over using anyhow::Error? I suppose Bevy functions could populate some kind of metadata field on the error, but I'm struggling to think of something useful to put there that would not already be handled by the log call printing the underlying error or manually downcasting an inner error.

Jul 12 '24 04:07 janhohenheim

I think it would be kind of weird for us to use anyhow::Error in our public APIs, and I don't think we should allow error types without a backtrace as I know from experience that it is a massive footgun to refactor all of your unwraps into a custom error type, only to later realize that all of your error messages are incredibly annoying to trace back to a line of code. I think we should guard users against this trap by forcing them to use an error type with a backtrace. It could just be anyhow::Error, but that would seem weird to me as opposed to using a bevy-specific error type tailor-made for this purpose. We could also just wrap anyhow internally to avoid duplication

Jul 12 '24 05:07 joseph-gio

Yeah, whatever the solution, usable backtraces are definitely a must-have.

Jul 12 '24 05:07 janhohenheim

I published a crate for the macro approach: https://github.com/benfrankel/tiny_bail.

I used (an earlier version of) these macros during the game jam and they were exceptionally convenient.

Aug 01 '24 08:08 benfrankel

I feel like this issue drifted from the original purpose and either needs a rename or split to more issues.

My few cents are that we can't take the ergonomic hit from unwrapping everything by hand. Either we slightly tweak the existing API to make non-panicking methods easier to access, but still keep the panicking ones. Or we go with the ? approach you guys are discussing, but there seems to be no progress on actually providing that.

The reason I have to write this is, this is causing problems just by existing. Each new API now has to decide whether it should expose panicking methods or not. At the bare minimum this issue needs to resolve how new features should expose their APIs, old or new convention, while the new approach gets developed and applied to previous features. @alice-i-cecile

Sep 17 '24 15:09 MiniaczQ

@MiniaczQ I think Cart's point here is that changing it now is really disruptive to users, and will be really disruptive again once we have better error handling. I personally think this is enough of a footgun / API issue to warrant these consequences, but I see the point.

Or is your suggestion that we simply don't follow the old conventions for new API, but leave the old API intact?

The new error handling sounds like something a working group could develop, if enough people are willing to participate. But I think we have plenty of those currently running already, so I doubt that would be fruitful at this time.

Sep 17 '24 15:09 janhohenheim

My suggestion is that issues need to stop being marked as X-Controversial because they are in any way involved with this issue. The controversy is solely here.

The problem is that it's unclear whether new features should expose panicking APIs or not, and that should be decided right now, because not knowing that is halting progress. And if the API doesn't matter, because there will be a massive breaking change anyways, it's just more of an argument to stop referring this issue in other ones.

Sep 17 '24 15:09 MiniaczQ

I've only seen this issue now. My take is Results are the wrong way to go about this. The original problem of "do we panic or not" comes about from error types already (Option). Adding a more granular error type (Result) with more ways to handle them doesn't remedy that it makes it worse.

The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a Res<Foo> doesn't exist the system just doesn't run. Similar to how iterating over a Query<Bar> that matches nothing loops over nothing. If users want a more explicit panicky behavior they can use Option<Res<Foo>> & unwrap.

This also makes for nicer breaking changes because users already have it set up so that systems with Res<Foo> don't panic cause they can't really do anything else with them? The diagnostics from those panics are unreliable at best to begin with.

Additionally this kind of default just makes for better graceful behavior for released apps.

Sep 17 '24 15:09 iiYese

The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a Res<Foo> doesn't exist the system just doesn't run. Similar to how iterating over a Query<Bar> that matches nothing loops over nothing. If users want a more explicit panicky behavior they can use Option<Res<Foo>> & unwrap.

This only resolves errors during resource acquisition, systems can have other types of errors that need to be resolved and it'd be good to have a solution for that too.

Sep 17 '24 16:09 MiniaczQ