Our API suggests that panicking should be the default
In spirit of https://github.com/bevyengine/bevy/issues/12660 Adapted from https://github.com/bevyengine/bevy/pull/14268
What problem does this solve or what need does it fill?
I believe it is not controversial to say that when an API offers two similar functions with similar names, the shorter will seem like the default. As such, I believe many people will instinctively gravitate to single and single_mut over get_single and get_single_mut. This means we are subtly pushing users to prefer the implicitly panicking version over the fallible one. This is bad, as it leads to games that may run well on the developers machine but then panic in an edge case.
It is currently understood in the wider Rust community that panics should be an exceptional case, a nuclear option for when recovery is impossible.
We should do our best to make it as easy to do the right thing by default (See Falling Into The Pit of Success). This means that it should be easier to handle an error by propagating it with ? to a logger or similar than to panic. Our current API does the opposite.
This is an issue for the following APIs:
Query::singleQuery::getWorld::resourceWorld::non_send_resource(although eventually this will likely be on App)- All of the other non-deferred entity-component-manipulating APIs listed in https://github.com/bevyengine/bevy/issues/14231
What solution would you like?
Deprecate and remove panicking variants. Improve ergonomics by adding macros for early returns. Using a macro for get_single as an example:
fn do_stuff(primary_window: Query<&Window, With<PrimaryWindow>>) {
let window = get_single!(primary_window);
// Do stuff with the window. If there's not exactly one, we have already returned.
}
Which expands to:
fn do_stuff(primary_window: Query<&Window, With<PrimaryWindow>>) {
match primary_window.get_single() {
Ok(item) => item,
Err => return default(),
};
// Do stuff with the window. If there's not exactly one, we have already returned.
}
Note that returning default() will allow the macro to work with systems that return an Option and get piped into error handling methods.
Similar macros are already being used by some users, as indicated by https://github.com/tbillington/bevy_best_practices?tab=readme-ov-file#getter-macros
Paraphrasing @alice-i-cecile:
I'm also coming around on the macro approach: I'd rather this be a Rust feature where we can just ? in function that return (), but failing that this is better than the endless let else return pattern.
As the engine matures, I'm increasingly against panicking APIs, especially in seemingly innocuous cases like this. While it's nice for prototyping, it's a serious hazard for refactors and production-grade apps.
We should decide on what to do for all of these areas at the same time and make a consistent decision across the board, pulling in both SME-ECS and @cart. I think they're better implemented as separate PRs to avoid extreme size, but they should be shipped in the same cycle if we make the change.
What alternative(s) have you considered?
- Do nothing
- Leave the API as-is and heavily recommend the
get_variants in documentation - Print a warning when calling panicking variants
- Keep the panicking variants and rename them to something clunky like
_unchecked - Make the panicking API opt-in via a feature
- Leave the API as-is and let the schedule handle panics
- Use a
#[system]macro that modifies the code to allow us to use?in it, like in bevy_mod_sysfail
Open Questions
Naming
There is a sentiment that get_ is used by the standard library in general when returning an Option.
Quoting @benfrankel:
This isn't true for the standard library:
Vec::first,Iterator::next,Path::parent,Error::source, etc.The g
et_prefix is only used to differentiate from a panicking version. If there's no panicking version, there's no need for theget_prefix. For some reason Bevy has latched ontofoovsget_fooas the only API choice when foo may fail, but it doesn't have to be this way.
In a world where Bevy is not panicking by default and always hands out Options and Results, I believe there is little reason to stick to a get_ prefix. It is unnecessary noise on otherwise very concise functions. As such, I would advise to change Bevy's naming convention and drop it as part of this initiative. This may sound like something for another issue, but I'll argue that it should be discussed at least in the same release cycle. Users will already have to go and touch every instance of mut, single, etc. so I don't want to further annoy them by having to change them yet a second time when we drop the get_ prefix.
I realize that this is more controversial however and I'm fine with leaving it. I just want to point out that this is probably the best chance we will get to change it.
Macro parameters
It would be nice to have some influence over what should be done in the unhappy path. For example, we could have a parameter that controls an error! log:
get_single!(foo, "Oh heck");
which expands to:
match foo.get_single() {
Ok(item) => item,
Err => {
error!("Oh heck");
return default();
}
};
Of course, the use of error! is arbitrary here and the user may want to use warn! instead or something else altogether. To accommodate for this, we could pass a closure:
get_single!(foo, || error!("Oh heck"));
Which expands to the same as above. Note that this closure could even dictate what we return. Since error! returns (), this works out for the common case, and a user is free to build their own Error variant. However at that point, I don't know if a macro call is saving much typing work anymore.
One could also combine these variants by using labeled parameters:
get_single!(foo, warn: "Oh heck");
get_single!(bar, error: "Oh heck");
get_single!(baz, returns: || error!("Oh heck"));
This would allow the user to have a terse version for the common case usage and a more verbose one for custom error handling.
continue
This discussion has focused on the case where we want an early return. I think that is the common case, at least in my experience. What about continue however? Do we want a _continue variant for all macros? An extra parameter? Is there some kind of Rust hack that turns into either continue or return depending on context? Would that even be what the user expects to happen?
Pinging SME-ECS: @JoJoJet @maniwani
I agree with this I think, however it would be unfortunate for singleton-style components to become more verbose. single is one of those panicking methods that kind of makes sense, since if you use it with a component type that you only spawn one of it's always correct.
Maybe we could remove panicking-by-default and still avoid the verbosity by having a marker trait for singleton components, which would unlock the infallible accessor for that type
@JoJoJet you could also do the following, assuming we remove the get_ suffix and allow passing a function to the macro as a second param:
let singleton = single!(singleton); // early return
let singleton = single!(singleton, panic); // panic if there's not exactly one
This would require some panic util fn just like we have for warn, error, etc.
The RFC for postfix macros would make this pattern so much nicer. Waiting for a language feature isn't practical, though.
let window = window_query.single().or_return!("optional warning msg");
At work we started using a slightly different pattern. We use the early return with the fallible functions, but we use a debug_assert so it still panics in debug to help us catch those issues. That way it doesn't panic for our users if something unexpected happens.
My take is generally still what I discussed here: https://github.com/bevyengine/bevy/issues/12660#issuecomment-2111371565
Specifically, I think we should investigate the viability of the following plan:
- Make Result systems feel more automatic / first class (id like to stress that result systems are already possible, they just currently require an addition function call at registration). Ideally it is just as simple as returning Result in an existing system. If this isn't possible (while retaining support for non-result systems) for type-system reasons, investigate further.
app.add_systems(Update, window_thing) fn window_thing(windows: Query<&Window>) -> Result { let window = windows.get_single()?; OK // this is a const export of Ok(()) } - Replace the majority of Option-returning functions with Results, which then enables
?to work seamlessly for pretty much everything:fn world_thing(world: &mut World) -> Result { let entity = world.get_entity(SOME_ENTITY)?; OK } - Replace panicking nice-name api names (ex:
entity()) with the Result-returning variant (and remove the oldget_entity()function):fn world_thing(world: &mut World) -> Result { let entity = world.entity(SOME_ENTITY)?; OK }
Imo it is critical to prove steps (2) and (3) out first and do it "as a single transaction" from a user perspective. Skipping straight to step (3) (while easy) puts the cart before the horse and imo makes Bevy UX worse for a period of time (at best) and at worst, commits us to a path that may not be viable or ideal (due to unknowns not yet discovered). Skipping (2) would mean that ? isn't the "ergonomic one size fits all" solution it needs to be.
I do think the macro approach is an interesting "middle ground", but it feels like we're just re-implementing the ? behavior in a clunkier / less idiomatic / less flexible way. Rather than create a zoo of ways to solve this problem (and encourage people to migrate once to a middle ground solution, then again to the "final" solution), I think I'd prefer to just invest in a single canonical ideal solution.
That being said, if it doesn't already exist, I think it makes a lot of sense to build the macro approach as a 3rd party crate. That seems like an appropriate "quick no bureaucracy" middle ground solution while we sort out the details.
As a supplement, I think it would also be interesting to investigate an Option extension trait that lets you do this:
fn system() -> Result {
let list = vec![1, 2];
let x = list.get(0).unpack()?;
let y = list.get(1).assume("index 1 should exist")?;
OK
}
Where unpack() (non panicking unwrap variant) maps Option<T> to Result<T, ContextlessError>. And assume() (non panicking expect variant) maps to Result<T, MessageError>.
unpack and assume seem very similar to Option::ok_or.
ok_or is different in that it forces you to select and provide an error type. ok_or() is not a replacement for the "screw it I just want a value" mindset of unwrap()
Ditto for y = list.ok_or("index 1 should exist")?;. If you omit the ? it will compile, but because &str does not directly implement Error, adding ? would cause a failure to compile in the context of an anyhow-style catch-all error. It would need to be something like list.ok_or(MessageError("index 1 should exist"))?, which defeats the "screw it" mindset.
IMO better error handling shouldn't block a rename from e.g. single, get_single to single.unwrap, single. It's a footgun today, and it won't require any design work to solve.
Copying my sentiment from Discord here. The proposed additions to Option are great, but there is already anyhow::Context for exactly this.
anyhow is supremely mature and one of the most widely crates in Rust in general, so I think we should take advantage of this and just re-export it. We can also not re-export it and just tell users to use it.
But reinventing the wheel here seems rather pointless imo
This is probably food for thought for another issue however
(Also just mirroring discord) I do suspect that we may want our own top level general-purpose (anyhow-style catch-all) Bevy error type to allow us to better encapsulate Bevy context (such as system context). However should should definitely consider using anyhow given its maturity and featureset. Worth weighing both options. I agree that Context::context fills the same role as assume() (but not unpack()).
Replace the majority of Option-returning functions with Results
IMO we can and should start on this today: I've been frustrated by this in the past.
IMO we can and should start on this today: I've been frustrated by this in the past.
I still think we might want to land this all as a whole, as some people might be relying on the proliferation of Option to use ? in the context of code like fn something(world: &World) -> Option<Something> { world.get_entity(SOME_ENTITY)?; } . We risk breaking people multiple times / forcing them to stop using ? if we do this piecemeal.
(if you're saying we should start merging these changes now as they come in) If you're just saying "lets break ground now", I'm fully on board :)
- Make Result systems feel more automatic / first class ... Ideally it is just as simple as returning Result in an existing system
Just noting that I previously explored this approach in https://github.com/bevyengine/bevy/pull/8705, but I decided to abandon it becuase it made the stack traces for errors useless, which was worse than just calling unwrap() where the error originates IMO
Yep, fine to wait on merging. We should at least get a list for now.
@JoJoJet its worth calling out that the anyhow impl supports this reasonably (using built in rust features), provided you enable the "backtrace" anyhow cargo feature (and run with the RUST_BACKTRACE=1 environment variable):
use anyhow::Result;
use thiserror::Error;
fn main() {
let result = foo();
println!("{:?}", result);
}
fn foo() -> Result<()> {
bar()?;
Ok(())
}
fn bar() -> Result<()> {
let x = vec![1, 2];
let y = x.get(2).ok_or(E1)?; // backtrace correctly points here
Ok(())
}
#[derive(Error, Debug)]
#[error("hello")]
struct E1;
Prints:
Err(hello
Stack backtrace:
0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
at /home/cart/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.86/src/backtrace.rs:27:14
1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/result.rs:1964:27
2: rust_playground::x
at ./src/main.rs:11:13
3: rust_playground::main
at ./src/main.rs:5:18
4: core::ops::function::FnOnce::call_once
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
5: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys_common/backtrace.rs:155:18
6: std::rt::lang_start::{{closure}}
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:159:18
7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:284:13
8: std::panicking::try::do_call
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
9: std::panicking::try
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
10: std::panic::catch_unwind
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
11: std::rt::lang_start_internal::{{closure}}
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:48
12: std::panicking::try::do_call
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:559:40
13: std::panicking::try
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:523:19
14: std::panic::catch_unwind
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panic.rs:149:14
15: std::rt::lang_start_internal
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:141:20
16: std::rt::lang_start
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/rt.rs:158:17
17: main
18: <unknown>
19: __libc_start_main
20: _start)
We could also consider adding a bevy feature that sets the environment variable for people:
#[cfg(feature = "backtrace")]
std::env::set_var("RUST_BACKTRACE", "1")
Just verified that this works. Then if/when we add a "dev" meta-cargo-feature, we could add "backtrace" to it. Then when people do cargo run --features bevy/dev during development (which I think we'll want to recommend ultimately), it will "just work".
Ah good point. We could provide a dedicated SystemError type that captures a backtrace when you convert to it via ?. If this type is used for nothing other than system return types, it wouldn't need to implement std::error::Error, so we could provide a blanket From impl for all error types without breaking coherence
@JoJoJet I want to point out that that would again increase the amount of code we duplicate from anyhow.
Is there something Bevy-specific we need in our error type that would justify creating it over using anyhow::Error? I suppose Bevy functions could populate some kind of metadata field on the error, but I'm struggling to think of something useful to put there that would not already be handled by the log call printing the underlying error or manually downcasting an inner error.
I think it would be kind of weird for us to use anyhow::Error in our public APIs, and I don't think we should allow error types without a backtrace as I know from experience that it is a massive footgun to refactor all of your unwraps into a custom error type, only to later realize that all of your error messages are incredibly annoying to trace back to a line of code. I think we should guard users against this trap by forcing them to use an error type with a backtrace. It could just be anyhow::Error, but that would seem weird to me as opposed to using a bevy-specific error type tailor-made for this purpose. We could also just wrap anyhow internally to avoid duplication
Yeah, whatever the solution, usable backtraces are definitely a must-have.
I published a crate for the macro approach: https://github.com/benfrankel/tiny_bail.
I used (an earlier version of) these macros during the game jam and they were exceptionally convenient.
I feel like this issue drifted from the original purpose and either needs a rename or split to more issues.
My few cents are that we can't take the ergonomic hit from unwrapping everything by hand.
Either we slightly tweak the existing API to make non-panicking methods easier to access, but still keep the panicking ones.
Or we go with the ? approach you guys are discussing, but there seems to be no progress on actually providing that.
The reason I have to write this is, this is causing problems just by existing. Each new API now has to decide whether it should expose panicking methods or not. At the bare minimum this issue needs to resolve how new features should expose their APIs, old or new convention, while the new approach gets developed and applied to previous features. @alice-i-cecile
@MiniaczQ I think Cart's point here is that changing it now is really disruptive to users, and will be really disruptive again once we have better error handling. I personally think this is enough of a footgun / API issue to warrant these consequences, but I see the point.
Or is your suggestion that we simply don't follow the old conventions for new API, but leave the old API intact?
The new error handling sounds like something a working group could develop, if enough people are willing to participate. But I think we have plenty of those currently running already, so I doubt that would be fruitful at this time.
My suggestion is that issues need to stop being marked as X-Controversial because they are in any way involved with this issue.
The controversy is solely here.
The problem is that it's unclear whether new features should expose panicking APIs or not, and that should be decided right now, because not knowing that is halting progress. And if the API doesn't matter, because there will be a massive breaking change anyways, it's just more of an argument to stop referring this issue in other ones.
I've only seen this issue now. My take is Results are the wrong way to go about this. The original problem of "do we panic or not" comes about from error types already (Option). Adding a more granular error type (Result) with more ways to handle them doesn't remedy that it makes it worse.
The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a Res<Foo> doesn't exist the system just doesn't run. Similar to how iterating over a Query<Bar> that matches nothing loops over nothing. If users want a more explicit panicky behavior they can use Option<Res<Foo>> & unwrap.
This also makes for nicer breaking changes because users already have it set up so that systems with Res<Foo> don't panic cause they can't really do anything else with them? The diagnostics from those panics are unreliable at best to begin with.
Additionally this kind of default just makes for better graceful behavior for released apps.
The solution to this I would prefer is to lean into the ECS paradigm more & treat more things as if they were querying the world. If a
Res<Foo>doesn't exist the system just doesn't run. Similar to how iterating over aQuery<Bar>that matches nothing loops over nothing. If users want a more explicit panicky behavior they can useOption<Res<Foo>>& unwrap.
This only resolves errors during resource acquisition, systems can have other types of errors that need to be resolved and it'd be good to have a solution for that too.