WIP: system & frame stepping
Objective
- Add support to interactively step through systems & frames within App
- system step runs the next system in the schedule on the next frame
- frame step runs all remaining systems in the schedule on the next frame
- Used with crates like bevy-inspector-egui to inspect & modify components between system executions
Solution
- Add support to SystemExecutors to implement system & frame stepping
- Add plumbing necessary to enable stepping for the SystemExecutor of a specific Schedule
  - For now, stepping is implemented at a per-schedule granularity
- Add support for critical systems (render/input/ui) to always run, regardless of stepping
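To make the intended semantics concrete, here is a minimal std-only sketch of system-step and frame-step with always-run systems. It is not the code in this PR: `SteppingSchedule`, `SystemEntry`, and `Action` are hypothetical names, and systems are modelled as plain names rather than real Bevy systems.

```rust
/// What a schedule does on its next run (illustrative names, not the PR's types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Action {
    RunAll,     // stepping disabled: every system runs
    Waiting,    // stepping enabled, no step requested
    StepSystem, // run only the next steppable system
    StepFrame,  // run all remaining steppable systems
}

struct SystemEntry {
    name: &'static str,
    ignore_stepping: bool, // true for critical systems (render/input/ui)
}

struct SteppingSchedule {
    systems: Vec<SystemEntry>,
    cursor: usize, // index of the next steppable system to consider
    action: Action,
}

impl SteppingSchedule {
    fn new(systems: Vec<SystemEntry>) -> Self {
        Self { systems, cursor: 0, action: Action::RunAll }
    }

    fn set_stepping(&mut self, enabled: bool) {
        self.action = if enabled { Action::Waiting } else { Action::RunAll };
        self.cursor = 0;
    }

    fn step_system(&mut self) {
        if self.action != Action::RunAll {
            self.action = Action::StepSystem;
        }
    }

    fn step_frame(&mut self) {
        if self.action != Action::RunAll {
            self.action = Action::StepFrame;
        }
    }

    /// Runs one frame and returns the names of the systems that ran, in order.
    fn run(&mut self) -> Vec<&'static str> {
        let stepping = self.action != Action::RunAll;
        // How many steppable systems may run this frame.
        let mut budget = match self.action {
            Action::RunAll | Action::StepFrame => usize::MAX,
            Action::StepSystem => 1,
            Action::Waiting => 0,
        };
        let mut next = self.cursor;
        let mut ran = Vec::new();
        for (i, sys) in self.systems.iter().enumerate() {
            if !stepping || sys.ignore_stepping {
                ran.push(sys.name); // always runs, regardless of stepping
            } else if i >= self.cursor && budget > 0 {
                ran.push(sys.name);
                budget -= 1;
                next = i + 1;
            }
        }
        if stepping {
            // Wrap to the start once no steppable systems remain this frame.
            let remaining = self.systems[next..].iter().any(|s| !s.ignore_stepping);
            self.cursor = if remaining { next } else { 0 };
            self.action = Action::Waiting;
        }
        ran
    }
}
```

In this model a step request is consumed by the next `run()`, after which the schedule goes back to waiting, while systems flagged `ignore_stepping` run on every frame.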
Demo
Proof-of-concept implemented using the SingleThreadedExecutor, and added to the breakout example:
https://user-images.githubusercontent.com/857742/224570587-9b9c6107-d4d7-408f-b03e-4058836204b1.mov
(no clue why the video isn't being embedded)
This video shows the system-step & frame-step functionality, and demonstrates a collision bug in the breakout example. At 00:13, we're in stepping mode, and single step as the ball clips the edge of a block, but you can see the ball's direction is not changed.
The following keys have been added to the breakout example to demonstrate stepping:
- Grave: enable stepping mode
- S: step a single system
- Space: step a full frame
Changelog
The key changes were made to the SystemExecutor, but I've only implemented stepping in the SingleThreadedExecutor as a proof-of-concept. The stubs in the other executors are todo!()s.
The rest of the change (to date) is the required plumbing to be able to manipulate stepping from within a system. For the moment this is implemented using ScheduleEvent events, but I'm not wedded to this approach in any way.
There is an entire extra pile of work not yet done in this PR, which is to mark most of bevy's default systems as ignore_stepping(). It wasn't required for the demo, because breakout does everything game-related in the FixedUpdate schedule.
Added
- SystemExecutor
  - SystemExecutor::set_stepping() -- enable/disable stepping
  - SystemExecutor::stepping() -- check if stepping is enabled
  - SystemExecutor::next_system() -- helper for UI; get index of next system to be run on step
  - SystemExecutor::step_system() -- run the next system in the schedule on the next update
  - SystemExecutor::step_frame() -- run remaining systems in the schedule on the next update
- enum ScheduleEvent
  - Added a number of events to enable/disable stepping, or step a specific schedule
  - Each event has a Schedule label associated
- Schedule
  - Schedule::handle_event() -- called from App::update() to handle ScheduleEvent
  - Schedule::next_system() -- return the name of the next system to be run if stepping is enabled
  - Schedule::stepping() -- get the stepping status of this schedule
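As a rough illustration of the event plumbing listed above, the following std-only sketch routes label-carrying events to per-schedule state. The variant names, `ScheduleState`, and `dispatch` are assumptions made for the example; the PR text only says that events exist to enable/disable stepping or step a schedule, and that each carries a schedule label.

```rust
use std::collections::HashMap;

/// Hypothetical variants; each one carries the label of the schedule it targets.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ScheduleEvent {
    EnableStepping(&'static str),
    DisableStepping(&'static str),
    StepSystem(&'static str),
    StepFrame(&'static str),
}

impl ScheduleEvent {
    fn label(&self) -> &'static str {
        match *self {
            ScheduleEvent::EnableStepping(l)
            | ScheduleEvent::DisableStepping(l)
            | ScheduleEvent::StepSystem(l)
            | ScheduleEvent::StepFrame(l) => l,
        }
    }
}

/// Stand-in for the stepping-related state of one schedule.
#[derive(Debug, Default, PartialEq)]
struct ScheduleState {
    stepping: bool,
    pending_system_steps: u32,
    pending_frame_step: bool,
}

impl ScheduleState {
    fn handle_event(&mut self, event: ScheduleEvent) {
        match event {
            ScheduleEvent::EnableStepping(_) => self.stepping = true,
            ScheduleEvent::DisableStepping(_) => *self = ScheduleState::default(),
            // Step requests only make sense while stepping is enabled.
            ScheduleEvent::StepSystem(_) if self.stepping => self.pending_system_steps += 1,
            ScheduleEvent::StepFrame(_) if self.stepping => self.pending_frame_step = true,
            _ => {}
        }
    }
}

/// Plays the role of the update loop: route each event to the schedule named
/// by its label. Returns how many events found a matching schedule.
fn dispatch(
    schedules: &mut HashMap<&'static str, ScheduleState>,
    events: &[ScheduleEvent],
) -> usize {
    let mut handled = 0;
    for event in events {
        if let Some(state) = schedules.get_mut(event.label()) {
            state.handle_event(*event);
            handled += 1;
        }
    }
    handled
}
```

Events whose label matches no schedule are silently dropped in this sketch; whether that should warn instead is a design choice the PR does not settle.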
Changed
- SingleThreadedExecutor::run() -- updated to support stepping
- SystemConfig::ignore_stepping() and SystemConfig::ignore_stepping -- allow systems to ignore stepping
- ScheduleGraph::system_ignore_stepping -- Vec, flag which systems are exempt from stepping
- SystemSchedule::systems_with_stepping -- FixedBitSet, which systems stepping applies to
- App
  - app.add_event::<ScheduleEvent>() in App::default()
  - App.schedule_event_reader -- ManualEventReader for ScheduleEvents
    - Not sure about this one; I was just trying to avoid creating it during every App::update() call
  - App::update() -- read ScheduleEvents, and call handle_event() on the appropriate Schedule based on schedule label in the event
Migration Guide
- Most rendering or input systems should have ignore_stepping() added to them:
app.add_system(handle_input.ignore_stepping())
    .add_system(update_ui.ignore_stepping());
Does this support time travel? Or I should say, going back into the past?
No, for that you either need to implement the command pattern (for undo), or use some sort of snapshotting of the world. I believe the bevy_save crate offers snapshotting: https://github.com/hankjordan/bevy_save#snapshots-and-rollback
Looking at the changes I made to both single-thread and multi-thread executors, I think the stepping state (which is the same between the two) should be moved into the SystemSchedule. This would also move all the stepping state manipulation code (which is the same between the two, set_stepping(), step_frame(), etc) into there too.
I think this makes logical sense because we’re stepping the schedule, and that state is associated with the schedule, not the executor.
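A small sketch of what that refactor could look like, using std only. The type names echo the ones discussed here, but the fields and the `take_step()` helper are hypothetical, and every call to `run()` is treated as one system-step request to keep the example short.

```rust
/// Stepping state lives on the schedule, not on any executor.
struct SystemSchedule {
    systems: Vec<&'static str>,
    stepping: bool,
    next_system: usize,
}

impl SystemSchedule {
    /// Returns the indices of the systems an executor should run this frame.
    fn take_step(&mut self) -> Vec<usize> {
        if !self.stepping {
            return (0..self.systems.len()).collect();
        }
        if self.systems.is_empty() {
            return Vec::new();
        }
        let index = self.next_system;
        self.next_system = (index + 1) % self.systems.len();
        vec![index]
    }
}

trait SystemExecutor {
    /// Runs the schedule once and returns the names of the systems that ran.
    fn run(&mut self, schedule: &mut SystemSchedule) -> Vec<&'static str>;
}

struct SingleThreadedExecutor;
struct MultiThreadedExecutor;

impl SystemExecutor for SingleThreadedExecutor {
    fn run(&mut self, schedule: &mut SystemSchedule) -> Vec<&'static str> {
        let indices = schedule.take_step();
        indices.into_iter().map(|i| schedule.systems[i]).collect()
    }
}

impl SystemExecutor for MultiThreadedExecutor {
    fn run(&mut self, schedule: &mut SystemSchedule) -> Vec<&'static str> {
        // A real executor would dispatch these to worker threads; the
        // selection logic is identical because it comes from the schedule.
        let indices = schedule.take_step();
        indices.into_iter().map(|i| schedule.systems[i]).collect()
    }
}
```

Because the cursor is stored on the schedule, swapping executors in the middle of a stepping session does not lose the position, and neither executor has to duplicate the state manipulation code.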
Example breakout failed to run, please try running it locally and check the result.
Example alien_cake_addict failed to run, please try running it locally and check the result.
First: this is really cool and something I'd like to include in Bevy. I haven't done a full review yet, but after a first pass my biggest concern is the prevalence of ignore_stepping. This design forces pretty much every system implementer (bevy internals, 3rd party plugins, even user code in some cases) to be aware of stepping and make "the right call" about stepping configuration in order to preserve the integrity of the stepping system. I don't feel comfortable thrusting that concern on every system implementer / I consider solving that problem a hard blocker. We should consider ways to abstract this out wherever possible and solve foundational problems (such as event buffering).
TL;DR: To implement stepping, there must be some complexity added. For the
broadest benefit to users, that complexity should be on the system-implementers
for render, input, and windowing, otherwise stepping won't be widely used. The
decision of whether to use ignore_stepping() with a system comes down to one
question: is the application still responsive to input, and able to display
frames, without that system? That said, there are ways we can make this easier,
and ensure nobody accidentally breaks stepping (see Possible Paths Forward).
my biggest concern is the prevalence of ignore_stepping. This design forces pretty much every system implementer (bevy internals, 3rd party plugins, even user code in some cases) to be aware of stepping and make "the right call" about stepping configuration in order to preserve the integrity of the stepping system.
@cart Some complexity is inherent in the nature of system-based stepping, but I'll argue that it's not as far reaching as you're seeing it right now.
First off, I went nuts with .ignore_stepping() in the PR; probably half of them aren't needed. I marked every system in bevy with it because I wasn't confident stepping would be accepted. I didn't want to dig into every system to see if it was critical to a responsive application while stepping without more guidance.
Critical Systems for a Responsive Application
For stepping to be a usable feature, there exists a subset of systems within bevy that must always run to provide the user a responsive application. If render systems (including UI) don't run, we can't see anything. If input systems don't run, we can't step to the next system, much less exit stepping mode. For a smooth experience, we can probably also throw the window management systems in this group to be safe.
Edit: I managed to forget about a bevy pattern where systems run Schedules, such as apply_state_transitions(). These systems must also be marked .ignore_stepping().
There must be some way to track which systems are critical to a responsive
application. For this PR, I've implemented it as .ignore_stepping(), but there
may be alternative approaches for determining this subset.
Outside of those ~~three~~ four groups, everything else should be "safe" to step.
What is "safe"? And making "the right call" for your system
(Note: This section is written assuming that at some point we make the change to bevy where events are buffered until systems consume them. Yes, I know this is cheating, but it's not impossible to implement. Joy has an idea about this; I saw it.)
The vast majority of systems will never have to be considered in the context of stepping. They're game systems that don't handle rendering or input. If you're not pushing pixels, or reading input, most likely stepping will work perfectly without any changes to your system.
For rendering systems, there's no complexity here as render will always ignore stepping. Stepping render systems would require support directly in the render pipeline to do some sort of draw-on-demand, layering each system as you move forward in time. Sure, not all systems are required to draw a functional application, but there's no real benefit to stripping some out (except maybe performance testing, but that should be system enable/disable support).
For input systems, there is a little complexity: is this system required for keypresses/mouse clicks to get to whatever stepping-control system is running? This just ends up in the same category as render; input will always ignore stepping.
Edit: If a system calls World::run_schedule(), or Schedule::run(), it must
also ignore stepping. I think this is the one that makes things complex, but it's a
reflection of the complexity of nesting Schedules within Systems. If we were
able to add a Schedule to another Schedule to be run exclusively, then
this would not be needed. The exclusive system gluing the two together isn't
necessary long-term.
ignore_stepping() Complexity by System-Implementer Category
For the specific system implementer group, the burden added to them can be pinned down.
One key thing to keep in mind while going through these groups: this decision only needs to be made once per system created. It's going to be rare for a system to switch from drawing on the screen to updating entities, so the decision won't need to be re-evaluated each time the system is updated.
Bevy Internals
These implementers will have to suffer the worst of it. Render, UI, input,
window handling, and schedule executing systems must ignore stepping.
But that's all, they just need to remember to put .ignore_stepping() in
those areas. I've already added it to the tuples they use in calls to
.add_systems(), and the individual systems.
The risk here is a new system gets added, in its own .add_system() call, and
doesn't duplicate the .ignore_stepping() all around it.
There are some things we can do to reduce the burden here further:
- Edit: Replace exclusive systems for running schedules with support for adding a Schedule to a parent Schedule for exclusive execution
- Automated test that verifies .ignore_stepping() is used in these places
  - It would be complex, but should be possible to implement a stepping test that uses input & render
  - Another option is dynamic checking of systems; see Mitigations to Complexity
- Independent Schedule for these; schedule marked to ignore_stepping
  - See Mitigations to Complexity below
I think a lot of the complexity concern here is because I threw
ignore_stepping() all over the place in the PR.
Crate Authors
Most crate authors won't have to worry about ignore_stepping(), but it does
put some burden on the authors of render, UI, and input handling crates. The
complexity comes out the same as for bevy internals. They can probably
get away with just marking all of their systems as ignore_stepping().
As an alternative, there's the independent Schedule approach, which is
probably more ergonomic for crates.
Finally for this category, it would be nice if we had some mechanism for crate authors to verify that their crate doesn't interfere with stepping. That said, I have no solid idea of how to a) implement this, b) share it to the crate community as part of bevy. Absolutely open to suggestions here.
Bevy Users
This category is easier to discuss. As an end-user, if they're not using
stepping, they don't ever need to worry about ignore_stepping().
If they do use stepping, there are two approaches:
- ignore_stepping() for all render & input systems
- Try it, see which system it hangs on when you step, add ignore_stepping()
The impact of an incorrect choice for ignore_stepping() is simply that stepping
doesn't move forward, or the screen freezes.
I feel that the complexity burden with this group balances out as they're the target for this feature.
The Trade-Off
What do we get in exchange for this increased complexity for systems implementers?
Bevy users gain the ability to pause their entire game at any moment in gameplay, and step through each one to diagnose strange behavior. This in combination with an entity inspector & editor gives bevy a built-in runtime debugger.
It is hard to overstate how powerful this is. I've used this capability in the past (other platforms) to diagnose why physics was going sideways when I was having trouble diagnosing it via lldb. Being able to see & interact with the application while debugging is ... addictive.
Alternatives
Stepping Opt-In
One alternate approach would be to have bevy users opt their systems into stepping. I don't like this approach for the user-experience. If systems must opt-in to stepping, users will experience two pain points:
- external crates not enabling stepping on the systems they need to step through
  - this becomes a maintenance problem for most crates & users
- users will likely want to step through the majority of their systems, not a minority
Honestly, if we add barriers to getting stepping (or any debugging tool)
working, it won't be used. "If I just add a few more println!s, I know I can
figure this out!"
External Crate
Right now, there's no way to implement stepping as an external crate. This is
because the SystemExecutor and SystemSchedule are buried quite deeply in
Schedule, and offer no entry-points.
Even with an alternate API, this would end up shifting the burden of
ignore_stepping() only off of Bevy Internal, not off crate makers. It would
likely shift more burden to stepping users to allow/disallow some systems
manually, reducing the adoption of stepping.
Sidebar on what's needed to do this externally
I do see one path to implementing stepping as an external crate, but it requires at least the following changes:
- Per-System enable/disable support in Schedule
  - This is very easy to implement, and can be based off this PR
- Better visibility into Schedule and System
  - Right now it's very difficult to discover every schedule & system that should be run
    - Dynamically generated Schedules for State change
    - Schedules that are executed from exclusive systems
  - There needs to be some consistent way to iterate all Schedules
- Buffered Events
  - Because if I'm gonna write a wishlist, why not include all the parts
Dynamically applying ignore_stepping()
There's probably a dynamic way to determine a system should not be stepped.
We do have System::name(), and can easily pattern match bevy::* to ignore
stepping.
This does solve the burden for bevy, but doesn't really help crate authors. Only a small number of crates need to consider stepping though.
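A minimal sketch of the name-based check, assuming system names are Rust paths such as `some_crate::module::function`. Instead of matching every `bevy::*` path as suggested above, this variant narrows the match to an explicit list of crates; the list, the constant, and the function name are illustrative assumptions, not part of the PR.

```rust
/// Illustrative list of crates whose systems keep the app responsive.
const ALWAYS_RUN_CRATES: &[&str] = &[
    "bevy_render",
    "bevy_input",
    "bevy_ui",
    "bevy_window",
    "bevy_winit",
];

/// Returns true if a system with this name should ignore stepping.
/// The crate name is taken to be the first `::`-separated path segment.
fn ignores_stepping_by_name(system_name: &str) -> bool {
    let crate_name = system_name.split("::").next().unwrap_or("");
    ALWAYS_RUN_CRATES.iter().any(|c| *c == crate_name)
}
```

An explicit list avoids accidentally exempting third-party crates that merely use a `bevy_` prefix, at the cost of having to maintain the list; a plain prefix match would be the simpler alternative.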
Schedule::ignore_stepping()
Keep in mind that stepping is per-Schedule. The example implementation
requires the user to specify which schedules to step. To reduce the burden for
both Bevy Internal and Crate Authors, we can implement
Schedule::ignore_stepping(), and ignore all stepping requests (panic!() or
warn!();return).
This allows a larger granularity for disabling stepping, and simplifies at least the render systems in bevy. I remember seeing talk on the discord about Crate Authors shifting to per-crate schedules to avoid interference from user systems. This may be a good path.
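One way the per-schedule opt-out could behave, sketched with std only. The `Schedule` struct here is a stand-in rather than Bevy's type, `SteppingRequest` is a hypothetical return value, and the sketch takes the warn-and-return route instead of panic!().

```rust
/// Outcome of asking a schedule to change its stepping state.
#[derive(Debug, PartialEq)]
enum SteppingRequest {
    Applied,
    Ignored,
}

/// Stand-in for a schedule; only the stepping-related fields are modelled.
struct Schedule {
    label: &'static str,
    ignores_stepping: bool,
    stepping: bool,
}

impl Schedule {
    fn new(label: &'static str) -> Self {
        Self { label, ignores_stepping: false, stepping: false }
    }

    /// Builder-style opt-out: the whole schedule ignores stepping requests.
    fn ignore_stepping(mut self) -> Self {
        self.ignores_stepping = true;
        self
    }

    fn set_stepping(&mut self, enabled: bool) -> SteppingRequest {
        if self.ignores_stepping {
            // Warn and return rather than panic.
            eprintln!("warning: schedule {} ignores stepping", self.label);
            return SteppingRequest::Ignored;
        }
        self.stepping = enabled;
        SteppingRequest::Applied
    }
}
```

A crate that keeps its render or input systems in its own schedule would then need a single `ignore_stepping()` call instead of one per system.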
Possible Paths Forward
Ok, after all those words (if you read them all, thank you), here are paths I can see for moving forward:
Option A: Move Forward with Stepping Now
This is of course my preference; I'd like to use this functionality now, and I believe that the earlier debugging tools are integrated into a system, the better.
I don't feel comfortable thrusting that concern ["the right thing" per-system] on every system implementer
At the highest level the question boils down to: Add ignore_stepping() if
this system is rendering or handling input.
This can be baked into bevy for bevy contributors by either adding "dynamic"
ignore_stepping() based on System::name(), or creating dedicated Schedules
that ignore stepping.
For crate developers, really all we can do to reduce the load is allow them to mark their custom Schedules to ignore stepping. This does require them to switch to custom Schedules, so this may be a complexity wash.
We should consider ways to abstract this out wherever possible and solve foundational problems (such as event buffering).
Agreed, especially event buffering. I will point out that even with the existing limitations regarding events, stepping is useful as it is right now. In my first run of it working in breakout, I found a bug in the collision system. I didn't fix it, but there's a video of the ball destroying a block and not changing course.
The best we can do for abstraction right now is probably the dynamic
application of ignore_stepping() based on bevy crate name. This doesn't
benefit the crate authors, but maybe dedicated Schedules are their solution?
How to make sure stepping isn't broken by contributions
Stepping is at risk of being broken by someone introducing a critical system
for render/input without adding ignore_stepping(). I don't have a good idea
of what's possible in bevy test cases, but it would be great if I could add a
testcase to verify that input handling & rendering are still working with
stepping enabled.
I think just adding this as a CI test would mitigate a lot of concerns from the Bevy Internal side.
Any suggestions on how to implement this would be greatly appreciated. It requires simulating input to be read by the input system, and reading the rendered frame. I have no idea how to do either of those from a testcase.
Option B: Delay Stepping Until Supporting Infrastructure Exists
Delay stepping until the following bits are implemented:
- Buffered Events
- Enable/Disable Systems
- Easier system addressing/labeling
  - To a user, the system is a function. To the Schedule it's a node with a NodeId.
  - There is no user-friendly mapping between the two right now
    - How do I get from fn my_system(...) to the NodeId on the Schedule?
- Global Visibility to all Schedules
The last two could be easily covered if we got Systems implemented as Entities.
Everything on this list is at least as complex as this PR. I actually went down the rabbit
There could be an intermediate step here with a crate, but long-term I feel bevy benefits greatly from having this functionality built-in.
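For the system addressing item in the list above, one possible shape for a function-to-NodeId lookup is an index keyed by `std::any::type_name`. This is only a sketch: `SystemIndex` and this `NodeId` are hypothetical stand-ins, systems are modelled as zero-argument functions, and the standard library documents `type_name` output as best-effort rather than a guaranteed unique identifier.

```rust
use std::any::type_name;
use std::collections::HashMap;

/// Stand-in for the schedule's node identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

/// Hypothetical index from a function's type name to its node.
#[derive(Default)]
struct SystemIndex {
    by_name: HashMap<&'static str, NodeId>,
    next: usize,
}

impl SystemIndex {
    /// Records the system under its type name and hands back a fresh NodeId.
    fn register<F: Fn()>(&mut self, _system: F) -> NodeId {
        let id = NodeId(self.next);
        self.next += 1;
        // Each fn item has its own type, so its type name is its path.
        // Caveat: type_name is not guaranteed to be unique or stable.
        self.by_name.insert(type_name::<F>(), id);
        id
    }

    /// Answers "which node is fn my_system?" without the caller tracking ids.
    fn node_id_of<F: Fn()>(&self, _system: F) -> Option<NodeId> {
        self.by_name.get(type_name::<F>()).copied()
    }
}

// Example systems, modelled as plain functions.
fn move_ball() {}
fn check_collisions() {}
fn render() {}
```

With such an index a stepping UI could ask for `index.node_id_of(move_ball)` instead of carrying NodeIds around by hand.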
Option C: Something Else
I'm open to any idea that gets stepping capabilities into Bevy. That's really all I'm looking for here.
After implementing pause functionality in my game, it's struck me that, by and large, I want to group this stepping (and pause) behavior based on the data that the components access.
If we had automatically lazily generated system sets on the basis of access (#7857), I think that maintaining this distinction might be much less onerous. For example, everything that writes to a Window or Input should probably be ignored by default.
Not a complete fix, but perhaps a useful direction. We could also configure a default on a per-schedule basis?
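A sketch of the access-based default, assuming each system's access can be summarised as lists of type names it reads and writes. `SystemAccess`, the constant, and the function are hypothetical, and the two names in the list simply follow the Window and Input example above.

```rust
/// Simplified summary of what a system touches, by type name.
struct SystemAccess {
    reads: Vec<&'static str>,
    writes: Vec<&'static str>,
}

/// Types whose writers should keep running while stepping (illustrative).
const ALWAYS_RUN_WRITES: &[&str] = &["Window", "Input"];

/// Default rule: a system ignores stepping if it writes to Window or Input.
fn ignores_stepping_by_default(access: &SystemAccess) -> bool {
    access.writes.iter().any(|written| {
        // Drop generic arguments so "Input<KeyCode>" matches "Input".
        let base = written.split('<').next().unwrap_or("");
        ALWAYS_RUN_WRITES.iter().any(|name| *name == base)
    })
}
```

Only writes are considered here, so a gameplay system that merely reads input still steps normally.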
This is interesting, but I'm curious if it's just moving the ignore_stepping() metadata from the system to components & resources.
I see two categories here: what system should always be run (!ignore_stepping()), and which subset of steppable systems should be stepped for debugging the current problem. The idea of component based selection for stepping makes a lot of sense for the second group.
BUT! Oh, does this mean there's a way to tell from the System object that a system reads some events? If so, we could automatically handle the ignore_stepping() for evented systems right now.
Yes, that should be possible on the basis of the Access. #5388 by @JoJoJet may give you some helpful clues.
To help illustrate a direction I think we should be headed in, I put together a simple draft PR: #8168.
@alice-i-cecile I'm not seeing how to make this work. I've set up the following test to print the Access values for an event-based system, and both Access methods contain nothing:
struct TestEvent;

fn event_system(mut reader: EventReader<TestEvent>) {
    for _ in reader.iter() {}
}

#[test]
fn detect_event_system() {
    let system = IntoSystem::into_system(event_system);
    println!("system.component_access: {:#?}", system.component_access());
    println!(
        "system.archetype_component_access: {:#?}",
        system.archetype_component_access()
    );
    assert!(false);
}
The output shows empty Access structs for both methods:
system.component_access: Access {
    read_and_writes: [],
    writes: [],
    reads_all: false,
}
system.archetype_component_access: Access {
    read_and_writes: [],
    writes: [],
    reads_all: false,
}
Is there some other mechanism to detect a system takes an EventReader argument?
I've set up the following test to print the Access values for an event-based system, and both Access methods contain nothing:
let system = IntoSystem::into_system(event_system);
println!("system.component_access: {:#?}", system.component_access());
The issue is that you aren't initializing the system. A system's access sets will be empty until you call initialize() on it.
@JoJoJet Thank you! That makes more sense.
Follow-up on detecting event-reader systems: I got it working, but it requires a World. Dropping the code here so I don't lose it:
/// helper function to determine if a system reads events
#[allow(dead_code)]
fn system_reads_events(
    system: &dyn System<In = (), Out = ()>,
    world: &crate::world::World,
) -> bool {
    for id in system.component_access().reads() {
        if world
            .components()
            .get_name(id)
            .unwrap()
            .starts_with("bevy_ecs::event::Events<")
        {
            return true;
        }
    }
    false
}

struct TestEvent;

fn read_event_system(mut reader: EventReader<TestEvent>) {
    for _ in reader.iter() {}
}

fn write_event_system(mut writer: EventWriter<TestEvent>) {
    writer.send(TestEvent);
}

#[test]
fn verify_system_reads_events() {
    let mut world = World::new();
    let mut reader = IntoSystem::into_system(read_event_system);
    reader.initialize(&mut world);
    let mut writer = IntoSystem::into_system(write_event_system);
    writer.initialize(&mut world);
    assert!(system_reads_events(&reader, &world));
    assert!(!system_reads_events(&writer, &world));
}
Closing in favor of #8453.