Objective

Add support to interactively step through systems & frames within App
- system step runs the next system in the schedule on the next frame
- frame step runs all remaining systems in the schedule on the next frame
Used with crates like bevy-inspector-egui to inspect & modify components between system executions

Solution

Add support to SystemExecutors to implement system & frame stepping
Add plumbing necessary to enable stepping for the SystemExecutor of a specific Schedule
- For now, stepping is implemented at a per-schedule granularity
Add support for critical systems (render/input/ui) to always run, regardless of stepping
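
To make the intended semantics concrete, here is a minimal std-only sketch of system step, frame step, and always-run systems. It is a toy model, not the PR's implementation: `ToySystem`, `ToySchedule`, `Pending`, and `demo_schedule` are hypothetical names, and the wrap-around behaviour at the end of the schedule is an assumption.

```rust
/// Toy stand-in for a system: a name plus the opt-out flag.
/// Hypothetical type for illustration; not Bevy's or this PR's API.
struct ToySystem {
    name: &'static str,
    ignore_stepping: bool,
}

/// What the user requested for the next frame.
#[derive(Clone, Copy)]
enum Pending {
    Nothing,
    System,
    Frame,
}

struct ToySchedule {
    systems: Vec<ToySystem>,
    stepping: bool,
    /// Index of the first system that has not been stepped yet.
    cursor: usize,
    pending: Pending,
}

impl ToySchedule {
    fn new(systems: Vec<ToySystem>) -> Self {
        Self { systems, stepping: false, cursor: 0, pending: Pending::Nothing }
    }

    fn step_system(&mut self) {
        self.pending = Pending::System;
    }

    fn step_frame(&mut self) {
        self.pending = Pending::Frame;
    }

    /// Run one frame and return the names of the systems that ran.
    fn run_frame(&mut self) -> Vec<&'static str> {
        // A step request only applies to a single frame.
        let pending = std::mem::replace(&mut self.pending, Pending::Nothing);
        let mut ran = Vec::new();
        let mut stepped_one = false;
        for (i, sys) in self.systems.iter().enumerate() {
            let steppable = self.stepping && !sys.ignore_stepping;
            let run = if !steppable {
                true // stepping is off, or the system opted out
            } else if i < self.cursor {
                false // already stepped earlier in this pass
            } else {
                match pending {
                    Pending::Nothing => false,
                    Pending::System => !stepped_one,
                    Pending::Frame => true,
                }
            };
            if run {
                ran.push(sys.name);
                if steppable {
                    stepped_one = true;
                    self.cursor = i + 1;
                }
            }
        }
        // Assumption: once no steppable system remains, start over.
        if !self.systems[self.cursor..].iter().any(|s| !s.ignore_stepping) {
            self.cursor = 0;
        }
        ran
    }
}

/// A breakout-like schedule with input and render marked as always-run.
fn demo_schedule() -> ToySchedule {
    ToySchedule::new(vec![
        ToySystem { name: "input", ignore_stepping: true },
        ToySystem { name: "movement", ignore_stepping: false },
        ToySystem { name: "collision", ignore_stepping: false },
        ToySystem { name: "render", ignore_stepping: true },
    ])
}
```

With stepping enabled and no request pending, only `input` and `render` run. A system step then adds `movement`, and a following frame step adds the remaining `collision`.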

Demo

Proof-of-concept implemented using the SingleThreadedExecutor, and added to the breakout example: https://user-images.githubusercontent.com/857742/224570587-9b9c6107-d4d7-408f-b03e-4058836204b1.mov

(no clue why the video isn't being embedded)

This video shows the system-step & frame-step functionality, and demonstrates a collision bug in the breakout example. At 00:13, we're in stepping mode, and single step as the ball clips the edge of a block, but you can see the ball's direction is not changed.

The following keys have been added to the breakout example to demonstrate stepping:

- Grave: enable stepping mode
- S: step a single system
- Space: step a full frame

Changelog

The key changes were made to the SystemExecutor, but I've only implemented stepping in the SingleThreadedExecutor as a proof-of-concept. The stubs in the other executors are todo!()s.

The rest of the change (to date) is the required plumbing to be able to manipulate stepping from within a system. For the moment this is implemented using ScheduleEvent events, but I'm not wedded to this approach in any way.

There is an entire extra pile of work not yet done in this PR, which is to mark most of bevy's default systems as ignore_stepping(). It wasn't required for the demo, because breakout does everything game-related in the FixedUpdate schedule.

Added

SystemExecutor
- SystemExecutor::set_stepping() -- enable/disable stepping
- SystemExecutor::stepping() -- check if stepping is enabled
- SystemExecutor::next_system() -- helper for UI; get index of next system to be run on step
- SystemExecutor::step_system() -- run the next system in the schedule on the next update
- SystemExecutor::step_frame() -- run remaining systems in the schedule on the next update
enum ScheduleEvent
- Added a number of events to enable/disable stepping, or step a specific schedule
- Each event has a Schedule label associated
Schedule
- Schedule::handle_event() -- called from App::update() to handle ScheduleEvent
- Schedule::next_system() -- return the name of the next system to be run if stepping is enabled
- Schedule::stepping() -- get the stepping status of this schedule

Changed

SingleThreadedExecutor::run() -- updated to support stepping
SystemConfig::ignore_stepping() and SystemConfig::ignore_stepping -- allow systems to ignore stepping
ScheduleGraph::system_ignore_stepping -- Vec, flag which systems are exempt from stepping
SystemSchedule::systems_with_stepping -- FixedBitSet, which systems stepping applies to
App
- app.add_event::<ScheduleEvent>() in App::default()
- App.schedule_event_reader -- ManualEventReader for ScheduleEvents
  - Not sure about this one; I was just trying to avoid creating it during every App::update() call
- App::update() -- read ScheduleEvents, and call handle_event() on the appropriate Schedule based on schedule label in the event
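
The event plumbing above can be pictured with a small std-only sketch: events carry a schedule label, and the app routes each one to the matching schedule during update. Everything here is hypothetical (`ToyScheduleEvent` and its variant names, `ToyScheduleState`, `ToyApp`, `demo_app`); the PR's actual `ScheduleEvent` variants are not listed in this description, so treat the names as assumptions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    System,
    Frame,
}

/// Hypothetical event shape: every variant carries a schedule label.
#[derive(Debug, Clone, PartialEq)]
enum ToyScheduleEvent {
    EnableStepping(String),
    DisableStepping(String),
    StepSystem(String),
    StepFrame(String),
}

impl ToyScheduleEvent {
    /// The label of the schedule this event targets.
    fn label(&self) -> &str {
        match self {
            Self::EnableStepping(l)
            | Self::DisableStepping(l)
            | Self::StepSystem(l)
            | Self::StepFrame(l) => l.as_str(),
        }
    }
}

/// Per-schedule stepping state, standing in for a real schedule.
#[derive(Debug, Default, PartialEq)]
struct ToyScheduleState {
    stepping: bool,
    pending: Option<Step>,
}

impl ToyScheduleState {
    fn handle_event(&mut self, event: &ToyScheduleEvent) {
        match event {
            ToyScheduleEvent::EnableStepping(_) => self.stepping = true,
            ToyScheduleEvent::DisableStepping(_) => {
                self.stepping = false;
                self.pending = None;
            }
            ToyScheduleEvent::StepSystem(_) => self.pending = Some(Step::System),
            ToyScheduleEvent::StepFrame(_) => self.pending = Some(Step::Frame),
        }
    }
}

#[derive(Default)]
struct ToyApp {
    schedules: HashMap<String, ToyScheduleState>,
    events: Vec<ToyScheduleEvent>,
}

impl ToyApp {
    /// Drain queued events and route each to the schedule named by its label.
    /// Events for unknown labels are silently dropped in this sketch.
    fn update(&mut self) {
        for event in self.events.drain(..) {
            if let Some(state) = self.schedules.get_mut(event.label()) {
                state.handle_event(&event);
            }
        }
    }
}

fn demo_app() -> ToyApp {
    let mut app = ToyApp::default();
    app.schedules.insert("FixedUpdate".to_string(), ToyScheduleState::default());
    app.schedules.insert("Update".to_string(), ToyScheduleState::default());
    app
}
```

Routing by label keeps stepping per-schedule: an event aimed at `FixedUpdate` leaves `Update` untouched.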

Migration Guide

Most rendering or input systems should have ignore_stepping() added to them:

app.add_system(handle_input.ignore_stepping())
    .add_system(update_ui.ignore_stepping())

Mar 12 '23 20:03 dmlary

Welcome, new contributor!

Please make sure you've read our contributing guide and we look forward to reviewing your pull request shortly ✨

Mar 12 '23 20:03 github-actions[bot]

Does this support time travel? Or I should say, going back into the past?

Mar 13 '23 04:03 tigregalis

Does this support time travel? Or I should say, going back into the past?

No, for that you either need to implement the command pattern (for undo), or use some sort of snapshotting of the world. I believe the bevy_save crate offers snapshotting: https://github.com/hankjordan/bevy_save#snapshots-and-rollback
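
As a rough illustration of the snapshotting route, the sketch below clones a toy world before each step so a step can be undone. This is a std-only model with hypothetical names (`ToyWorld`, `SnapshotHistory`, `move_ball`, `score_point`); it is not how bevy_save works, and cloning a real World is far more involved.

```rust
/// Toy world state; a real World holds far more than two fields.
#[derive(Clone, Debug, PartialEq)]
struct ToyWorld {
    ball_x: i32,
    score: u32,
}

/// Keeps one snapshot per executed step so steps can be undone.
#[derive(Default)]
struct SnapshotHistory {
    snapshots: Vec<ToyWorld>,
}

impl SnapshotHistory {
    /// Save the current state, then run one system.
    fn step(&mut self, world: &mut ToyWorld, system: fn(&mut ToyWorld)) {
        self.snapshots.push(world.clone());
        system(world);
    }

    /// Restore the state from before the most recent step.
    /// Returns false when there is nothing left to undo.
    fn step_back(&mut self, world: &mut ToyWorld) -> bool {
        match self.snapshots.pop() {
            Some(previous) => {
                *world = previous;
                true
            }
            None => false,
        }
    }
}

fn move_ball(world: &mut ToyWorld) {
    world.ball_x += 5;
}

fn score_point(world: &mut ToyWorld) {
    world.score += 1;
}
```

The cost is one full copy of the state per step, which is why the command pattern is the usual alternative when the state is large.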

Mar 13 '23 05:03 dmlary

Looking at the changes I made to both single-thread and multi-thread executors, I think the stepping state (which is the same between the two) should be moved into the SystemSchedule. This would also move all the stepping state manipulation code (which is the same between the two, set_stepping(), step_frame(), etc) into there too.

I think this makes logical sense because we’re stepping the schedule, and that state is associated with the schedule, not the executor.

Mar 13 '23 12:03 dmlary

Example breakout failed to run, please try running it locally and check the result.

Mar 15 '23 03:03 github-actions[bot]

Example breakout failed to run, please try running it locally and check the result.

Mar 15 '23 04:03 github-actions[bot]

Example breakout failed to run, please try running it locally and check the result.

Mar 18 '23 04:03 github-actions[bot]

Example alien_cake_addict failed to run, please try running it locally and check the result.

Mar 19 '23 04:03 github-actions[bot]

First: this is really cool and something I'd like to include in Bevy. I haven't done a full review yet, but after a first pass my biggest concern is the prevalence of ignore_stepping. This design forces pretty much every system implementer (bevy internals, 3rd party plugins, even user code in some cases) to be aware of stepping and make "the right call" about stepping configuration in order to preserve the integrity of the stepping system. I don't feel comfortable thrusting that concern on every system implementer / I consider solving that problem a hard blocker. We should consider ways to abstract this out wherever possible and solve foundational problems (such as event buffering).

Mar 21 '23 20:03 cart

TL;DR: To implement stepping, there must be some complexity added. For the broadest benefit to users, that complexity should be on the system-implementers for render, input, and windowing, otherwise stepping won't be widely used. Deciding whether a system needs ignore_stepping() comes down to one question: can the application still respond to input and display frames without that system running? If not, the system needs ignore_stepping(). That said, there are ways we can make this easier, and ensure nobody accidentally breaks stepping (see Possible Paths Forward).

my biggest concern is the prevalence of ignore_stepping. This design forces pretty much every system implementer (bevy internals, 3rd party plugins, even user code in some cases) to be aware of stepping and make "the right call" about stepping configuration in order to preserve the integrity of the stepping system.

@cart Some complexity is inherent in the nature of system-based stepping, but I'll argue that it's not as far reaching as you're seeing it right now.

First off, I went nuts with .ignore_stepping() in the PR; probably half of them aren't needed. I marked every system in bevy with it because I wasn't confident stepping would be accepted. I didn't want to dig into every system to see if it was critical to a responsive application while stepping without more guidance.

Critical Systems for a Responsive Application

For stepping to be a usable feature, there exists a subset of systems within bevy that must always run to provide the user a responsive application. If render systems (including UI) don't run, we can't see anything. If input systems don't run, we can't step to the next system, much less exit stepping mode. For a smooth experience, we can probably also throw the window management systems in this group to be safe.

Edit I managed to forget about a bevy pattern where systems run Schedules, such as apply_state_transitions(). These systems must also be marked .ignore_stepping().

There must be some way to track which systems are critical to a responsive application. For this PR, I've implemented it as .ignore_stepping(), but there may be alternative approaches for determining this subset.

Outside of those ~~three~~ four groups, everything else should be "safe" to step.

What is "safe"? And making "the right call" for your system

(Note: This section is written assuming that at some point we make the change to bevy where events are buffered until systems consume them. Yes, I know this is cheating, but it's not impossible to implement. Joy has an idea about this; I saw it.)
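
For readers wondering what "buffered until systems consume them" could look like, here is a minimal std-only sketch: each reader keeps its own cursor, and an event is only dropped once every registered reader has read it. `BufferedEvents` and its methods are hypothetical names for illustration; this is not Bevy's `Events` type nor a proposal for its API.

```rust
/// Event queue that retains each event until all readers have consumed it.
struct BufferedEvents<T> {
    events: Vec<T>,
    /// Global index of `events[0]`.
    base: usize,
    /// Per-reader global index of the next unread event.
    cursors: Vec<usize>,
}

impl<T: Clone> BufferedEvents<T> {
    fn new() -> Self {
        Self { events: Vec::new(), base: 0, cursors: Vec::new() }
    }

    /// Register a reader; it only sees events sent after registration.
    fn add_reader(&mut self) -> usize {
        self.cursors.push(self.base + self.events.len());
        self.cursors.len() - 1
    }

    fn send(&mut self, event: T) {
        self.events.push(event);
    }

    /// Return everything this reader has not seen yet, then drop
    /// events that every reader has now consumed.
    fn read(&mut self, reader: usize) -> Vec<T> {
        let start = self.cursors[reader] - self.base;
        let unread = self.events[start..].to_vec();
        self.cursors[reader] = self.base + self.events.len();
        let oldest = self.cursors.iter().copied().min().unwrap_or(self.base);
        self.events.drain(..oldest - self.base);
        self.base = oldest;
        unread
    }

    /// Number of events still held in the buffer.
    fn buffered(&self) -> usize {
        self.events.len()
    }
}
```

In this model a stepped system that sits idle for many frames still receives every event when it finally runs, at the cost of the buffer growing while it waits.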

The vast majority of systems will never have to be considered in the context of stepping. They're game systems that don't handle rendering or input. If you're not pushing pixels, or reading input, most likely stepping will work perfectly without any changes to your system.

For rendering systems, there's no complexity here as render will always ignore stepping. Stepping render systems would require support directly in the render pipeline to do some sort of draw-on-demand, layering each system as you move forward in time. Sure, not all systems are required to draw a functional application, but there's no real benefit to stripping some out (except maybe performance testing, but that should be system enable/disable support).

For input systems, there is a little complexity: is this system required for keypresses/mouse clicks to reach whatever stepping-control system is running? This ends up in the same category as render: input will always ignore stepping.

Edit If a system calls World::run_schedule(), or Schedule::run(), it must also ignore stepping. I think this is the one that makes things complex, but it's a reflection of the complexity of nesting Schedules within Systems. If we were able to add a Schedule to another Schedule to be run exclusively, then this would not be needed. The exclusive system gluing the two together isn't necessary long-term.

`ignore_stepping()` Complexity by System-Implementer Category

For the specific system implementer group, the burden added to them can be pinned down.

One key thing to keep in mind while going through these groups: this decision only needs to be made once per system created. It's going to be rare for a system to switch from drawing on the screen to updating entities, so the decision won't need to be re-evaluated each time the system is updated.

Bevy Internals

These implementers will have to suffer the worst of it. Render, UI, input, window handling, and schedule executing systems must ignore stepping. But that's all, they just need to remember to put .ignore_stepping() in those areas. I've already added it to the tuples they use in calls to .add_systems(), and the individual systems.

The risk here is a new system gets added, in its own .add_system() call, and doesn't duplicate the .ignore_stepping() all around it.

There are some things we can do to reduce the burden here further:

Edit Replace exclusive systems for running schedules with support for adding a Schedule to a parent Schedule for exclusive execution
Automated test that verifies .ignore_stepping() is used in these places
- It would be complex, but should be possible to implement a stepping test that uses input & render
- Another option is dynamic checking of systems; see Mitigations to Complexity
Independent Schedule for these; schedule marked to ignore_stepping
- See Mitigations to Complexity below

I think a lot of the complexity concern here is because I threw ignore_stepping() all over the place in the PR.

Crate Authors

Most crate authors won't have to worry about ignore_stepping(), but it does put some burden on the authors of render, UI, and input handling crates. The complexity comes out the same as for bevy internals. They can probably get away with just marking all of their systems as ignore_stepping().

As an alternative, there's the independent Schedule approach, which is probably more ergonomic for crates.

Finally for this category, it would be nice if we had some mechanism for crate authors to verify that their crate doesn't interfere with stepping. That said, I have no solid idea of how to a) implement this, b) share it to the crate community as part of bevy. Absolutely open to suggestions here.

Bevy Users

This category is easier to discuss. As an end-user, if they're not using stepping, they don't ever need to worry about ignore_stepping().

If they do use stepping, there are two approaches:

ignore_stepping() for all render & input systems
Try it, see which system it hangs on when you step, add ignore_stepping()

The impact of an incorrect choice for ignore_stepping() is simply that stepping doesn't move forward, or the screen freezes.

I feel that the complexity burden with this group balances out as they're the target for this feature.

The Trade-Off

What do we get in exchange for this increased complexity for systems implementers?

Bevy users gain the ability to pause their entire game at any moment in gameplay, and step through each system to diagnose strange behavior. This in combination with an entity inspector & editor gives bevy a built-in runtime debugger.

It is hard to overstate how powerful this is. I've used this capability in the past (other platforms) to diagnose why physics was going sideways when I was having trouble diagnosing it via lldb. Being able to see & interact with the application while debugging is ... addictive.

Alternatives

Stepping Opt-In

One alternate approach would be to have bevy users opt their systems into stepping. I don't like this approach for the user-experience. If systems must opt-in to stepping, users will experience two pain points:

external crates not enabling stepping on the systems they need to step through
- this becomes a maintenance problem for most crates & users
users will likely want to step through the majority of their systems, not minority

Honestly, if we add barriers to getting stepping (or any debugging tool) working, it won't be used. "If I just add a few more println!s, I know I can figure this out!"

External Crate

Right now, there's no way to implement stepping as an external crate. This is because the SystemExecutor and SystemSchedule are buried quite deeply in Schedule, and offer no entry-points.

Even with an alternate API, this would end up shifting the burden of ignore_stepping() only off of Bevy Internal, not off crate makers. It would likely shift more burden to stepping users to allow/disallow some systems manually, reducing the adoption of stepping.

Sidebar on what's needed to do this externally

I do see one path to implementing stepping as an external crate, but it requires at least the following changes:

Per-System enable/disable support in Schedule
- This is very easy to implement, and can be based off this PR
Better visibility into Schedule and System
- Right now it's very difficult to discover every schedule & system that should be run
  - Dynamically generated Schedules for State change
  - Schedules that are executed from exclusive systems
- There needs to be some consistent way to iterate all Schedules
Buffered Events
- Because if I'm gonna write a wishlist, why not include all the parts

Dynamically applying `ignore_stepping()`

There's probably a dynamic way to determine a system should not be stepped. We do have System::name(), and can easily pattern match bevy::* to ignore stepping.

This does solve the burden for bevy, but doesn't really help crate authors. Only a small number of crates need to consider stepping though.
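
A minimal sketch of the name-based check, assuming system names are module paths whose first segment is the crate name. The function name and the crate list are hypothetical choices for illustration. An explicit list is used instead of a `bevy` prefix match because third-party crates commonly use a `bevy_` prefix too.

```rust
/// Crates whose systems would always run while stepping.
/// Hypothetical selection for illustration only.
const ALWAYS_RUN_CRATES: &[&str] = &[
    "bevy_render",
    "bevy_ui",
    "bevy_input",
    "bevy_window",
    "bevy_winit",
];

/// Decide from a system's name whether it should ignore stepping.
/// Assumes the name starts with `crate_name::`; names that begin with
/// something else (for example `<T as Trait>::...`) are never matched.
fn ignores_stepping_by_name(system_name: &str) -> bool {
    let crate_name = system_name.split("::").next().unwrap_or("");
    ALWAYS_RUN_CRATES.iter().any(|c| *c == crate_name)
}
```

For example, a name like `bevy_render::view::some_system` would be exempt, while `breakout::check_for_collisions` or a third-party `bevy_rapier3d::...` system would remain steppable.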

`Schedule::ignore_stepping()`

Keep in mind that stepping is per-Schedule. The example implementation requires the user to specify which schedules to step. To reduce the burden for both Bevy Internal and Crate Authors, we can implement Schedule::ignore_stepping(), and ignore all stepping requests (panic!() or warn!();return).

This allows a larger granularity for disabling stepping, and simplifies at least the render systems in bevy. I remember seeing talk on the discord about Crate Authors shifting to per-crate schedules to avoid interference from user systems. This may be a good path.

Possible Paths Forward

Ok, after all those words (if you read them all, thank you), here are paths I can see for moving forward:

Option A: Move Forward with Stepping Now

This is of course my preference; I'd like to use this functionality now, and believe the earlier debugging tools are integrated into a system, the better.

I don't feel comfortable thrusting that concern ["the right thing" per-system] on every system implementer

At the highest level the question boils down to: Add ignore_stepping() if this system is rendering or handling input.

This can be baked into bevy for bevy contributors by either adding "dynamic" ignore_stepping() based on System::name(), or creating dedicated Schedules that ignore stepping.

For crate developers, really all we can do to reduce the load is allow them to mark their custom Schedules to ignore stepping. This does require them to switch to custom Schedules, so this may be a complexity wash.

We should consider ways to abstract this out wherever possible and solve foundational problems (such as event buffering).

Agreed, especially event buffering. I will point out that even with the existing limitations regarding events, stepping is useful as it is right now. In my first run of it working in breakout, I found a bug in the collision system. I didn't fix it, but there's a video of the ball destroying a block and not changing course.

The best we can do for abstraction right now is probably the dynamic application of ignore_stepping() based on bevy crate name. This doesn't benefit the crate authors, but maybe dedicated Schedules are their solution?

How to make sure stepping isn't broken by contributions

Stepping is at risk of being broken by someone introducing a critical system for render/input without adding ignore_stepping(). I don't have a good idea of what's possible in bevy test cases, but it would be great if I could add a testcase to verify that input handling & rendering are still working with stepping enabled.

I think just adding this as a CI test would mitigate a lot of concerns from the Bevy Internal side.

Any suggestions on how to implement this would be greatly appreciated. It requires simulating input to be read by the input system, and reading the rendered frame. I have no idea how to do either of those from a testcase.

Option B: Delay Stepping Until Supporting Infrastructure Exists

Delay stepping until the following bits are implemented:

Buffered Events
Enable/Disable Systems
- Easier system addressing/labeling
  - To a user, the system is a function. To the Schedule it's a node with a NodeId.
  - There is no user-friendly mapping between the two right now
    - How do I get from fn my_system(...) to the NodeId on the Schedule?
Global Visibility to all Schedules

The last two could be easily covered if we got Systems implemented as Entities.
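
One possible shape for the function-to-node mapping asked about above, sketched with std only: key a lookup table on `std::any::type_name` of the function item. `ToyNodeId`, `ToyRegistry`, and the example systems are hypothetical. Note that `type_name` is documented as a best-effort description and is not guaranteed to be unique, and that adding the same function twice would overwrite the earlier entry in this sketch.

```rust
use std::any::type_name;
use std::collections::HashMap;

/// Stand-in for a schedule's node identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyNodeId(usize);

/// Maps a function's type name to the node it was registered as.
#[derive(Default)]
struct ToyRegistry {
    by_name: HashMap<&'static str, ToyNodeId>,
    next: usize,
}

impl ToyRegistry {
    /// Register a system function and remember its node id by type name.
    fn add_system<F: Fn()>(&mut self, _system: F) -> ToyNodeId {
        let id = ToyNodeId(self.next);
        self.next += 1;
        self.by_name.insert(type_name::<F>(), id);
        id
    }

    /// Look up the node id for a function the user can name in code.
    fn node_id_of<F: Fn()>(&self, _system: F) -> Option<ToyNodeId> {
        self.by_name.get(type_name::<F>()).copied()
    }
}

fn move_ball() {}
fn check_collisions() {}
fn play_sound() {}
```

With this, `registry.node_id_of(move_ball)` answers the "how do I get from fn my_system(...) to the NodeId" question for the toy registry.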

Everything on this list is at least as complex as this PR. I actually went down the rabbit

There could be an intermediate step here with a crate, but long-term I feel bevy benefits greatly from having this functionality built-in.

Option C: Something Else

I'm open to any idea that gets stepping capabilities into Bevy. That's really all I'm looking for here.

Mar 22 '23 03:03 dmlary

After implementing pause functionality in my game, it's struck me that, by and large, I want to group this stepping (and pause) behavior based on the data that the components access.

If we had automatically lazily generated system sets on the basis of access (#7857), I think that maintaining this distinction might be much less onerous. For example, everything that writes to a Window or Input should probably be ignored by default.

Not a complete fix, but perhaps a useful direction. We could also configure a default on a per-schedule basis?
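
A minimal sketch of the access-based default, using plain strings in place of real component and resource ids. `ToyAccess`, `ignores_stepping_by_access`, and the resource names in the usage note are hypothetical; a real implementation would work from the system's actual access sets.

```rust
/// Toy description of what a system reads and writes, by name.
struct ToyAccess {
    reads: Vec<&'static str>,
    writes: Vec<&'static str>,
}

/// A system ignores stepping by default if it writes to anything in the
/// `always_live` set. Reads alone do not exempt a system in this sketch.
fn ignores_stepping_by_access(access: &ToyAccess, always_live: &[&str]) -> bool {
    access
        .writes
        .iter()
        .any(|written| always_live.iter().any(|live| *live == *written))
}
```

With `always_live` set to something like `["Window", "Input"]`, a system that writes `Input` is exempt, while a gameplay system that only reads `Input` and writes `Transform` stays steppable.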

Mar 22 '23 14:03 alice-i-cecile

After implementing pause functionality in my game, it's struck me that, by and large, I want to group this stepping (and pause) behavior based on the data that the components access.

This is interesting, but I'm curious if it's just moving the ignore_stepping() metadata from the system to components & resources.

I see two categories here: which systems should always be run (ignore_stepping()), and which subset of steppable systems should be stepped for debugging the current problem. The idea of component based selection for stepping makes a lot of sense for the second group.

BUT! Oh, does this mean there's a way to tell from the System object that a system reads some events? If so, we could automatically handle the ignore_stepping() for evented systems right now.

Mar 22 '23 14:03 dmlary

BUT! Oh, does this mean there's a way to tell from the System object that a system reads some events? If so, we could automatically handle the ignore_stepping() for evented systems right now.

Yes, that should be possible on the basis of the Access. #5388 by @JoJoJet may give you some helpful clues.

Mar 22 '23 15:03 alice-i-cecile

To help illustrate a direction I think we should be headed in, I put together a simple draft PR: #8168.

Mar 22 '23 21:03 cart

BUT! Oh, does this mean there's a way to tell from the System object that a system reads some events? If so, we could automatically handle the ignore_stepping() for evented systems right now.

Yes, that should be possible on the basis of the Access. #5388 by @JoJoJet may give you some helpful clues.

@alice-i-cecile I'm not seeing how to make this work. I've set up the following test to print the Access values for an event-based system, and both Access methods contain nothing:

    struct TestEvent;
    fn event_system(mut reader: EventReader<TestEvent>) {
        for _ in reader.iter() {}
    }

    #[test]
    fn detect_event_system() {
        let system = IntoSystem::into_system(event_system);
        println!("system.component_access: {:#?}", system.component_access());
        println!(
            "system.archetype_component_access: {:#?}",
            system.archetype_component_access()
        );

        assert!(false);
    }

The output shows empty Access structs for both methods:

system.component_access: Access {
    read_and_writes: [],
    writes: [],
    reads_all: false,
}
system.archetype_component_access: Access {
    read_and_writes: [],
    writes: [],
    reads_all: false,
}

Is there some other mechanism to detect a system takes an EventReader argument?

Apr 01 '23 17:04 dmlary

I've set up the following test to print the Access values for an event-based system, and both Access methods contain nothing:

    let system = IntoSystem::into_system(event_system);
    println!("system.component_access: {:#?}", system.component_access());

The issue is that you aren't initializing the system. A system's access sets will be empty until you call initialize() on it.

Apr 02 '23 12:04 joseph-gio

I've set up the following test to print the Access values for an event-based system, and both Access methods contain nothing:

    let system = IntoSystem::into_system(event_system);
    println!("system.component_access: {:#?}", system.component_access());

The issue is that you aren't initializing the system. A system's access sets will be empty until you call initialize() on it.

@JoJoJet Thank you! That makes more sense.

Apr 02 '23 14:04 dmlary

Follow-up on detecting event-reader systems: I got it working, but it requires World. Dropping the code here so I don't lose it:

/// helper function to determine if a system reads events
#[allow(dead_code)]
fn system_reads_events(
    system: &dyn System<In = (), Out = ()>,
    world: &crate::world::World,
) -> bool {
    for id in system.component_access().reads() {
        if world
            .components()
            .get_name(id)
            .unwrap()
            .starts_with("bevy_ecs::event::Events<")
        {
            return true;
        }
    }
    false
}

struct TestEvent;
fn read_event_system(mut reader: EventReader<TestEvent>) {
    for _ in reader.iter() {}
}

fn write_event_system(mut writer: EventWriter<TestEvent>) {
    writer.send(TestEvent);
}

#[test]
fn verify_system_reads_events() {
    let mut world = World::new();
    let mut reader = IntoSystem::into_system(read_event_system);
    reader.initialize(&mut world);
    let mut writer = IntoSystem::into_system(write_event_system);
    writer.initialize(&mut world);
    assert!(system_reads_events(&reader, &world));
    assert!(!system_reads_events(&writer, &world));
}

Apr 21 '23 14:04 dmlary

Closing in favor of #8453.

Apr 22 '23 20:04 alice-i-cecile

WIP: system & frame stepping
