Stageless: a complete scheduling overhaul
This RFC is now ready for final review. Feedback is welcome from users at all levels! Do you think this design is easy to understand, useful, and correct? How could the proposed API be improved?
TL;DR
- Systems: now stored in the `World`
- Exclusive systems: no longer special-cased
- System sets: now just pure metadata, define order and conditions of groups of systems
- Schedules: now the executable instances of system sets, also stored in the `World`
- Run criteria: replaced by combineable, non-mutating, `bool`-returning functions
- States: now powered by new exclusive system + schedule combo instead of run criteria
- Fixed timestep: now powered by new exclusive system + schedule combo instead of run criteria
- Commands: now applied with an exclusive system
- Stages: yeet
- All of the awful bugs with stages, states, and run criteria: replaced with new, yet-to-be-discovered bugs!
Plus, some shiny new toys: running systems in commands and modifying schedules at runtime!
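For a rough sense of what this buys, app configuration under the proposal might look something like the pseudocode-style sketch below (system and condition names are hypothetical, and the exact builder methods are still being debated further down this thread):

```rust
// Sketch only; not final API. Ordering and gating happen with plain functions,
// and command application is just an ordinary (exclusive) system you place yourself.
app.add_system(spawn_enemies)
    .add_system(
        (tick_wave_timer, update_wave_progress)
            .chain()
            .run_if(wave_in_progress), // non-mutating, `bool`-returning condition
    )
    .add_system(flush_commands.after(spawn_enemies).before(ai_targeting))
    .add_system(ai_targeting);
```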
Prototype
iyes_loopless is a Bevy 0.6-compatible third-party implementation of the core ideas of how this RFC handles run criteria and states. Go try it out!
Context
This is the culmination of many months of discussion, work, debate and mind-melting from @maniwani (Joy), @hymm (WrongShoe) and myself; they should be considered equal coauthors on this RFC. If you're curious about the background, check out https://github.com/bevyengine/bevy/discussions/2801 to get a sense of the tangled mess we're trying to clean up.
Special thanks to @Ratysz and the rest of the ECS crew for all of the feedback and guidance throughout the process.
Meta
What is the best way to handle the migration process from an organizational perspective?
- Get an RFC approved, and then merge a massive PR developed on an off-org branch?
- Straightforward, but very hard to review
- Risks divergence and merge conflicts
- Use an official branch with delegated authority?
- Easier to review, requires delegation, ultimately creates a large PR that needs to be reviewed
- Risks divergence and merge conflicts
- Develop `bevy_schedule3` on the `main` branch as a partial fork of `bevy_ecs`?
- Some annoying machinery to set up
- Requires more delegation and trust to the ECS team
- Avoids divergence and merge conflicts
- Clutters the main branch
I worry that this may end up clunky and frustrating to use without automatic management/insertion of the commands and states flushing systems. I'd love to be proven wrong.
> I worry that this may end up clunky and frustrating to use without automatic management/insertion of the commands and states flushing systems.
In all honesty, I agree. I wanted to start with the more straightforward, explicit design for the first draft. I suspect that automatic inference will both be less frustrating and faster for end users though.
> I worry that this may end up clunky and frustrating to use without automatic management/insertion of the flushing systems.
Frustration over stages sacrificing potential concurrency was a big motivator for this effort. IMO explicit flush systems are hands down better than implicit stage flushes.
To clarify, this isn't "manual insertion forever." Automatic insertion can build on top of this. It's just going to require more system annotation (or type system magic), so the schedule builder can know which components any system can affect with commands.
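For instance, a minimal sketch of the explicit style being discussed, with hypothetical system names:

```rust
// `spawn_minions` queues spawns via `Commands`; `assign_targets` must see the results.
// With explicit flushing, the user inserts and orders the flush point themselves:
app.add_system(spawn_minions)
    .add_system(flush_commands.after(spawn_minions).before(assign_targets))
    .add_system(assign_targets);
```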
Maybe every system that is added could have an `IdLabel(SystemId)` added to it. It might simplify the implementation, since only ordering between labels would be required.
I'm very excited by this proposal. It both gives us finer-grained control over the scheduler and simplifies the scheduler and API at the same time.
I think this is simpler. It makes the pitfalls explicit and forces documenting them (as with the `flush_commands` system).
Hmm, I personally don't like the use of the word criterion as singular of criteria. I think here we can find good reasons to use criteria as singular.
A note about `before` and `after`. In order to make this work as expected (`a` before `b` before `c` would imply `a` before `c`), it is not sufficient to simply add a dependency between `a` and `b` and one between `b` and `c`. If `a` accesses X, `b` accesses Y and `c` accesses X, then `a` will never conflict with `b` and `b` will never conflict with `c`, but `a` conflicts with `c`. So a transitive closure would need to be constructed over the `after` and `before` relations.
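As a toy illustration of that closure (not the actual schedule-building code), repeatedly propagating edges until a fixed point is reached would add the missing `a` before `c` edge:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy illustration of why the builder needs a transitive closure over `before`/`after`:
// with only the edges a→b and b→c, an executor that looks at data access rather than
// labels could otherwise run `a` and `c` concurrently even though both access X.
fn transitive_closure(mut edges: BTreeMap<char, BTreeSet<char>>) -> BTreeMap<char, BTreeSet<char>> {
    loop {
        let mut added = false;
        let snapshot = edges.clone();
        for (node, successors) in &snapshot {
            for successor in successors {
                for indirect in snapshot.get(successor).into_iter().flatten() {
                    added |= edges.get_mut(node).unwrap().insert(*indirect);
                }
            }
        }
        if !added {
            return edges;
        }
    }
}

fn main() {
    let edges = BTreeMap::from([
        ('a', BTreeSet::from(['b'])),
        ('b', BTreeSet::from(['c'])),
        ('c', BTreeSet::new()),
    ]);
    // After the closure, 'a' also has an edge to 'c', so the a/c conflict stays ordered.
    println!("{:?}", transitive_closure(edges));
}
```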
Another thing to consider is whether the dependencies should be constructed when the systems could conflict or when they do conflict, i.e., should the `FilteredAccessSet<ComponentId>` or the `Access<ArchetypeComponentId>` be compared? If `Access` is compared, then the schedule would need to be rebuilt every time archetypes change.
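To make the distinction concrete with hypothetical systems: the pair below could conflict at the `ComponentId` level (both write `Transform`, and the filters don't guarantee disjointness), yet if no archetype actually contains both `Player` and `Enemy`, their `ArchetypeComponentId` access never overlaps at runtime:

```rust
use bevy::prelude::*;

#[derive(Component)]
struct Player;
#[derive(Component)]
struct Enemy;

// Both systems write `Transform`, so they *could* conflict (ComponentId level)...
fn move_players(mut query: Query<&mut Transform, With<Player>>) {
    for mut transform in query.iter_mut() {
        transform.translation.x += 1.0;
    }
}

// ...but they only *do* conflict (ArchetypeComponentId level) if some archetype
// holds both `Player` and `Enemy` together with `Transform`.
fn move_enemies(mut query: Query<&mut Transform, With<Enemy>>) {
    for mut transform in query.iter_mut() {
        transform.translation.x -= 1.0;
    }
}
```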
> Another thing to consider is whether the dependencies should be constructed when the systems could conflict or when they do conflict, i.e., should the `FilteredAccessSet<ComponentId>` or the `Access<ArchetypeComponentId>` be compared? If `Access` is compared, then the schedule would need to be rebuilt every time archetypes change.
@bilsen Schedule construction does not consider this. It produces a top-sorted list of systems from the DAG we get from the label ordering. `ComponentId` access is only checked when reporting ambiguities (archetypes don't exist yet). However, the executor does look at `ArchetypeComponentId` access when determining if another system task can be started.
> Schedule construction does not consider this. It produces a top-sorted list of systems from the DAG we get from the label ordering. `ComponentId` access is only checked when reporting ambiguities (archetypes don't exist yet). However, the executor does look at `ArchetypeComponentId` access when determining if another system task can be started.
@maniwani Then I'm not sure I understand how `before` should work in the proposal. How is it decided when there is a dependency between two systems without looking at their accesses? The validity of the sorted list will depend on whether systems can/will conflict with each other and also have a `before` dependency.
My understanding is that the list will be sorted based on the `before` unconditionally, but there's no reason the executor can't 'disregard' this sorting, since running the systems in the other order would be valid. That is, `before` only encodes the relative order in the 'queue', and the `ArchetypeComponent`-level collisions encode whether that ordering matters at the time of execution.
> My understanding is that the list will be sorted based on the `before` unconditionally, but there's no reason the executor can't 'disregard' this sorting, since running the systems in the other order would be valid. That is, `before` only encodes the relative order in the 'queue', and the `ArchetypeComponent`-level collisions encode whether that ordering matters at the time of execution.
Exactly this.
The system graph is completely determined by the order of labels (edit: well, ordering targets now). If system `a` is marked as before system `b`, we add `a` to `b`'s dependencies (honestly, "dependency graph" is a bit of a misnomer). We do warn users about pairs of systems that 1) are in the same level in the graph and 2) write the same `ComponentId`, but every topological order is valid.

When the executor is signaling spawned tasks to start (edit: which are spawned in the sorted order), it checks their "dependencies" and their `ArchetypeComponentId` access. If all a system's dependencies have completed and its access doesn't conflict with another system task currently running, it can start.
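A rough sketch of the readiness check just described (placeholder types; not the actual `bevy_ecs` executor):

```rust
use std::collections::HashSet;

// Stand-in for ArchetypeComponentId-level access; only the shape matters for this sketch.
struct Access {
    reads: HashSet<u32>,
    writes: HashSet<u32>,
}

impl Access {
    fn is_compatible(&self, other: &Access) -> bool {
        // Compatible when neither side writes something the other reads or writes.
        self.writes.is_disjoint(&other.reads)
            && self.writes.is_disjoint(&other.writes)
            && other.writes.is_disjoint(&self.reads)
    }
}

struct SystemMeta {
    dependencies: Vec<usize>, // indices of systems ordered before this one
    access: Access,           // runtime ArchetypeComponentId access
}

// A system task may start once its ordered predecessors have finished and its access
// doesn't conflict with any system currently running.
fn can_start(index: usize, systems: &[SystemMeta], finished: &[bool], running: &[usize]) -> bool {
    let sys = &systems[index];
    sys.dependencies.iter().all(|&dep| finished[dep])
        && running
            .iter()
            .all(|&other| sys.access.is_compatible(&systems[other].access))
}
```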
I'm also not 100% clear on the if-needed semantics, but I assume we'd discard the if-needed dependencies once the list is sorted, and just preserve the strict and atomic ones.
> I'm also not 100% clear on the if-needed semantics, but I assume we'd discard the if-needed dependencies once the list is sorted, and just preserve the strict and atomic ones.
Precisely. The if-needed dependencies get converted to strict or discarded.
On Subgraphs
We should be thinking more explicitly about how we create subgraphs. In the current proposal there are three different ways to define subgraphs: running schedules from an exclusive system, giving several systems the same label, and putting systems into a system set. Schedules are nicely contained, since you can't share labels or system sets between schedules. But labeling and system sets interact, and can do so in ways that are hard to reason about.
The issue is that both methods create subgraphs and these subgraphs can overlap with other subgraphs. Consider two subgraphs that would each represent a unit of work, graph A and graph B:
Now A3 and B2 might depend on the same data from system C1 so the user needs to add a dependency to these systems:
This kind of labeling creates three subgraphs that are intertwined and no longer independent. This could easily lead to spaghetti code, since there is no defined hierarchy between these three subgraphs.
Instead this type of relationship should be expressed as the relationship of C1 to the 2 subgraphs rather than to an individual system. This leaves the subgraphs as opaque units to the higher level graph.
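In pseudocode-style sketch form, with hypothetical set/label and system names, the difference is roughly:

```rust
// Reaching across subgraph boundaries to individual systems (breaks encapsulation):
app.add_system(c1.before(a3).before(b2));

// Expressing the same data dependency against the subgraphs as opaque units:
app.add_system(c1.before(GraphA).before(GraphB));
```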
Being able to create connections across vastly different parts of the system graph is actually too powerful and breaks the encapsulation of subgraphs defined by users or plugins. It's important that the abstraction for subgraphs isolates the systems inside its graph from the effects of systems outside its graph. This will lead to well-defined and understandable behavior.
Besides organization, there are other advantages to encapsulating subgraphs:
- Keeps the number of dependencies on individual systems down. This may be especially important as the graphs grow more complicated. Extra dependencies, while light, have a non-zero effect on performance.
- Makes it easier to check ambiguities. The ambiguity detector should only need to check against the full subgraph's data.
- A natural unit on which to apply atomic subgraph behavior.
Schedules as the main way to make subgraphs
I would propose that we merge system sets and schedules into one abstraction and only allow labeling between systems inside the subgraph. Labeling is not allowed between a system inside the subgraph and a system outside of it. This will give schedules the encapsulation properties argued for above.
To make working with smaller schedules less of a performance footgun, it is important that they are not run from an exclusive system. They should instead be run from an asynchronous system. This will allow the parallel systems of schedules running at the same time to run in parallel. If there are exclusive systems in schedules running at the same time, they will be grouped to run in sequence during the exclusive phase of the schedule.
Besides parallelism, this also:
- Allows ordering around the async system that runs the schedule with normal system labeling.
- Allows an easy place to stick atomic subgraph constraints as they can be captured on running the schedule and released when the schedule is finished.
- For convenience, we can implement an async system for a schedule or create a wrapper, so it is easy to insert into a graph and can be ordered around.
Seems like there's already a bunch of feedback from contributors to the Bevy project. As a user of Bevy, this proposal sounds amazing, and I hope I'll eventually be able to rewrite my game in this proposed style.
> I am a bit confused about the system label and system set merge.
The idea is there are systems and there are groups of systems. I started calling groups of systems "sets" and they're pretty much gonna be implemented as labels with configs (see bevyengine/bevy#4391), meaning you will be able to add sets to an `App` the same way you add systems (and before saying which systems belong to them).
I also want to point out that assigning a system to a set/group (i.e. labeling it) is different from giving it a unique name. Those will be separate methods.
> Saying a group of systems has the same label better describes what we are intending with the new design and makes it easier to picture the group intertwined with other labels, in my opinion.
I think the "set" framing will be significantly more intuitive to users. A label is an abstract thing that has no associated visual (at least not one that matters here). Calling the "group of systems" a "set" is more accurate to how the App
thinks about graphs.
As far as the App
is concerned, when you label a system, you are adding that system to a group, and that entire group can be ordered with respect to other stuff. The group is itself a node in some dependency graph if you "zoom out".
If you give a system two labels, `A` and `B`, you can't have `A` before `B` or `B` before `A`. I think this is much easier to explain with the set framing, e.g. "If a system belongs to multiple sets, those sets overlap/intersect, so you can't have edges between them."
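A tiny hypothetical example of that contradiction (the method names are just the spellings floated later in this thread, not final API):

```rust
// `movement` belongs to both sets, so the sets overlap.
app.add_system(movement.in_set(SetA).in_set(SetB));

// Declaring "SetA before SetB" would now require `movement` to run before itself,
// so an edge between overlapping sets has to be rejected.
```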
> For example, my expectation of putting a run criteria on a set would have been for the criteria to run once and decide if the whole set should be skipped or not.
This is indeed how it will work.
> If I think about a system being part of multiple sets, I would expect it to run multiple times (once per set).
This isn't the case, but I think your intuition here comes from how the `SystemSet` container behaves.
> merge `SystemSetConfig` and `SystemConfig` (currently a set config only stores a system config anyways)
IIUC, I've merged these as much as I could in bevyengine/bevy#4391.
> If we stick with sets, I would like to rename `.to()` to `.in()`.
So would I, but `in` is a reserved keyword, so we can't unfortunately. What about `.in_set()`, `.to_set()`, or `.under()`?
> I am a bit confused about the system label and system set merge.

> The idea is there are systems and there are groups of systems. I started calling groups of systems "sets" and they're pretty much gonna be implemented as labels with configs (see bevyengine/bevy#4391), meaning you will be able to add sets to an `App` the same way you add systems (and before saying which systems belong to them).
Why not go with "group" then? I agree that I might be a bit biased here because I keep associating sets with `SystemSet`, but for me set also has more meaning than just a group of something (e.g. uniqueness, which doesn't fit here at all).
> I also want to point out that assigning a system to a set/group (i.e. labeling it) is different from giving it a unique name. Those will be separate methods.
Could I not just make a set with only one system in it? I think I would prefer that over more API.
> For example, my expectation of putting a run criteria on a set would have been for the criteria to run once and decide if the whole set should be skipped or not.
> This is indeed how it will work.
The RFC currently states the opposite:
> if a run criterion is attached to a system set, a run-criterion-system will be generated for each contained system
> this is essential to ensure that run criteria are checking fresh state without creating very difficult-to-satisfy ordering constraints
> if you need to ensure that all systems behave the same way during a single pass of the schedule or avoid expensive recomputation, precompute the value and store the result in a resource, then read from that in your run criteria instead
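A minimal sketch of that precompute-into-a-resource pattern from the quoted text (resource, component, and system names are hypothetical; Bevy 0.7-era APIs assumed):

```rust
use bevy::prelude::*;

#[derive(Component)]
struct Enemy;

// Hypothetical resource caching a potentially expensive check once per schedule pass.
#[derive(Default)]
struct AnyEnemiesAlive(bool);

// Ordinary system: does the work once and stores the result.
fn update_any_enemies_alive(mut cache: ResMut<AnyEnemiesAlive>, enemies: Query<(), With<Enemy>>) {
    cache.0 = enemies.iter().next().is_some();
}

// Cheap, non-mutating, `bool`-returning criterion that every gated system can share.
fn any_enemies_alive(cache: Res<AnyEnemiesAlive>) -> bool {
    cache.0
}
```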
> So would I, but `in` is a reserved keyword, so we can't unfortunately. What about `.in_set()`, `.to_set()`, or `.under()`?
I would prefer `.in_set()` then :+1:
I have a question regarding the run criteria example:
```rust
.add_system(
    (tick_construction_timer, update_construction_progress)
        .chain()
        .run_if(construction_timer_finished)
)
```
Doesn't this mean "the timer is only ticked and the construction progress is only updated if the timer has finished"? Then it would follow: the timer is not finished -> the timer will not tick -> the timer will never finish -> the construction progress will never be updated.
So to me it would make more sense if the run criterion also handled the timer update.
However, I'm new to Bevy, so maybe I misunderstood the concepts :)
@TimJentzsch you're totally right. Good eye; I just pushed a fix for that.
> @TimJentzsch you're totally right. Good eye; I just pushed a fix for that.
Nice!
I think the comment is outdated now though:
```rust
// This function can be used as a run criterion,
// because it does not mutate data and returns `bool`
```
The comment also makes it sound like mutating data inside runtime criteria is not allowed, does the example still work now?
> The comment also makes it sound like mutating data inside runtime criteria is not allowed, does the example still work now?
Oh right 🤔 Yeah, this wouldn't compile. Let me try again!
@alice-i-cecile Would it maybe work if we keep the separate timer update system, but change the app logic as follows?
```rust
.add_system(update_construction_timer)
.add_system(
    update_construction_progress
        .after(update_construction_timer)
        .run_if(construction_timer_finished)
)
```
I think the timer would then update on every frame; the run criterion would then check if the timer has finished and execute the update if applicable.
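For completeness, a sketch of what those two pieces might look like, assuming a hypothetical `ConstructionTimer` resource wrapping a Bevy `Timer` (Bevy 0.7-era APIs):

```rust
use bevy::prelude::*;

// Hypothetical resource wrapping a Bevy `Timer`.
struct ConstructionTimer(Timer);

// Ordinary system: ticks the timer every frame, whether or not it has finished.
fn update_construction_timer(time: Res<Time>, mut timer: ResMut<ConstructionTimer>) {
    timer.0.tick(time.delta());
}

// Usable as a run criterion under this proposal: immutable access, returns `bool`.
fn construction_timer_finished(timer: Res<ConstructionTimer>) -> bool {
    timer.0.finished()
}
```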
I don't like that the resource for controlling state transitions is called `Fsm`.
It makes Bevy harder to learn, as it makes beginners deal with technical jargon they might not be familiar with, while learning a very important piece of functionality.
For experienced users, abbreviated/acronym names like that are not easily searchable/findable in docs and feel jarringly different from the clean and readable APIs we have elsewhere.
I propose calling it something like `StateControl`. Has "state" in the name, easy to find, easy to understand, and no overlap with the `State` trait.
> I don't like that the resource for controlling state transitions is called `Fsm`. I propose calling it something like `StateControl`. Has "state" in the name, easy to find, easy to understand, and no overlap with the `State` trait.
@inodentry I think you did even better in your awesome `iyes_loopless` by representing states using 2 separate resources: `CurrentState(T)` and `NextState(T)`.
I like your approach more because the current state is queried rarely. Most of the time it is used when registering systems. But state transitioning is used often, and the mentioned separation allows you to avoid querying the current state: you just insert `NextState(T)` using `Commands`. This reduces the number of arguments in systems and should be a little better for performance.
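For reference, the `iyes_loopless` pattern being described looks roughly like this (the state enum is hypothetical):

```rust
use bevy::prelude::*;
use iyes_loopless::prelude::*;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum GameState {
    MainMenu,
    InGame,
}

// Queue a transition by inserting `NextState` via `Commands`,
// without ever reading `CurrentState<GameState>`:
fn start_game(mut commands: Commands) {
    commands.insert_resource(NextState(GameState::InGame));
}
```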
@Nilirad
Thanks for taking the time to review and write all this feedback.
> The implementation strategy and the appendix seem to focus much on the language features while leaving out the bigger picture. I also don't think that just a big nested list is the best thing for readability and understanding.
Can you clarify what you mean by "bigger picture"?
The big picture is how the design reduces the number of things you can't do (resolves the issues listed in the motivation section).
I wrote the impl list because cart asked us to "break [this design] up into conceptual pieces to be merged in a particular order, with a bias toward the smallest changes possible and a desire to merge small pieces into main as early as possible."
> The relationship between a system set and a schedule is unclear sometimes. It looks like there is only a single schedule in the world, but when you check out a system set from the `Systems` resource, there is a 1:1 correspondence.
Can you describe what was confusing? Are you asking like, "Why have two names if they're the 'same' thing?"
It's written that a schedule is the executable form of a system set, so they do have a 1:1 relationship. A schedule is a container that can hold and properly run the systems and conditions of a system set you described using the app builder API.
Idk how you got "single schedule in the world" though. Most systems might fall under the app update schedule, but there's no schedule limit. The doc does mention other schedules an ordinary app might have (state transitions, fixed timestep, etc.)
> Could a graph-based API solve the problems of graph unsolvability and order ambiguities at the root?
I'm not sure what you're asking. We can only try to build from what we're given. Errors and ambiguities exist because users can give invalid and incomplete properties. I don't believe there's an API that can avoid that.
> How are conditions applied once the schedule gets flattened? If I understood correctly, information about sets gets lost. So it looks to me that a condition that guards a set is cloned and applied to all the systems under its hierarchy.
The step-by-step process of how conditions would be evaluated is written in the appendix. You can also review the code from the linked prototype.
I think you misinterpreted flattening as permanently throwing information away. We wouldn't do that. We can take a dependency graph and just make another "flattened copy" of it.
Likewise, a condition attached to a system set wouldn't be duplicated to all the systems underneath. It's just the one instance you created. The executor will know to evaluate it before the relevant systems run and skip them all if it returns `false`. Duplicating it actually wouldn't behave correctly.
That said, I don't think it's necessary for this document to explain minute impl details like that or how properties are transformed into graphs, checked for errors, and then used to produce schedules you can run.
I'm happy to improve clarity, but imo this RFC is ready for cart's review.
@maniwani
> Idk how you got "single schedule in the world" though.
Hmm, probably residual knowledge from old versions of this RFC. I remember once finding some contradictions about the number of schedules, and that may have reinforced the assumption to the point where I didn't correctly understand the definition at the top.
> Can you clarify what you mean by "bigger picture"?
By bigger picture I meant a higher-level view of the implementation structure. That would primarily include a cohesive overview of what happens under the hood on the scheduling front after the user defines the `App` configuration.
> I think you misinterpreted flattening as permanently throwing information away.
Yep, I definitely thought that...
imo this RFC is ready for cart's review.
At this point I agree, the design is final and he definitely won't have problems understanding it.
@cart, I couldn't reply to one of your comments, so I'll just answer it here.
(Edit: moved comment to the original thread here.)
@maniwani
> @cart, I couldn't reply to one of your comments,
When comments on existing threads are a part of reviews, they cannot be directly replied to in the review. You can still reply in the original thread. Rather than moving this conversation to the "top level", for continuity can you paste your reply here?