bevy icon indicating copy to clipboard operation
bevy copied to clipboard

Add `&mut World` as a system param and deprecate `.exclusive_system()`

Open maniwani opened this issue 2 years ago • 39 comments

Objective

  • finish what started in #2398
  • make .exclusive_system() optional (FunctionSystem handles all systems now)
  • lay groundwork for bevyengine/rfcs#45

Solution

  • add ~~SemiSafeCell<'w, World>~~ MaybeUnsafeCell<'w, World> and have it replace &'w World in internal access methods
  • add new required method on SystemParamState so we can (mostly) statically ensure soundness and detect exclusive systems
  • add &mut World as a system param
  • remove ExclusiveSystem* types and traits
  • mark .exclusive_system() method as deprecated

Related

  • #2923
  • #3001 (main inspiration for cell enum/union type)
  • #3946 (superceded by this PR)

maniwani avatar Mar 09 '22 23:03 maniwani

Pulling in @Ratysz as this split was an intentional design decision.

cart avatar Mar 10 '22 00:03 cart

Some historical context: https://github.com/bevyengine/bevy/pull/1144#issuecomment-755608418

cart avatar Mar 10 '22 00:03 cart

For context, this change is part of the groundwork for stageless: it seemed nicely separable, so I encouraged @maniwani to submit it seperately for ease of review.

It would introduce other, far more convoluted kind of complexity: if exclusive systems are systems, they can have dependencies and can be dependencies.

As far as I can tell, this was the primary argument against it. Managing that complexity is where a large amount of the effort behind https://github.com/bevyengine/rfcs/pull/45 was focused: we take on more complexity with respect to managing the ordering of exclusive systems (and command flushing) in order to dramatically reduce complexity elsewhere in the scheduling internals / API / mental model.

Having experimented with (and helped users with) the exclusive system API since that thread, I pretty firmly agree with your initial positions from that thread:

  • From an api perspective, I see no meaningful difference between an "exclusive system" and a system that manually claims every resource.

  • I think preventing people from adding fn system(world: &mut World) {} to a stage "normally" because it happens to also meet the criteria for a stage seems overly restrictive

  • Imo a "function that operates on World and/or Resources" is a "system"

  • I'd still prefer it if systems were "functions that operate on ECS state". In my ideal world parallelism would be the responsibility of executors and would be a "best effort" sort of thing.

I would also add:

  • the added complexity dramatically increases pain when working with scheduler related code
  • multiple insertion points for exclusive systems is largely unused
  • the inability to order exclusive and non-exclusive systems is often frustrating
  • the concern raised in #1021 is handled by tools to handle execution order ambiguities (and other parts of the stageless design that give users explicit control over command flushing): these are rapidly improving
  • initial feedback on the RFC from general users seems to be strongly in favor of a simpler, non-differentiated model for both system organization and system types

alice-i-cecile avatar Mar 10 '22 00:03 alice-i-cecile

I haven't ever stopped believing that exclusive systems should be "normal systems" :smile: Just pulling in relevant context and stakeholders.

cart avatar Mar 10 '22 02:03 cart

From an api perspective, I see no meaningful difference between an "exclusive system" and a system that manually claims every resource.

As I've said in https://github.com/bevyengine/bevy/pull/1144#issuecomment-755655210:

There is: an exclusive system can add/remove entites without using commands, modifying archetypes then and there, and it can add and remove resources in the same manner; a system that manually claims everything can only change the data contents, not the structure of the data.

This is why had the split conceptually. Every exclusive system is potentially a hard sync point - the "side effects" stageless idea was supposed to codify which exclusive systems exactly are sync points, and which can be scheduled along with parallel systems.

If we can we can do away with the implementation split and the user-facing API split while maintaining this conceptual split (because it's inescepable for actually scheduling things correctly), I'm all for it. The implementation we have is a compromise rather than a deliberate decision - if it were up to me exclusive systems would've been stages instead; ironically, we'll effectively have that with stageless, it'll just be neatly swept under the rug and hidden behind a flatter API.

Ratysz avatar Mar 10 '22 12:03 Ratysz

maintaining this conceptual split (because it's inescepable for actually scheduling things correctly)

What is "inescapable" and how exactly would we fail to schedule these systems correctly?

The "dependency debacle" you described back in https://github.com/bevyengine/bevy/pull/1021#issuecomment-743429068 could not happen.

far more convoluted kind of complexity: if exclusive systems are systems, they can have dependencies and can be dependencies

This is desirable IMO, and there's nothing convoluted about it, because:

the user adds a [dependency chain of parallel system -> exclusive system -> parallel system], the execution will fail because the executor can't sandwich an exclusive system like that

It can sandwich them and it won't fail. All we had to do was add a mock component and define "total write access" for both the schedule builder and executor to see the conflict properly.

By themselves, all those systems want conflicting access to component A. If their order is indeterminate, the schedule builder would flag a pair as ambiguous, so it is not a heisenbug.

systems 1 and 2 have no defined order but want conflicting access to components: A

We don't even need to say anything about one system being exclusive, because it's irrelevant. There is ambiguous read-write order and that is the "bug".

So from my perspective, there is nothing about exclusive systems worthy of a formal dichotomy and this is purely a clash of aesthetic preferences.

And to that, I don't think that "systems must use commands when they don't have exclusive access to the entire world" is convoluted. Our storage model should just be documented with pretty pictures depicting why we can't instantly spawn entities and insert components without moving data.

maniwani avatar Mar 10 '22 15:03 maniwani

For context, the discussion in #1021 occured before we had any formal notion of "system order ambiguities", much less tools to deal with them. I read the linked comment as an initial exploration of the idea (because it's an exteremely obvious case of it).

I agree with @maniwani here: there's nothing fundamentally more challenging about scheduling / ordering exclusive systems or command flushes: they're just guaranteed to be ambiguous with (almost) literally every other system.

alice-i-cecile avatar Mar 10 '22 15:03 alice-i-cecile

I am talking about the actual implementation, again. When a systems' graph is scheduled, exclusive systems must necessarily be placed at exact spots wrt blocks of specific parallelizable systems, otherwise we massively lose perf because we'd have to effectively rebuild the graph every dispatch. As long as this is understood, I see no issue with the arguments you give.

Again, the API could be whatever we want. This is not about the API.

Ratysz avatar Mar 10 '22 15:03 Ratysz

When a systems' graph is scheduled, exclusive systems must necessarily be placed at exact spots wrt blocks of specific parallelizable systems, otherwise we massively lose perf because we'd have to effectively rebuild the graph every dispatch.

Right, because otherwise we're spinning our wheels constantly checking a condition (can this exclusive system run) that will never be met.

I see a few options here on the implementation side:

  1. Forbid all ambiguities with exclusive systems.
  2. Randomly decide and lock in the ordering of exclusive systems WRT any ambiguities upon schedule initialization, and then add constraints to enforce this.
  3. Defer all exclusive systems as long as possible, breaking all ambiguities to run parallel systems first.
  4. Run all exclusive systems as soon as possible.

IMO 1 is probably too strict and frustrating, especially before stageless. 2 is, in theory, in keeping with the spirit of deliberate ambiguities, but very chaotic.

Both 3 and 4 have the nice property of clustering exclusive systems together automatically, which should be good for performance. My intuition is that 3 is a better default than 4: IME exclusive systems tend to be used for taking snapshots of state.

We could formalize this notion with a asap and alap system configuration tool, which would probably be quite useful in general, and allow overriding to the opposite case as desired. This would allow users to resolve all ambiguities for a given system in a single line (except between other ASAP systems).

EDIT: With these tools, I would actually be okay with 1.

alice-i-cecile avatar Mar 10 '22 16:03 alice-i-cecile

I was talking about the actual implementation. You can see where I've updated the system params to add the proper component access. I just haven't ripped out the special-casing in SystemStage yet.

Also, I didn't understand what you were saying here.

otherwise we massively lose perf because we'd have to effectively rebuild the graph every dispatch

If you meant "now we have to tell users to be careful about where they put exclusive systems or they risk needlessly throwing away parallelism", then yeah, but there's no need for us to rebuild the graph. (edit: I see, you meant registering new archetypes and updating the archetype component access bitsets.)

Systems with &mut World now have component and archetype component access that reflect their exclusivity. They can and will be topsorted like any other system. But yes, when the order between an exclusive system and others is ambiguous, when it runs can vary a lot given the executor is greedy (topsort order still gives priority).

IMO 1 is probably too strict and frustrating, especially before stageless. 2 is, in theory, in keeping with the spirit of deliberate ambiguities, but very chaotic.

IMO, 1 is the only acceptable option. We can reduce the noise and jargon in the ambiguity warnings separately.

maniwani avatar Mar 10 '22 16:03 maniwani

Right, because otherwise we're spinning our wheels constantly checking a condition (can this exclusive system run) that will never be met.

No, that's not it. Exclusive systems can change archetypes, which invalidates parallel systems' cached scheduling data, requiring a rebuild. Moving a parallel system between blocks separated by hard sync points does the same, because said cached data is a bunch of tight and bespoke bitsets build for and around that specific block. It's needed to boil down the absolutely unavoidable "can this run" checks to a single bitset comparison.

Ratysz avatar Mar 10 '22 16:03 Ratysz

Exclusive systems can change archetypes, which invalidates parallel systems' cached scheduling data, requiring a rebuild.

Right, I guess we can't quite run exclusive systems using the executor yet. ~~The system tasks are spawned before we know if any system can run, and that borrow keeps us from updating their archetype component access.~~ Edit: Instead of updating system archetype component access once at the beginning, we'd want to update a system's access just before signaling it to run.

I guess this PR can just be about removing the internal split everywhere except SystemStage.

maniwani avatar Mar 10 '22 16:03 maniwani

The system tasks are spawned before we know if any system can run, and that borrow keeps us from updating their archetype component access.

That's not how the systems executor works. ~~Tasks are spawned as needed~~, data is not actually borrowed until needed. The thing that prevents us from whanging in an exclusive system wherever is exactly what I said it is: there is no guarantee that the exclusive system will not be in some way invalidating, so it necessarily has to split the block of parallel systems into two, and parallel systems cannot be freely moved between blocks because the blocks are pre-built for the specific systems in them.

Ratysz avatar Mar 10 '22 16:03 Ratysz

Okay, after discussing on Discord, I will keep this PR limited to what it says on the tin, getting rid of .exclusive_system() and ExclusiveSystem* types/traits. Keeping the InsertionPoint API and not making any change to the ordering rules of SystemStage (no interleaving of exclusive and non-exclusive systems).

maniwani avatar Mar 10 '22 18:03 maniwani

would this supercede #3946?

hymm avatar Mar 10 '22 19:03 hymm

would this supercede #3946?

Yes.

maniwani avatar Mar 10 '22 20:03 maniwani

Still need to update tests/imports in the other crates, but bevy_ecs seems to be good.

maniwani avatar Mar 11 '22 03:03 maniwani

bors try

maniwani avatar Mar 17 '22 06:03 maniwani

try

Build failed:

bors[bot] avatar Mar 17 '22 06:03 bors[bot]

bors try

maniwani avatar Mar 17 '22 07:03 maniwani

try

Build failed:

bors[bot] avatar Mar 17 '22 07:03 bors[bot]

bors try-

maniwani avatar Mar 17 '22 07:03 maniwani

bors try

maniwani avatar Mar 17 '22 07:03 maniwani

try

Build failed:

bors[bot] avatar Mar 17 '22 07:03 bors[bot]

bors try

maniwani avatar Mar 17 '22 22:03 maniwani

try

Build failed:

bors[bot] avatar Mar 17 '22 22:03 bors[bot]

bors try

maniwani avatar Mar 17 '22 22:03 maniwani

Or alternatively make SemiSafeCell not have a lifetime and just be World | UnsafeCell<World> :thinking:

BoxyUwU avatar Mar 23 '22 20:03 BoxyUwU

That said, if it's confusing, I think it'd be clear if &World and &mut World behave like Query when we update a system's Access<ArchetypeComponentId> except they "match" everything.

GIven how important this access code is to bevy_ecs soundness, I would prefer to optimize for clarity and simplicity over micro-gains to performance. IMO we should scrap them both and just write all 1s where appropriate and avoid special-casing.

alice-i-cecile avatar Mar 23 '22 21:03 alice-i-cecile

since this superscedes https://github.com/bevyengine/bevy/pull/3946 should we pull the example from there for using Locals in an exclusive system?

hymm avatar Apr 04 '22 20:04 hymm

@bevyengine/ecs-team I think this is ready. Can we get some reviews?

@maniwani can you please rebase?

alice-i-cecile avatar Apr 24 '22 20:04 alice-i-cecile

GIven how important this access code is to bevy_ecs soundness, I would prefer to optimize for clarity and simplicity over micro-gains to performance. IMO we should scrap them both and just write all 1s where appropriate and avoid special-casing.

This feels risky to me. It means that if you forget to update archetypes for an exclusive system, it won't properly conflict with other systems that access these archetypes. This is different from forgetting to update the archetypes for a normal system, which would only result in Queries not accessing newer archetypes (which is totally safe).

It makes failing to update archetypes a safety issue rather than a correctness issue, which will almost certainly mean we will need to reconsider the safety of a number of our apis.

cart avatar Apr 26 '22 22:04 cart

(And it also reintroduces the Query<()> unsoundness, since that accesses no components anyway)

DJMcNab avatar Apr 27 '22 06:04 DJMcNab

@maniwani did you mean to close this?

alice-i-cecile avatar Jul 31 '22 01:07 alice-i-cecile

@maniwani did you mean to close this?

No, I did an oopsie.

maniwani avatar Jul 31 '22 01:07 maniwani

Ready for round 2.

maniwani avatar Jul 31 '22 05:07 maniwani

Did another review pass, and I'm really liking the shape of this. No substantive new comments, although I'm in agreement with most of @TheRawMeatball's concerns :)

alice-i-cecile avatar Jul 31 '22 13:07 alice-i-cecile

This push changed the world_access_level() impl in the SystemParam tuple macro. I tried to do the "convert enums to bitflags and OR reduce" thing at first (like people said not to), but realized it'll miss conflicts between multiple exclusive params. I admit, bad idea.

Now it's just an sequence of match blocks.

maniwani avatar Aug 01 '22 06:08 maniwani