bevy icon indicating copy to clipboard operation
bevy copied to clipboard

System param config

Open PixelDust22 opened this issue 7 months ago • 14 comments

Objective

Frequently it is desirable to modify the state of a system after the system was already created. For example, in a ScheduleBuildPass (#11094), we're given a list of BoxedSystems and we have no idea what are those systems. There needs to be a way for us to dynamically introspect and configure those systems and their params. We can keep adding methods to the System trait but it's already getting huge. So let's add a general-purpose configurate method to the System trait to cover niche and less frequent use cases.

Note that this PR only addresses post-initialization configuration. Pre-init configuration shall be done with other mechanisms like SystemBuilder.

Solution

Add System::configurate and SystemParam::configurate which takes a &mut dyn Any as input. We call this &mut dyn Any a "configuration token". Systems and system params to decide what to do with it.

For example, in the case of Local, we may be able to add a config token LocalConfig. LocalConfig<T> changes the value of the first uninitialized Local<T>.

Another example. In my application I can use a ScheduleBuildPass to modify system states such that some systems have their ResMut<MyState> SystemParam point to an entirely different instance if they're located in a special SystemSet.

Testing

All bevy_ecs tests passing.

PixelDust22 avatar May 14 '25 00:05 PixelDust22

@chescock This PR addresses an entirely different issue. It gives you the ability to configure a system dynamically, after the system was already built. I'm using Local as an example usage here because it's well understood by the community. Instead of dictating the state of a system upfront, this API allows you to modify it by passing "config tokens" afterwards.

For example, in a ScheduleBuildPass #11094 , you're given some Box<dyn System>. These systems were already created by someone else and you don't know who created them or what SystemParam they have. You just know that, if they have a Local<MyValue>, you want that MyValue to be MyValue(123).

The system builder will not help in this case because

  1. The system was already built, and
  2. You don't know what SystemParam it has

One additional benefit of this PR is that, because this PR separates the "default" system state and "initialize" system state, after this PR is merged, we can get rid of the awkward build_state(&mut World) part of the system builder API. And so the system will be initialized by the scheduler like everyone else. This was impossible previously because you couldn't have a SystemState without initializing it with the world.

So your example is going to look like

let mut schedule = Schedule::default();
schedule.add_systems(
    (LocalBuilder(123_usize), LocalBuilder(456_usize))
        .build_state() // << No world needed yet!
        .build_system(test_system),
);


let mut world = World::new(); // << World can be created later!
schedule.run(&mut world);

fn test_system(local: Local<usize>, local2: Local<usize>) {
    assert_eq!(*local, 123);
    assert_eq!(*local2, 456);
}

PixelDust22 avatar May 14 '25 18:05 PixelDust22

For example, in a ScheduleBuildPass #11094 , you're given some Box<dyn System>. These systems were already created by someone else and you don't know who created them or what SystemParam they have. You just know that, if they have a Local<MyValue>, you want that MyValue to be MyValue(123).

Can you describe the ScheduleBuildPass that you're actually trying to build? I see that this lets you update systems after they are created, but I don't understand why you would need to do that. Why not create the systems later so that the configuration is already available? Or configure them using a resource so that it can be reconfigured as needed?

Updating the local variables of a system that you didn't create sounds like it would break encapsulation! What if the system was relying on the values for soundness? (For contrast, builders can only configure systems if they are exposed as pub fns, and in that case they already have to handle being called as ordinary functions.)

chescock avatar May 16 '25 14:05 chescock

@chescock In my particular case, I have a ScheduleBuildPass that identifies all systems with a SystemParam SubmissionInfo that I defined, look at the system sets and some other information, group certain systems together, and make it so that systems in the same group share the same SubmissionInfo. It does this by configuring the SubmissionInfo SystemParam so that systems in the same group shares the same ComponentId. This allows systems that don't share the same SubmissionInfo to be executed in parallel.

This is very important for the Vulkan-based render backend that I'm working on. Each frame can be broken down into multiple submissions and each submission shares the same set of resources. In order to achieve optimal parallelism for command buffer recording, the scheduler needs to be able to look at the entire render graph and insert submission at the optimal places. This is what the ScheduleBuildPass does: it looks at your entire system graph, group systems together optimally based on some heuristics, insert a submission system for each group, then configure systems in the group to share resources with the submission system.

Why not create the systems later so that the configuration is already available?

Because the configuration came from the system schedule itself. Which doesn't exist until all the systems were added.

Or configure them using a resource so that it can be reconfigured as needed?

Because systems in each group needs to be executed in parallel. You can obviously create a resources that contains an array of SubmissionInfo, alongside a map from system to array index, but then each system will have to contend on this resource and nothing can be executed in parallel. You'll also have trouble identifying the systems. By allowing an external actor (in this case the ScheduleBuildPass) to modify system states, we can precisely specify their component access and maximize parallelism.

Updating the local variables of a system that you didn't create sounds like it would break encapsulation!

It won't break encapsulation, because these configurations must be done through config tokens that the systems define.

For example, in the case of Local, you configure the system state of Local using LocalConfig which is a config token defined by the same module as Local. If an unknown config token was passed to configuarate(&mut self, &mut dyn Any), systems and SystemParams will just do nothing. This is the same idea as builders - you can only configure systems if they're exposed as a config token.

And if you're saying that, if I have a system that behaves well if Local<u64> == 12 but does out-of-bound reads if Local<u64> == 12000, well then it's up to that system to verify. Local<u64> is an argument passed to the system, which is a function. In general a function shouldn't make any assumptions about its arguments. And when it does, it's up to the function to assert those assumptions.

In this particular case, the system can prevent such modifications by making it Local<MyPrivateStruct> where MyPrivateStruct is private. That way the system can be super sure that nobody can construct a LocalConfig<MyPrivateStruct> to modify the system state of that Local<MyPrivateStruct>.

PixelDust22 avatar May 16 '25 19:05 PixelDust22

Triage: failing tests

janhohenheim avatar May 17 '25 17:05 janhohenheim

In my particular case, I have a ScheduleBuildPass that identifies all systems with a SystemParam SubmissionInfo that I defined, look at the system sets and some other information, group certain systems together, and make it so that systems in the same group share the same SubmissionInfo. It does this by configuring the SubmissionInfo SystemParam so that systems in the same group shares the same ComponentId. This allows systems that don't share the same SubmissionInfo to be executed in parallel.

Thanks, that's helpful context!

And if you're saying that, if I have a system that behaves well if Local<u64> == 12 but does out-of-bound reads if Local<u64> == 12000, well then it's up to that system to verify. Local<u64> is an argument passed to the system, which is a function. In general a function shouldn't make any assumptions about its arguments. And when it does, it's up to the function to assert those assumptions.

Right, a pub fn needs to be able to accept any arguments. My worry here is plugins adding systems created using non-pub functions or closures. Today, Locals in those systems are implementation details, but the existence of LocalConfig will make them always part of the public API, which would be surprising.

chescock avatar May 19 '25 14:05 chescock

Right, a pub fn needs to be able to accept any arguments. My worry here is plugins adding systems created using non-pub functions or closures. Today, Locals in those systems are implementation details, but the existence of LocalConfig will make them always part of the public API, which would be surprising.

Any function that interacts with the outside world needs to be able to accept any arguments. A pub fn interacts with the outside world because someone else could call it and pass in arbitrary function. A closure or private function interacts with the outside world once you add it using the add_systems API, even if it's private.

Adding a closure as a system is basically equivalent to registering a callback. Even though the callback function itself is private, the moment you register it, it is already interacting with the outside world, and the input arguments are no longer in your control - because you're not the one calling it.

The pub keyword defines "visibility": you expose your function to the outside world. consent to other people calling your function, therefore making it a part of the public API. When you call add_systems, you also expose and explicitly consent to other people (bevy developers) calling your function, even though the function itself wasn't explicitly defined as pub. As such, we (bevy developers) should be able to call those functions in a well-defined way. In this case, we specify that someone can modify the way we call those functions using the LocalConfig token.

PixelDust22 avatar May 19 '25 21:05 PixelDust22

When you call add_systems, you also expose and explicitly consent to other people (bevy developers) calling your function, even though the function itself wasn't explicitly defined as pub. As such, we (bevy developers) should be able to call those functions in a well-defined way. In this case, we specify that someone can modify the way we call those functions using the LocalConfig token.

Right, my point is that this is changing the contract of how Bevy will call the function. Today, once a system is built, Bevy promises that a Local will never be modified outside of the system, and users may rely on that. After this change, it may be modified by a LocalConfig, so users would need to ensure that they can handle any value. That may be a worthwhile change! But it is a change, and it invalidates some patterns that are valid today.

chescock avatar May 20 '25 17:05 chescock

Today, once a system is built, Bevy promises that a Local will never be modified outside of the system, and users may rely on that.

To reduce controversial-ness, we could introduce a separate Local-like system parameter that accepts outside mutation. Name it VarLocal<T> or something similar.

ItsDoot avatar May 20 '25 22:05 ItsDoot

Rather than adding the concept of "configuration tokens", could we instead just let users operate directly on the system state? One of my reactivity experiments involved using this pattern:

let mut system = IntoSystem::into_system(|local: Local<usize>| {});
system.initialize(world);
let state = system.get_state_mut().unwrap();
*state.0.get() = 10; // get() is required because Local's state is a SyncCell<usize>

The only missing API is system.get_state_mut() (we currently only have the get_state() variant).

From there, the big question is "how do expose writing this state to the user". It seems like we could have my_system.set_state(|state| { state.0.get() = 10; }), which returns a wrapper system that calls that function during wrapper_system.initialize(). Likewise, if we move to systems as entities, devs could just query for the system state and modify it.

I think I prefer that over adding new concepts.

cart avatar May 20 '25 23:05 cart

@cart The problem is that set_state needs to be object safe so that we can call this on a Box<dyn System>. The get_state method you mentioned was defined on WorldQuery which doesn't need to be object safe afaik.

Obviously we can define get_state_mut as a function on trait System that returns a &mut dyn Any.

fn get_state_mut(&mut self) -> &mut dyn Any

But how does the caller know what to cast it into? And even if the caller does know, when the user adds a new param to this system, the downcast could fail.

There's also the problem of encapsulation. Systems may not want to expose all of its internals to the outside world. When the user wants to modify system states, systems and system params may want to ensure that it's done in a well-defined way.

My solution is to reverse this and have the caller pass a &mut dyn Any to the callee. we call this &mut dyn Any a "configuration token".

fn configurate(&mut self, token: &mut dyn Any);

The system distributes the token to each of its params. Params who know what it is will successfully downcast and react to it. Params who don't know what it is will do nothing. If multiple params know what it is, (for example if you have multiple Local<usize>), what happens then will be defined by the token itself. For example in this PR, LocalConfig<usize> is the configuration token and it specifies that it'll only modify the state of the first Local<usize>. If you have multiple Local<usize>, only the first Local<usize> will see its value changed.

I realize that using a big word (like "configuration token") may be a little scary, but really it's nothing other than a &mut dyn Any that represents the change you wanna make to a certain type of System or SystemParam.

PixelDust22 avatar May 21 '25 00:05 PixelDust22

@ItsDoot This PR is really about the System::configurate and SystemParam::configurate APIs, and the necessity to have a "default state" that is present before a system was initialized.

I'm happy to remove the LocalConfig from this PR if it makes the PR easier to merge. This is just an example to illustrate how SystemParam::configure can be useful. We can discuss whether and how to modify the Local systemparam contract and the necessity of LocalConfig later on.

PixelDust22 avatar May 21 '25 05:05 PixelDust22

The problem is that set_state needs to be object safe so that we can call this on a Box<dyn System>. The get_state method you mentioned was defined on WorldQuery which doesn't need to be object safe afaik.

Ah yeah. The usage example in the description is done on arguments passed into add_systems (prior to their type erasure). At that point in time, the original system type (and therefore the type of its state) is still infer-able.

But your actual motivating scenario (setting this in a ScheduleBuildPass) is post-type-erasure and I agree that isn't covered.

There's also the problem of encapsulation. Systems may not want to expose all of its internals to the outside world. When the user wants to modify system states, systems and system params may want to ensure that it's done in a well-defined way.

System internals (in their current form) were not designed to be a "public interface" that users interact with at runtime (or a framework for deciding what is public or private). I understand that this is what you're trying to enable here, but its worth carefully considering the implications of doing that.

I find the "query needs to unwrap its state on every access" and Option<WorldId> designs to be a step backward in terms of "system lifecycle design". If anything, I'd like to move further toward the "systems are the initialized form" model (ex: removing the Option from FunctionSystem::state, SystemParam state being "ready to go"). We shouldn't be burdening runtime system state / performance with the details of our init model. That is the biggest issue I have with this PR (which does remove the Option on FunctionSystem::state, but only by punting the lazy init problem down the road to SystemParam). I don't want to regress runtime system state further here. I'm also not a fan of reframing SystemState to "need" a default value (without a World to initialize it), as this encourages the use of Options and INVALID states, neither of which are satisfying to me.

I understand that this puts your use case in a pickle. But you've kind of kicked a "system lifecycle" hornets nest here. And given how niche this use case is, I'd prefer to approach it carefully and with a vision for the future. I sadly don't have a lot of time to devote to this.

If you want to move this forward, can you give some thought to what this might look like, operating under the following assumptions?

  1. We want to move toward a "System is fully initialized runtime state, all the way down to SystemParams" model (note that this is what I currently want ... this is not something we've resolved to do yet as a collective). This could mean doing something like adding a &mut World parameter to IntoSystem
  2. Configuration happens to either the IntoSystem impl (for pre-init config) or by modifications to the already-initialized-and-valid System state (for runtime tweaks).

Also (as a stretch goal) consider taking into account the fact that Systems might be entities/components going forward. Could we take advantage of that somehow (ex: store configurable state as a component).

cart avatar May 30 '25 19:05 cart

An old draft of mine that reworked the system lifecycle: https://github.com/bevyengine/bevy/pull/2777

cart avatar May 30 '25 20:05 cart

@cart

I mostly agree with your sentiment here.

System internals (in their current form) were not designed to be a "public interface" that users interact with at runtime (or a framework for deciding what is public or private). I understand that this is what you're trying to enable here, but its worth carefully considering the implications of doing that.

Indeed. Each system determines how it wants to interact with the outside world using a select number of public interfaces (Config Tokens). So this is entirely opt-in. When designating the config tokens, receiving systems should modify its state carefully, in a safe and well-defined way. This PR does not propose arbitrary modification of system states.

But you've kind of kicked a "system lifecycle" hornets nest here.

I also agree. There are actually two usage scenarios:

  • Post-init configuration. This has to be done with the configurate method proposed in this PR. It does not require modifications to system lifetimes.
  • Pre-init configuration. In some scenarios this can also be done with system builders when the system params are statically known. It requires modifications to system lifecyclr if we want to enable this.

So I'm cutting down the scope of this PR to only support post-init configuration. No lifetime changes would be necessary if we do it this way. When systems are always initialized by default, there will be no "pre-init configuration" scenario and so it'll naturally be a smooth transition to that end goal.

PixelDust22 avatar May 31 '25 20:05 PixelDust22