Objective

Fixes #12144

Every Bevy app has (at least) two ECS Worlds, the main world and the render world. These are held separately so gameplay code can run parallel to the rendering code. For changes in the main world to affect the render world (and eventually the screen), the data from the former has to be hoisted over to the render world in the Extract stage. See the figure I've graciously stolen from @inodentry:

The article in the cheatbook also gives a good overview.

This extraction happens for every frame and in order to not overload the render world with entities, all entities are removed in the Cleanup stage, a frame earlier. This, however, causes a problem for more advanced ECS features like Components as Entities and Queries as Entities. Queries and Components are not extracted (nor does it make sense to) and it doesn't make sense to remove them every frame. It causes very big, very annoying problems.

Solution

Stop removing all entities every frame and instead of extracting all (render-relevant) entities, keep more careful track of them by way of an entity-entity mapping between main world entities and render world entities.

Drawbacks

Currently, there is a one-to-one mapping between render world entities and main world entities, i.e. they have the same IDs in both worlds. This is likely not going to be the case moving forward. This has never been a guarantee, so for the few people who have relied on this behaviour: I'm sorry.

Also, and I feel like I can say this with some confidence, the performance is going to get worse. How much worse, I cannot say yet, but given that we are going to do a lot more bookkeeping, it's expected to get worse.

Jul 09 '24 18:07 Trashtalk217

current implementation has a flaw. Simply using Added is insufficient for correct functionality. When a main world entity is despawned or its component is modified which need to sync with render world, the corresponding render entity fails to update appropriately.

Jul 10 '24 16:07 re0312

IMO, to create a robust retained render world, we should follow these steps:

The render world currently generates auxiliary entities for certain elements like UI and lighting. To optimize this process, we should restructure these related entities as components or descendants of their source entities. This approach will prevent unnecessary regeneration of these entities every frame after implementing a persistent render world.
Establish a sync system(using observer maybe?) for main world entities that need representation in the render world (e.g., cameras, lights, UI elements). Implement a synchronization mechanism to ensure that entity addition and deletion are mirrored in the render world within the same frame. This process will yield a Main Entity to Render Entity mapping.
Carefully sync the dependency between entities in the main world (such as camera render targets). Ensure that these dependency remain valid when synchronized to the render world. This step is the most complex and carries the highest risk, and potentially introducing issues that could affect user-side .
Keeping the existing extraction logic for synchronizing all relevant components each frame. However, modify the target of these updates to be the corresponding render world entities.
(Optional) In place update relavent Component instead of extracting it every frame.

To avoid complexity and risk in steps 2 and 3, some bit magic operations on the entity might be very helpful.

It's best to split it into multiple PRs. Step 1 can be the first PR, steps 2 through 4 can be grouped into the next PR, and step 5 can be handled last.

Jul 10 '24 17:07 re0312

Thank you for the advice @re0312, it's a lot more of a sophisticated approach than I'm going to take initially. If this doesn't yield satisfactory performance results I will look into implementing your approach. The plan is now as follows:

Sever the 1-to-1 mapping between render world and main world entities. This goes either via a MainWorldEntity(Entity) component for each render world entity for which it is relevant, or a EntityHashMap<Entity> resource.
Next add a marker component to everything that is extracted or created in every frame.
Instead of calling clear_entities each frame, despawn all entities with this marker component.

This achieves the primary goal of this PR while being relatively easy to implement. I'll have to check the performance of course, but hopefully, there are so few extracted entities, that it's negligible (a lot of rendering doesn't use entities at all).

Jul 10 '24 21:07 Trashtalk217

I've done all the things I said I would do, while continually testing it on the 3d_scene example. It works (tm), but I have to test more examples to ensure that everything stays working.

One downside of the EntityHashMap approach is that the extraction can no longer run in parallel, as every extraction system needs a mutable reference to the mapping. This is quite a slowdown, but given that the number of extractions for 3d_scene is about 4, I have not had any major issues.

Jul 21 '24 21:07 Trashtalk217

This CI warning appears to be due to this PR.

How does the performance compare, just very broadly on say many_animated_foxes and bevy_mark?

Jul 21 '24 21:07 alice-i-cecile

Extraction not being able to run in parallel is probably going to be a deal breaker. But you should run the stress tests and check the perf.

Jul 22 '24 00:07 hymm

I'll get around to a full review tomorrow probably. But like @hymm mentioned adding the ResMut<MainToRenderEntityMap> to every system breaks our parallelism. Since commands are sequential anyways, could we have a custom command like this:

struct SpawnExtracted<B: Bundle> {
  bundle: B,
  main_world_entity: Entity
}

that accesses the entity map internally? That way we can avoid extra overhead in Extract too, and we don't seem to care about overhead in ExtractCommands as much.

Jul 22 '24 05:07 ecoskey

take a quick review ,the current state appears to have several drawbacks:

some extractions system no longer run in parallel ,which could impact performance especially in real-world scenarios. however, given that the corresponding system costs are relatively small, the performance loss may be acceptable.
entity relation dependencies have not been properly handled. When combined with 1, this exacerbates the difficulty of manually managing dependency relationships. Now, you must explicitly specify the system running order.
2D rendering has not been set up, preventing correct rendering.
ui rendering continues to generate auxiliary entities that are not despawned, potentially causing memory leaks when using bevy_ui.
numerous rendering features have not been properly configured, including but not limited to:TAA,Volumetric fog,Auto-exposure,Depth of field,Probe

Jul 22 '24 06:07 re0312

@re0312 Could you elaborate on / give an example of 2? I'm not entirely sure what you mean.

Jul 22 '24 10:07 Trashtalk217

@re0312 Could you elaborate on / give an example of 2? I'm not entirely sure what you mean.

Component may depend on other entities. For example, consider Camera::RenderTarget. If Camera_A (1v1) renders to Window1 (2v1), and Camera_B (3v1) renders to Window (4v1), you must ensure these relationships remain valid after extracting them to the render world. Simple component cloning is not enough.

Therefore, you must manually set up every component used in the render world which maybe depends on other entities (not just cameras). This complexity is why I previously mentioned that it's risky and challenging to expose this functionality in an ergonomic way.

Moreover, if the dependency relationships are multiple or involve complex nested structures, you cannot rely on a simple ExtractComponentPlugin. Instead, you must manually specify the order of the relevant extraction systems to ensure all dependencies entities are properly spawned.

Jul 22 '24 11:07 re0312

I think it'd work to switch to a component that stores the render ID, RenderId(Option<Entity>), instead of using a hashmap. So any entity that needs to be extracted would have the component and then you'd query for it in the extraction system. If you needed to map an entity stored in another component you could use Query<Entity, &RenderId> to do that.

Jul 23 '24 06:07 hymm

Would we have to do any extra shenanigans to add a RenderId component in the main world compared to a MainId component in the render world? I did a little experimentation with the latter and it's possible to greatly reduce calls to the hashmap and do the remaining ones outside of Extract anyways.

The flow is basically:

Spawn newly added components in the render world, inserting to the hashmap during ExtractCommands, and mark them as needing entity mapping.
Use the MainId component to check for changed components in the main world. Extract the ones that changed and mark them as needing entity mapping.
Despawn components that were despawned in the main world. If the entity itself despawned, despawn self and remove the entry from the hashmap.

After extract but before all other rendering, update marked components using the hashmap.

Jul 23 '24 06:07 ecoskey

Would we have to do any extra shenanigans to add a RenderId component in the main world compared to a MainId component in the render world? I did a little experimentation with the latter and it's possible to greatly reduce calls to the hashmap and do the remaining ones outside of Extract anyways.

https://github.com/re0312/bevy/tree/retain_world I did some experiment and now the majority of examples can work correctly and no performance drop.

Jul 23 '24 06:07 re0312

no performance drop.

I'm a bit surprised by that if we're losing most of the multithreading. I've done tests before switching to the single threaded executor and the Extract schedule was one of the biggest benefactors of multithreading. How are you measuring this? Do you have a tracy log of many foxes?

MainId vs RenderId

I suggested RenderId since you'd be able to use Query::get to map Main Entity -> Render Entity and that would be fast. I realized that it might be hard to write the RenderId, would probably need some system param that gave you main world commands and they can't be stored on the system since those are sent to the render world. How does MainId work? is there a fast way to do the mapping?

Jul 23 '24 16:07 hymm

How does MainId work? is there a fast way to do the mapping?

The trick would just be to do the mapping during command application so we avoid extra overhead during extract. It would also only apply on the initial insert, so syncing data for existing components would stay fast.

In the case of needing to update Entity fields on components, ECS access is an indirection or two anyways so a hashmap is the simplest way to go IMO. And we can also do that outside of Extract itself.

Jul 23 '24 16:07 ecoskey

I'm a bit surprised by that if we're losing most of the multithreading. I've done tests before switching to the single threaded executor and the Extract schedule was one of the biggest benefactors of multithreading. How are you measuring this? Do you have a tracy log of many foxes?

The implementation approach I used in #1449 is maintains RenderEntity component for specific entities in the main world(Just the same as your idea), this will allowe Extract to still run in parallel

Jul 23 '24 20:07 re0312

Given @re0312's amazing work on #14449 I don't think this PR is as necessary anymore. I will keep it up (anything can happen), but I suggest potential reviewers to review #14449 instead.

Jul 23 '24 23:07 Trashtalk217

Closing in favor of #14449

Aug 12 '24 15:08 JMS55

Persistent Render World

Objective

Solution

Drawbacks