bevy_xpbd

Awful performance with many non-physics entities

Open inodentry opened this issue 10 months ago • 3 comments

We had discussed this on Discord previously, but it seems like the issue still persists in the latest version.

We have a 2D game where we use LDTK (via bevy_ecs_ldtk, which is based on bevy_ecs_tilemap). When the level is spawned, that creates a lot of ECS entities in Bevy (>60k for a moderately sized level). None of those entities have any physics components; they only represent the visuals of the map.

We have very few physics entities (<100 colliders for walls in the level and a few kinematic bodies, no dynamic bodies yet).

The game runs awfully slowly. I ran a trace to determine what is going on, and it looks like bevy_xpbd_2d's propagate_collider_transforms system is the culprit, along with Bevy's normal transform propagation being called multiple times, before and after physics.

That system takes around 8 ms every substep. I am currently setting SubstepCount to 1, because otherwise this is completely unusable.

It seems like it is iterating through all of those non-physics entities and wasting CPU time.

In conclusion: bevy_xpbd scales very poorly for applications with few physics entities and lots of non-physics entities. A lot of work is wasted on non-physics entities to process their transforms (even though they do not have colliders or any other physics components).

inodentry, Mar 30 '24 08:03

Hi, I am also seeing poor performance with colliders. In my case I do have many many colliders at a time, but when I scale down I do not see the expected performance improvement.

~Is it possible to further parallelize collect_collisions? When I run a trace, I see the initialization run in a task pool but collect_collisions always appears in the main thread (I suppose Bevy/Rayon could be deciding this is best). I see it is using par_splat_map but for some reason it is not chopping up chunks into threads for me--it does it all at once in the main thread.~ I was not tracing properly.

DaAwesomeP, Apr 02 '24 21:04

Another thing I found: If I spawn a bunch of colliders, check some collisions, and remove layer filters until the only remaining filters are ones without any collider memberships, then I don't get any performance back. The performance indicates that it calculates the same collisions regardless of filters.
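To make the expectation above concrete, here is a minimal std-only sketch of bitmask layer filtering. It assumes the usual two-way rule (each collider's memberships must overlap the other's filters); the `Layers` struct, the layer constants, and `candidate_pairs` are hypothetical names for illustration and are not bevy_xpbd's API or its actual broad phase.

```rust
// Hypothetical layer bits, for illustration only.
const PLAYER: u32 = 1 << 0;
const WALL: u32 = 1 << 1;
const UNUSED: u32 = 1 << 2; // no collider is a member of this layer

/// Stand-in for a collision-layers component (assumed shape).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Layers {
    memberships: u32,
    filters: u32,
}

impl Layers {
    /// Two colliders interact only if each one's memberships
    /// overlap the other's filters.
    fn interacts_with(self, other: Layers) -> bool {
        (self.memberships & other.filters) != 0
            && (other.memberships & self.filters) != 0
    }
}

/// Brute-force pair collection that rejects filtered pairs before any
/// geometric test would run. A real broad phase would use a spatial
/// structure instead of the O(n^2) loop shown here.
fn candidate_pairs(colliders: &[Layers]) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..colliders.len() {
        for j in (i + 1)..colliders.len() {
            if colliders[i].interacts_with(colliders[j]) {
                pairs.push((i, j));
            }
        }
    }
    pairs
}

/// Three colliders: a player, a wall, and one whose only filter is a
/// layer with no members, so it can never produce a pair.
fn example_colliders() -> Vec<Layers> {
    vec![
        Layers { memberships: PLAYER, filters: WALL },
        Layers { memberships: WALL, filters: PLAYER },
        Layers { memberships: PLAYER, filters: UNUSED },
    ]
}
```

With a check like this placed before the narrow phase, a collider whose filters match no memberships would contribute no pairs at all, which is the performance recovery the comment above expected to see.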

DaAwesomeP, Apr 03 '24 03:04

Another idea (from Rapier): there should be an option to skip computing contact points (just collisions) for some colliders (i.e. sensors). It would also be a nice optimization to skip recomputing collisions if at least one of the objects is a sensor and has not moved since the last computation.

DaAwesomeP, Apr 03 '24 18:04

Hi! Sorry I took so long, but I think I finally came up with a reasonable fix to the issue. #377 has my fix, along with some refactoring.

TLDR: When a collider is added, just mark all of its ancestors with the ColliderAncestor marker component. The ColliderTransform propagation can simply skip traversing trees with no ColliderAncestor entities. This gets rid of the majority of the unnecessary work, and brought performance from ~22 FPS to ~200 FPS in a test scene with one root entity and 100,000 child entities.
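The marking idea can be sketched without Bevy as a plain tree with an index-based hierarchy. This is only an illustration under stated assumptions, not the code from #377: `Node`, `Hierarchy`, `add_collider`, `propagate_pruned`, and `demo_counts` are hypothetical names, and the `collider_ancestor` flag stands in for the ColliderAncestor marker component. Collider removal, which would require unmarking ancestors, is not handled.

```rust
/// One entity in a flattened hierarchy; `parent` indexes into the same Vec.
#[derive(Clone, Debug, Default)]
struct Node {
    parent: Option<usize>,
    children: Vec<usize>,
    has_collider: bool,
    /// Stand-in for the `ColliderAncestor` marker component.
    collider_ancestor: bool,
}

struct Hierarchy {
    nodes: Vec<Node>,
}

impl Hierarchy {
    fn new() -> Self {
        Hierarchy { nodes: Vec::new() }
    }

    /// Spawn an entity, optionally as a child of `parent`.
    fn spawn(&mut self, parent: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { parent, ..Default::default() });
        if let Some(p) = parent {
            self.nodes[p].children.push(id);
        }
        id
    }

    /// Attach a collider and mark every ancestor. Stops at the first
    /// already-marked ancestor, since its own ancestors were marked earlier.
    fn add_collider(&mut self, id: usize) {
        self.nodes[id].has_collider = true;
        let mut current = self.nodes[id].parent;
        while let Some(p) = current {
            if self.nodes[p].collider_ancestor {
                break;
            }
            self.nodes[p].collider_ancestor = true;
            current = self.nodes[p].parent;
        }
    }

    /// Naive propagation: processes every entity under each root.
    fn propagate_naive(&self, roots: &[usize]) -> usize {
        let mut processed = 0;
        let mut stack = roots.to_vec();
        while let Some(id) = stack.pop() {
            processed += 1;
            stack.extend(self.nodes[id].children.iter().copied());
        }
        processed
    }

    /// Pruned propagation: skips any entity that neither has a collider
    /// nor is marked as a collider ancestor, along with its whole subtree.
    fn propagate_pruned(&self, roots: &[usize]) -> usize {
        let mut processed = 0;
        let mut stack = roots.to_vec();
        while let Some(id) = stack.pop() {
            let node = &self.nodes[id];
            if !node.has_collider && !node.collider_ancestor {
                continue;
            }
            // A real system would update the collider's transform here.
            processed += 1;
            stack.extend(node.children.iter().copied());
        }
        processed
    }
}

/// Builds a map root with 1000 visual-only children plus a small physics
/// tree, and returns (entities processed naively, entities processed pruned).
fn demo_counts() -> (usize, usize) {
    let mut h = Hierarchy::new();
    let map_root = h.spawn(None);
    for _ in 0..1000 {
        h.spawn(Some(map_root));
    }
    let physics_root = h.spawn(None);
    let wall = h.spawn(Some(physics_root));
    h.add_collider(wall);
    let roots = [map_root, physics_root];
    (h.propagate_naive(&roots), h.propagate_pruned(&roots))
}
```

In this sketch the map tree is rejected at its root because nothing beneath it was ever marked, so the cost of propagation scales with the number of physics-related entities rather than with the total entity count.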

In the upcoming Avian release (see #346), this should be even less of an issue, as the propagation doesn't need to be done in the substepping loop.

This still doesn't fix the fact that Bevy's normal transform propagation is run before and after physics, but it does make the situation significantly better. We can look into further reducing the overhead in a follow-up.

Jondolf, Jun 19 '24 10:06

Follow-up! #380 uses a similar idea to #377, but for the rest of the transform propagation added by bevy_xpbd, along with refactoring the logic further. This brings the performance in the earlier test scene from 200 FPS (after #377) all the way to over 490 FPS. It should eliminate almost all of the remaining transform propagation overhead for non-physics entities :)

Another follow-up idea could be to try and unify the ColliderTransform propagation systems and the "normal" transform propagation systems added by bevy_xpbd, and somehow propagate ColliderTransform and Transform in the same pass in some places. Not sure how viable this is. Additionally, we should verify if all of the transform propagation system copies are really needed, or if we could reduce them.

Jondolf, Jun 21 '24 20:06