Terrible performance with more than a few CharacterBody3D's moving around in a production level.
Tested versions
4.0 - 4.2.2 stable, with and without Jolt addon.
System information
Windows 10, Vulkan forward +, Nvidia 3070
Issue description
When using a level with production-level detail and more than around 50 moving characters, the framerate absolutely tanks, to sub-15 FPS. I did some custom profiling with Tracy, and most of the performance hit was from the movement code called from both move_and_slide and move_and_collide. When the performance drops below a certain threshold, the engine starts doing multiple physics updates per frame to make up for it, which makes the performance even worse, up to 8 physics updates per frame (the maximum). At that point things run at around 10 FPS in slow motion.
It's possible the issue is that I'm importing everything under one StaticBody3D. Perhaps each shape should be its own static body for the broadphase tree optimization (or whatever this physics engine uses)? Edit: I modified the importer to make a unique StaticBody3D for each collision shape, and that didn't improve performance. Could be that everything is centered at 0, 0, 0, though, and centering each StaticBody3D at the middle of each shape would improve things? Not sure how the tree is implemented (if there even is one, which might be why the perf is so bad if there isn't one).
Steps to reproduce
Load a reasonably complex level with a lot of convex collision shapes. I'm importing one using the BSP importer I made. Throw in around 60+ CharacterBody3Ds and have them move around. Note the framerate is awful.
Minimal reproduction project (MRP)
Attached is the example project. I have a few different things you can try. One is my complex level (still Quake level of detail, as it was originally a Quake map). The other is a simpler one made with boxes (can be swapped out in main_scene.tscn). Also, you can set USE_MOVE_AND_SLIDE to true in the character_body_3d script to test simpler movement, which performs a bit better. I also included what I was using to do step moving (the default). This uses more moves and thus has worse performance.
I too have been experiencing similar problems in my project. FPS tanks to sub 15 or even single digits when I have around 10 or 15 enemies navigating. After a lot of trial and error I found that my issue is mainly on the physics side too.
Testing your MRP, I got something similar to what I've been seeing on mine (v-sync off). My config: 3700X / 3060 12 GB / 32 GB RAM

Main Scene
- Default Physics with move and slide FALSE: ~12 FPS, GPU at ~6%
- Default Physics with move and slide TRUE: ~19 to 109 FPS (VERY inconsistent and all over the place), GPU at ~25%
- Jolt with move and slide FALSE: ~420 FPS, GPU at ~37%
- Jolt with move and slide TRUE: ~800 FPS, GPU at ~89%

Level Boxes
- All tests above 1000 FPS and 97%+ GPU utilization

So it is hard to really say what is happening, but I'm glad to do more tests.
As it stands, a full game, even with Jolt, will have a hard time keeping a good framerate, as we are already consuming more than half of the CPU budget. I don't know if those results were in line with what you are seeing, but considering this level has simple geometry, and nothing more is happening, I would expect to be GPU limited in every single scenario.
Another thing that I noted is that my CPU apparently never got hit that hard? Even when I'm clearly CPU bound (Uninformed guess here as I have little to no exp testing physics engines)
Proc snapshot while running with Default Physics and move_and_slide FALSE
(the test ran for more than 60s, so it spans the whole graph)
Switch to jolt because the built-in physics is just... don't use it.
With jolt enabled, went from 15fps to 100~fps, a good start.
Next, rather than a static body with hundreds of convex shapes, this is static level geometry, so make it polygon soup. I didn't feel like messing with the importer, so I used the debugging method MeshInstance3D.create_trimesh_collision() on the non-transparent visual data, which created:
- 1x StaticBody3D
- 1x ConcaveCollisionShape3D
Now it runs at 700fps and hundreds of nodes eliminated.
Static level geometry should generally always use concave collision shapes (trimesh), not convex. See https://github.com/godotengine/godot/issues/59738.
Jolt is definitely faster, but still unusable in my actual project (getting 10-15 FPS with all enemies enabled). I've also tried using triangle collision (Very simple to test -- just modify the bsp_reader.gd and set USE_TRIANGLE_COLLISION to true), but still got very poor performance. Curious how it improved things for you so much.
Also, convex collision shapes are typically much faster in physics engines, so it's kind of crazy if trimesh collision is faster. It almost seems like some early broadphase exclusion algorithm is missing or not functioning properly.
Having a single trimesh instead of many convex shapes is naturally going to be faster. Culling improves things in general, but it slows down when you have a ton of shapes packed together, so the culling step itself might be the bottleneck here if you have very many shapes in a small space.
For static bodies, given reasonably large size, I'd say, in increasing order of performance:
- Many concave shapes
- Many convex shapes
- Single concave shape
- Single convex shape
Naturally breaking things up into reasonably large sections is the best
Edit: remember especially that the more shapes you have, the worse the performance with many bodies. Each body has to process all of those shapes, so the cost grows steeply with many bodies and many shapes in the statics.
While testing OP's MRP, I found out that deleting CharacterBody3D nodes from LevelSewer1 is also very, very slow. Each one takes about 10 seconds to get deleted, and the editor freezes during that time.
I thought about creating a separate issue, but wanted to mention that here first, in case it would be a symptom related to OP's FPS drop.
My specs: Godot v4.3.beta1 - TUXEDO OS 3 22.04.4 - Wayland - Vulkan (Forward+) - integrated Intel(R) Graphics (RPL-P) () - 13th Gen Intel(R) Core(TM) i7-13700H (20 Threads)
> just modify the bsp_reader.gd and set USE_TRIANGLE_COLLISION to true), but still got very poor performance.

It should output one concave shape attached to one static object. It's best to create a visual mesh to check that it's working correctly.

> Also, convex collision shapes are typically much faster in physics engines, so it's kind of crazy if trimesh collision is faster.
As long as it doesn't move, tri-mesh can be completely optimized. A good physics engine will generate a structure for fast lookup when given a large static mesh. Essentially an improved version of that BSP format for modern hardware.
I don't know how Godot optimizes collisions internally, so take this with a grain of salt, but I've taken a look at the level structure and I believe each character is testing against 7k+ collision shapes, since all the level's collisions are inside a single StaticBody3D.
Maybe changing the plugin so it builds a StaticBody3D per collision shape would let the physics engine discard collisions. My understanding is that the physics engine is doing something like this: OK, this CharacterBody3D is colliding with this StaticBody3D, let me get the collision shape.... (finds 7k+ shapes) sweet baby Jesus on a bicycle!
I tried modifying the BSP importer to make a separate StaticBody3D for every collision shape, but that did not improve performance. It could be because everything was centered at 0, 0, 0.
Definitely seems like there should be some sort of tree or something to early-out most of the shapes, but it either doesn't exist, or it's not setting the bounds correctly for convex hulls.
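To make the "early-out" idea concrete, here is a minimal Python sketch. It is not how Godot Physics or Jolt implement their broadphase; the `Box` type and `candidates` helper are hypothetical names used only for illustration. A cheap bounding-box overlap test rejects far-away shapes before any expensive narrow-phase check would run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box given by its min and max corners."""
    lo: tuple
    hi: tuple

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap only if their extents overlap on every axis.
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )


def candidates(body: Box, shapes: list) -> list:
    """Return only the shapes whose bounds touch the body's bounds."""
    return [s for s in shapes if body.overlaps(s)]


# Hypothetical level: 7,000 unit cubes in a row, one character near the start.
level = [Box((x, 0, 0), (x + 1, 1, 1)) for x in range(7000)]
character = Box((2.25, 0, 0), (2.75, 1, 1))
print(len(candidates(character, level)))  # shapes left for the narrow phase
```

This sketch still scans every box linearly. A real broadphase would typically use a tree or grid so that most boxes are never visited at all, which is the structure being asked about here.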
Just double checked with triangle collisions, and I'm still getting sub 15 FPS:
Here's the project with triangle collision if you don't want to mess with changing the importer consts. I also added another importer const: SINGLE_STATIC_BODY. If set to false, it will create a unique static body for every convex shape. I haven't yet tried to center the static bodies within the shape, so everything is at 0, 0, 0, as mentioned previously, though considering the perf is about the same with triangles, I think something might be bugged with the tree or whatever is used to cull out things that aren't nearby.
So I did some more testing and spawned 30,000 cubes in the other test map. Perf was pretty decent. Then I changed the collision to a convex collision shape and perf tanked, so general convex shapes are a LOT slower than boxes (even if they're also just a box).
So here is some more detailed profiling I did with Tracy.
As I mentioned before, once you drop below a certain framerate, you start doing 8 physics updates per frame, which further tanks the framerate if the physics is the bottleneck.
Zooming in a bit, we see that the move_and_collide is taking around 30-40 microseconds:
A huge chunk of that is in CheckIfStuck:
About 1/3 of that is for culling the AABB, which gets called multiple times. One potential optimization would be to just cull once for the entire move with a little extra epsilon added to account for unstuck movement, etc.:
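The "cull once for the entire move" idea could look roughly like the following Python sketch. It is an illustration only, not the engine's code: `swept_aabb` is a hypothetical helper, and the default `margin` is an assumed placeholder for the extra epsilon mentioned above.

```python
def swept_aabb(lo, hi, motion, margin=0.05):
    """Bounds covering a box at its start and end position, grown by margin.

    lo, hi: min and max corners of the box before moving.
    motion: displacement applied over the whole move.
    margin: assumed extra padding to cover small unstuck/recovery moves.
    """
    new_lo = tuple(min(l, l + m) - margin for l, m in zip(lo, motion))
    new_hi = tuple(max(h, h + m) + margin for h, m in zip(hi, motion))
    return new_lo, new_hi


lo, hi = swept_aabb((0.0, 0.0, 0.0), (1.0, 2.0, 1.0),
                    motion=(0.5, 0.0, -0.25), margin=0.0)
print(lo, hi)
```

Every shape that can be touched during the move lies inside these bounds, so one culling query against them could be reused by the stuck check, the cast, and the final rest check, at the price of a slightly larger candidate list.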
Not sure if the same optimization would apply for Jolt.
After the stuck check, there's the actual attempt motion, which has a lot of solve_distance calls and then the recovered check. Not sure how much that can be optimized:
@jitspoe Can I ask why the project is running 160 physics tick?
> I haven't yet tried to center the static bodies within the shape, so everything is at 0, 0, 0,
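
    # Build a bounding box around all points, then shift them so the
    # shape is centered on its own origin.
    var a := AABB(points[0], Vector3.ZERO)
    for p in points:
        a = a.expand(p)  # expand() returns a new AABB
    var center := a.get_center()
    for i in range(points.size()):
        points[i] -= center
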
> Then I changed the collision to a convex collision shape and perf tanked, so general convex shapes are a LOT slower than boxes (even if they're also just a box).

This is to be expected. Apart from areas for optimization that might have been missed, a primitive shape will always be more performant: it allows a lot of simplifications in the equations, and the convex shape doesn't know it's a cube.
> @jitspoe Can I ask why the project is running 160 physics tick?
I want my games to be fast and responsive on high refresh rate monitors, so I want to guarantee a physics update every frame for 144-150hz monitors. Sadly, I might have to make some sacrifices if it's not possible to improve the physics performance. Perhaps there's a way to do higher update rates for the player only? Some games like sim racing games run the physics at like 1000hz, so I don't think 160 is unreasonable.
> I haven't yet tried to center the static bodies within the shape, so everything is at 0, 0, 0,
>
>     var a := AABB(points[0], Vector3.ZERO)
>     for p in points:
>         a = a.expand(p)
>     var center := a.get_center()
>     for i in range(points.size()):
>         points[i] -= center

Thanks! I might give this a try later, but I already experimented having box shapes centered vs. 0,0,0, and that didn't seem to make a difference.
> Then I changed the collision to a convex collision shape and perf tanked, so general convex shapes are a LOT slower than boxes (even if they're also just a box).
>
> This is to be expected. Apart from areas for optimization that might have been missed, a primitive shape will always be more performant: it allows a lot of simplifications in the equations, and the convex shape doesn't know it's a cube.

The actual perf difference isn't as bad as I thought. The overall physics update goes from ~5ms to ~7ms, but with the domino effect of multiple updates per frame, that causes things to go from ~500fps to 20fps in practice.
With the boxes, the cull aabb is about the same (as I would hope) but the solve_static is faster:
@jitspoe Those racing games that run physics at 1000Hz have totally customized physics
(Also those ridiculous Hz do not protect from occasional hilarious physics bugs)
For imported levels with trimesh collision, https://github.com/godotengine/godot/pull/82649 should help improve performance. Convex collision performance should be improved by https://github.com/godotengine/godot/pull/63702.
> I want my games to be fast and responsive on high refresh rate monitors, so I want to guarantee a physics update every frame for 144-150hz monitors. Sadly, I might have to make some sacrifices if it's not possible to improve the physics performance. Perhaps there's a way to do higher update rates for the player only? Some games like sim racing games run the physics at like 1000hz, so I don't think 160 is unreasonable.
If you have physics interpolation (which I strongly recommend to handle framerate variations in general), players can be using any refresh rate and the game will look smooth.
While there's a definitive advantage from bumping the default physics tick rate from 60 Hz to 120 Hz, there isn't much benefit to going above 120 Hz physics in terms of input lag. Going from 120 Hz to 160 Hz only reduces the physics step time by 2.1 ms, while going from 60 Hz to 120 Hz reduces it by 8.3 ms. This is particularly the case if your game's movement is floaty (e.g. slow acceleration/friction), in which case noticing input lag from the physics step is harder.
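The step-time figures above follow directly from the tick rates. A small Python check (the helper name `step_ms` is just for illustration):

```python
def step_ms(hz: float) -> float:
    """Length of one fixed physics step in milliseconds."""
    return 1000.0 / hz


for low, high in [(60, 120), (120, 160)]:
    saved = step_ms(low) - step_ms(high)
    print(f"{low} Hz -> {high} Hz: step shrinks by {saved:.1f} ms")
```

Going from 60 Hz to 120 Hz shortens the step from about 16.7 ms to 8.3 ms, while going from 120 Hz to 160 Hz only shortens it from 8.3 ms to 6.25 ms.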
> but with the domino effect of multiple updates per frame, that causes things to go from ~500fps to 20fps in practice.
If you want to reduce the "spiral of death" effect of multiple physics steps per frame, reduce Max Physics Steps per Frame in the Project Settings to a lower value. This will cause slowdown when the game can't keep up though.
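A rough way to see why the "spiral of death" is so abrupt is the simplified Python model below. It is a back-of-the-envelope sketch, not the engine's actual main loop: `steady_state_fps` is a hypothetical function, it treats the average number of physics steps per frame as a continuous quantity, and the `render_ms=0.4` used in the example is an assumed value, chosen only so that the 5 ms case lands near the ~500 FPS reported earlier in the thread.

```python
def steady_state_fps(physics_ms, tick_hz, render_ms, max_steps=8):
    """Estimate the long-run frame rate of a fixed-timestep loop.

    physics_ms: cost of one physics step.
    render_ms:  cost of everything else in a frame (assumed constant).
    max_steps:  cap on physics steps per rendered frame.
    """
    step_ms = 1000.0 / tick_hz
    load = physics_ms / step_ms  # fraction of real time spent on physics
    if load < 1.0:
        # Physics keeps up: frame = render + (frame / step) * physics.
        frame_ms = render_ms / (1.0 - load)
        if frame_ms / step_ms <= max_steps:
            return 1000.0 / frame_ms
    # Physics cannot keep up: every frame runs the capped number of
    # steps and the simulation falls behind real time (slow motion).
    return 1000.0 / (render_ms + max_steps * physics_ms)


for cost in (5.0, 7.0):
    fps = steady_state_fps(cost, tick_hz=160, render_ms=0.4)
    print(f"{cost} ms per step at 160 Hz -> about {fps:.0f} FPS")
print(f"capped at 3 steps -> about "
      f"{steady_state_fps(7.0, 160, 0.4, max_steps=3):.0f} FPS")
```

Under these assumptions, the cliff appears as soon as one physics step costs more than one tick interval (6.25 ms at 160 Hz): the loop then always runs the maximum number of steps. Lowering the cap raises the frame rate in this model, but the game still runs in slow motion, as noted above.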
The downside to interpolation is that you're then putting players with a higher framerate at a delay. If you have 60 Hz physics running at 60 FPS, you'll have stuff respond immediately. If you're running at a higher framerate, you'll be setting the interpolation target that frame and interpolating over the following frames, so it'll actually be LESS responsive. That said, it doesn't seem like the interpolation option even exists in Godot 4.
And even if it did, I tried setting the physics update rate as low as 20 and I'm still getting sub-20 FPS in my actual project. Something is seriously wrong here. I thought it was just the death spiral of 8 physics updates per frame, but I dropped that limit down to 3, and I'm still getting terrible framerates. The physics process alone is taking over 16ms, which means it's impossible to hit 60fps even with 1 physics update per frame.
Each test_body_motion is taking 100-200 microseconds, x3 average for each body (to do ground/step checks and whatnot). If you have 100 enemies, that's 30-60 ms.
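The estimate above can be reproduced with a few lines of Python. The function names are hypothetical; the inputs are the figures quoted in this comment (100-200 microseconds per query, about 3 queries per body).

```python
def physics_cost_ms(bodies, queries_per_body, query_us):
    """Total time per physics step spent in motion queries, in ms."""
    return bodies * queries_per_body * query_us / 1000.0


def max_bodies(budget_ms, queries_per_body, query_us):
    """How many bodies fit in a time budget at the given query cost."""
    return int(budget_ms * 1000.0 // (queries_per_body * query_us))


print(physics_cost_ms(100, 3, 100), physics_cost_ms(100, 3, 200))
print(max_bodies(1000.0 / 60.0, 3, 200), max_bodies(1000.0 / 60.0, 3, 100))
```

With these numbers, 100 enemies cost 30 to 60 ms per step, and even a full 60 Hz frame (about 16.7 ms) spent on nothing but these queries would only fit roughly 27 to 55 bodies.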
If we could somehow get this more in check, I'd be happy to run other things at a lower update rate if I could run the player at a higher tick rate. Is there a way to do that?
@mihe Any chance you could take a look at this on the Jolt side and see if there's any low hanging fruit to fix performance wise? Jolt performance is better, but still too low to update more than a few things every frame.
I think we need to re-center this discussion. There are a number of open questions that this report has raised. I think it would be worth trying to answer the questions separately.
- Is 50 CharacterBody3Ds too many? Is there a better approach to having many enemies with physics?
- What is the impact of high fixed FPS physics updates? i.e. should this be kept to a low number?
- Is there an inherent limitation to convex shapes that makes them so slow, or can they be optimized?
- Can the situation be improved by separating ConvexShapes into their own static bodies (not centered at 0,0,0)?
In addition to the open questions there are some clear insights from this:
- Physics is creating a bottleneck on the main thread. Perhaps there is room for multithreading
- Jolt is significantly faster than Godot physics, we have a lot of room to improve performance in Godot physics
> I think we need to re-center this discussion. There are a number of open questions that this report has raised. I think it would be worth trying to answer the questions separately.
I can address several of these based on the research I've been doing.
> 1. Is 50 CharacterBody3Ds too many? Is there a better approach to having many enemies with physics?
I feel like if the engine can't handle 50+ active enemies, that's a huge detriment. That's something games have been doing since the 90's. Games like DOOM would bombard players with massive hordes of enemies. It should be able to handle hundreds if not thousands. Also, I haven't even started with ragdolls, which will add a significant number of additional capsules per enemy.
> 2. What is the impact of high fixed FPS physics updates? i.e. should this be kept to a low number?
This could be a different discussion, especially to open the possibility of multiple physics updates at different rates. That said, in my actual project, the physics is taking more than 16
> 3. Is there an inherent limitation to convex shapes that makes them so slow, or can they be optimized?
The convex shapes aren't THAT much slower than primitives (Initially, it looked like an extreme difference because of the spiral of death causing 8x physics updates per frame, but in reality there's maybe a 20-30% improvement to the solving calls which is probably < 10% total difference overall).
> 4. Can the situation be improved by separating ConvexShapes into their own static bodies (not centered at 0,0,0)?
No. I've tested several different combinations of centering convex shapes, centering static bodies and having no offset of the convex shape, etc. I haven't found anything that provides a notable improvement.
> In addition to the open questions there are some clear insights from this:
>
> 1. Physics is creating a bottleneck on the main thread. Perhaps there is room for multithreading
Possibly, but Quake did 3D physics single threaded back in the 90's and achieved great performance. Multithreading gets very difficult to debug, so I'd prefer to avoid it where possible. 😅
Also, if the physics update takes more than 16ms, doing it in its own thread is still insufficient, and trying to handle multiple bodies all moving in their own threads that could interact with each other is ... 😬
> 2. Jolt is significantly faster than Godot physics, we have a lot of room to improve performance in Godot physics
Sadly, even Jolt is not fast enough for my use case. I thought maybe there was some fundamental issue with the way the Godot physics server was set up or something that impacted both Godot and Jolt physics, but I haven't found any evidence of this so far, so I'm unsure what the next steps are.
I thought I had a course of action here based on what I found with the MRP, but my actual project has much worse performance for some reason.
Even if I drop the physics update rate to 60hz and limit to 3 physics updates per frame and even if I replace all the convex shapes with box shapes, I'm still getting sub 20 FPS with 100 enemies.
Doom or Quake had very simple physics, limited to very simple shapes.
But yeah a modern game engine should be able to handle 50 physics bodies with ease
I tested both with and without USE_TRIANGLE_COLLISION, with respective tick rates and for fun, duplicated LevelSewer1 side by side to run twice the amount of level and characters.
| USE_TRIANGLE_COLLISION | FALSE | TRUE |
|---|---|---|
| Godot - 60hz | 200-300 fps | 235-390 fps |
| Godot - 160hz | 10 fps | 12 fps |
| 2x Godot - 60hz | 4 fps | 4 fps |
| 2x Godot - 160hz | 6 fps | 8 fps |
| Jolt - 60hz | 800~ fps | 900~ fps |
| Jolt - 160hz | 400-600 fps | 750~ fps |
| 2x Jolt - 60hz | 500~ fps | 580~ fps |
| 2x Jolt - 160hz | 15 fps | 300 => 20 fps |
| Editor/Import performance | very slow | normal |
Remember that nodes aren't free: each one has to be parsed, reference counted, and mapped to a corresponding physics body, all for a static structure that's never touched at edit-time. It slows the editor down horrendously. Convex data is also intended to be reused, and since every mesh gets its own convex data with this import, it's entirely redundant. There's no advantage to using separate convex meshes over triangles for this case. None.
Interestingly, Jolt finally did fail on the 2x 160hz test with a gradual cascade.
How does the "nodes aren't free" comment compare to e.g. Unity or Unreal that can run the equivalent ootb without performance problems?
Just to shed some light on why body_test_motion (and move_and_collide/move_and_slide as a result) is taking so long, you can essentially think of it as doing the following:
- Up to 4x `collide_shape` calls for the depenetration/recovery, for every shape in the body
- 10x `collide_shape` calls for the cast, for every shape in the body
- 1x `get_rest_info` call, for every shape in the body
(Mapping these collision checks to collide_shape isn't entirely true for Godot Physics, since it uses solve_distance for the cast as opposed to solve_static, but whatever. You also have the AABB culling on top of this as mentioned above.)
Godot Jolt differs a bit here as well. First of all I vary the amount of collision checks for the cast based on the distance, mainly to improve precision, as opposed to using a fixed number of checks like Godot Physics, so for Godot Jolt that cast is more like 5-17 collide_shape calls. Second, due to a more or less unfixable regression move_and_slide will with Godot Jolt often run the floor-snapping needlessly, which results in yet another call to body_test_motion, which effectively doubles the amount of collision checks listed above. So you can very easily end up doing an average of 20 collision checks per move_and_slide call, and a lot more than that if there are multiple slides happening, or if moving at greater velocities.
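As a rough tally of the figures above, here is a small Python sketch. The function name and parameters are hypothetical, and the result is an upper bound, since the recovery phase runs "up to" 4 checks.

```python
def checks_per_test_motion(recovery=4, cast=10, rest=1, shapes=1):
    """Upper bound on collision checks for one body_test_motion call."""
    return (recovery + cast + rest) * shapes


godot_physics = checks_per_test_motion()            # fixed 10 cast checks
jolt_low = checks_per_test_motion(cast=5)           # short motion
jolt_high = checks_per_test_motion(cast=17)         # long motion
print(godot_physics, jolt_low, jolt_high)

# With the needless floor snap described above, Godot Jolt runs a second
# body_test_motion, which doubles the count for that move_and_slide call.
print(2 * jolt_low, 2 * jolt_high)
```

These counts are per slide, so a move_and_slide call that slides several times, or a body with more than one shape, multiplies them further.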
The thing that sticks out in that list is of course the cast part. Juan put up a PR to address this a while back (#70522) that replaces the (somewhat odd) binary search that Godot does (and forces upon every physics implementation) for its shape-casting with a more traditional sweep test. However, I know for a fact that this won't work with Jolt without removing the safe/unsafe fractions from PhysicsTestMotionResult3D, since you can't guarantee that the returned fraction will be safe nor unsafe, which is a fairly substantial breaking change in my opinion. From what I understand from other people having looked at this, this holds true for that Godot Physics PR as well.
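For readers unfamiliar with the idea, a binary search over the motion fraction can be sketched in Python as follows. This is a generic illustration and not Godot's or Jolt's actual code: `binary_search_cast` and `overlaps_at` are hypothetical names, the default of 10 steps simply mirrors the "10x" figure above, and the sketch assumes the start of the motion is free and the end collides.

```python
def binary_search_cast(overlaps_at, steps=10):
    """Bracket the first contact along a motion with a fixed number of tests.

    overlaps_at(t) reports whether the shape overlaps anything after
    moving the fraction t (0..1) of the motion.
    Returns (safe, unsafe): the largest tested fraction known to be free
    and the smallest fraction known to collide.
    """
    safe, unsafe = 0.0, 1.0
    for _ in range(steps):
        mid = (safe + unsafe) / 2.0
        if overlaps_at(mid):
            unsafe = mid  # contact happens at or before mid
        else:
            safe = mid    # still free at mid
    return safe, unsafe


# Hypothetical obstacle that is first touched 30% of the way along.
safe, unsafe = binary_search_cast(lambda t: t >= 0.3)
print(safe, unsafe)
```

Each iteration costs one overlap query, which is where a fixed per-cast check count comes from, and the result is a bracket of two fractions rather than a single hit time. A traditional sweep test returns one time of impact instead, which is why it does not map cleanly onto separate safe and unsafe fractions.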
My personal stance on move_and_slide is largely that it might be suitable for something like a main character, but should ideally not be used for anything else, unless you have plenty of performance headroom. I would try to reach for simpler physics queries and stuff like navigation meshes for something like an NPC.
Lastly, since I keep seeing early id Tech-derived stuff getting brought up everywhere, keep in mind that those character controllers relied entirely on AABB checks against the BSP tree, as opposed to more general/arbitrary collision checks. I would love to see an AABB check added to PhysicsDirectSpaceState3D, along with a proper sweep test, but I struggle to see move_and_slide being able to utilize any of it while preserving backwards-compatibility.
(I can't comment on the performance impact of structuring the level in the way that's shown in the MRP, but compound shapes aren't exactly free either, even if I do use the more optimized immutable StaticCompoundShape found in Jolt. I am curious about what exactly is taking time when running with Jolt though, so I might do some profiling later on.)
> I would try to reach for simpler physics queries and stuff like navigation meshes for something like an NPC.
This is something I would love a tutorial on. (I have staircases in my projects; how do I handle the NPCs walking on them w/o move and slide? I had to call it twice because otherwise they would clip through.)
> How does the "nodes aren't free" comment compare to e.g. Unity or Unreal that can run the equivalent ootb without performance problems?
They aren't free either. Blueprints still need to build, Entities still have a presence in the managed heap.
UE does a lot of baking, without a high end PC it's just not practical.
I've created a new version of my BSP importer which allows reading the BSPX format which can store the original brush collision shapes (using -wrbrushes with ericw tools). Now much of the collision is using boxes and isn't split up as much:
Here's the project with the improved collision: test_character_body_perf_col_opt.zip
Also dropped the physics updates to 60hz and capped at 3 updates per frame. Even with all that, I'm still dropping to like 15fps with 100 moving characters (and we're not even dealing with animations and other things that also hit perf).
Seems the only solution right now is to stagger the physics and animation updates of enemies across multiple frames. In order to do this, it would be good to be able to have a physics frame update every process frame so we can update the player more frequently and stagger the enemy movement more evenly across frames: https://github.com/godotengine/godot-proposals/issues/10015 (and not have some frames where we're doing twice as much and other where we're doing nothing).
I really think we should be able to get better perf on a quake level of detail world, though, even with a lot of extra splits...
Staggering updates and the improved importer have drastically improved my performance, though. Just need a better way to spread them more evenly across frames.
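One simple way to stagger updates is a round-robin split, sketched below in Python. This is only an illustration of the general idea, not the code used in the project; `bodies_for_frame` and the group count are hypothetical.

```python
def bodies_for_frame(body_count, frame, groups):
    """Indices of the bodies that run their movement code on this frame."""
    return [i for i in range(body_count) if i % groups == frame % groups]


# 100 enemies split into 4 groups: each frame moves only 25 of them.
for frame in range(4):
    active = bodies_for_frame(100, frame, groups=4)
    print(frame, len(active), active[:3])
```

A body that only moves every 4th frame would need to scale its motion by the time elapsed since its last update (4 steps' worth of delta in this example). The split is only even if physics frames themselves are spread evenly across process frames, which is the motivation for the proposal linked above.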
I've just downloaded and tested the latest version of the MRP to try on my system. Here's the specs for reference: Godot v4.3.beta2 - Windows 10.0.22631 - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 2070 SUPER (NVIDIA; 31.0.15.5222) - AMD Ryzen 7 3800X 8-Core Processor (16 Threads)
Running the project as provided produces a stable framerate of ~15-18fps as reported. However, swapping the flag to use move and slide instead resulted in a consistent fps of 120 and above. Even increasing physics ticks to 120Hz still resulted in a stable 30fps.
This indicates that the issue you're experiencing is less to do with the Physics Server and more to do with your own movement code. Looking through your step_move function, it's easy to see why your performance is so bad. There is a guaranteed 2 calls to move_and_collide() per physics step per CharacterBody3D, and up to a maximum of 15 in the worst case scenario. Add to this that none of the CharacterBody3D's are using any sort of navigation or collision avoidance, resulting in the bodies constantly colliding with each other and, on occasion, trying to walk into corners.
While I do agree that there are optimisations that could be made to Godot Physics, which Jolt has done a good job of, this issue seems to be a classic case of confirmation bias. Godot Physics has a reputation for being slow/bad, whether deservedly or not, and this project seems to have gone out of its way to try and prove that to be the case.