Move some skeleton building to cgame, batch IPC
Complementary PR to: https://github.com/Unvanquished/Unvanquished/pull/3146.
I've profiled building-heavy scenes, and most of the time was spent waiting for IPC when building the skeleton. I believe the skeleton building itself can be moved completely to cgame. I'm not sure whether tr.animations can be moved there as well, however, since it's also used for other things (at least for culling). Additionally, I might implement GPU-driven skeleton building, which could be even faster since it would avoid uploading all the bones, extra shader switches, etc.
So for now I've copied the skeleton build functions over to cgame and added an IPC call that fetches a batch of animation structs at once. There are definitely still things to clean up in that code, and there's probably a better, more general solution, but this already works: this scene on plat23, with 176 eggs + the overmind in view, went from 50 to 100 FPS with this change.
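Since the profile showed the time going to waiting on IPC, the gain presumably comes from the number of synchronous round-trips rather than the amount of data. A minimal sketch of that difference, using a hypothetical `FakeEngine` that counts round-trips; `AnimInfo`, `GetAnimation` and `GetAnimations` are illustrative names, not the actual IPC calls added in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-animation data cgame needs from the engine.
struct AnimInfo {
    int handle;
    int numFrames;
    float frameRate;
};

// Hypothetical engine side: every public call models one synchronous IPC round-trip.
class FakeEngine {
public:
    std::size_t roundTrips = 0;

    // One request, one reply: the per-entity pattern.
    AnimInfo GetAnimation(int handle) {
        ++roundTrips;
        return Lookup(handle);
    }

    // One request carrying many handles, one reply carrying many structs.
    std::vector<AnimInfo> GetAnimations(const std::vector<int>& handles) {
        ++roundTrips;
        std::vector<AnimInfo> out;
        out.reserve(handles.size());
        for (int h : handles) {
            out.push_back(Lookup(h));
        }
        return out;
    }

private:
    static AnimInfo Lookup(int handle) {
        return AnimInfo{handle, 30, 15.0f};  // placeholder data
    }
};

// Unbatched: one round-trip for every visible model.
std::vector<AnimInfo> FetchUnbatched(FakeEngine& engine, const std::vector<int>& handles) {
    std::vector<AnimInfo> out;
    out.reserve(handles.size());
    for (int h : handles) {
        out.push_back(engine.GetAnimation(h));
    }
    return out;
}

// Batched: collect all handles first, then make a single call.
std::vector<AnimInfo> FetchBatched(FakeEngine& engine, const std::vector<int>& handles) {
    return engine.GetAnimations(handles);
}
```

Assuming one handle per model, the 177 models in the scene above cost 177 round-trips per frame on the unbatched path and a single one on the batched path, while the returned data is identical.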
Profiling shows that most of the time is now actually being spent building the skeleton, in some SSE function, rather than waiting for IPC.
The layout I used to test this can be found here: https://users.unvanquished.net/~reaper/maps/layouts/plat23/test.dat.
Which models use the pure-gamelogic implementation? If there are many of those it could be worth breaking out into a separate non-compat-breaking PR.
All buildings. I'll see if I can make it not require IPC calls at all, but it would probably require duplicating tr.animations for the time being.
It looks like there's also some really wasteful back-and-forth with a ton of copying: cgame gets the skeleton from the engine and copies it into a refEntity_t, then that entity is sent through IPC to the engine, which copies it, copying the skeleton again... From what I can tell, cgame only really needs to know about the skeleton for adding weapons/upgrades; the rest can be done on the engine side, without sending the skeleton through IPC and copying it. Cgame can then load the relevant parts of the animations to get the bone names it needs for adding weapons and upgrades.
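A rough way to see why those copies hurt is to compare an entity that carries a full skeleton with one that carries only the inputs needed to build it. The structs below are hypothetical simplifications, not the real refEntity_t or refSkeleton_t layouts, and `kMaxBones = 128` is an assumed cap.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bone transform: rotation quaternion, translation, uniform scale.
struct Bone {
    float rot[4];
    float origin[3];
    float scale;
};

// Assumed bone cap; the real engine limit may differ.
constexpr std::size_t kMaxBones = 128;

struct Skeleton {
    int numBones;
    Bone bones[kMaxBones];
};

// Simplified current flow: the whole skeleton travels with the entity,
// so every copy of the entity copies the skeleton too.
struct EntityWithSkeleton {
    int model;
    float origin[3];
    Skeleton skeleton;
};

// Simplified proposed flow: only the animation inputs travel,
// and the engine builds the skeleton itself.
struct EntityWithAnimState {
    int model;
    float origin[3];
    int animHandle;
    int frame;
    int oldFrame;
    float backlerp;
};

// Bytes moved per frame if each entity is copied `copies` times on its way to the renderer.
constexpr std::size_t BytesPerFrame(std::size_t entitySize, std::size_t entities, std::size_t copies) {
    return entitySize * entities * copies;
}
```

With these assumed layouts on a typical 4-byte-aligned platform, one entity is about 4 KB with a skeleton versus 32 bytes without, so 177 entities copied twice per frame (also an assumption) move roughly 1.46 MB versus about 11 KB.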
I tried it and got "CG_BuildAnimSkeleton: Can't build skeleton" (the second one) at various moments, for example each time I placed a buildable.
Yeah, I'm assuming it's because some value isn't valid on the first frame the entity appears.
This seems to be the best course of action to me right now:
- Temporarily copy skeleton stuff to cgame, like it was done with trap_R_BlendSkeleton() (the actual engine functionality for it was apparently never cleaned up though).
- In for-0.56, remove most of these IPC calls, and remove refSkeleton_t from the entities that are sent through IPC and copied. Either create a different IPC call for the required parts of the animations, or load them in cgame.
- The engine would then build and interpolate the skeletons based on fields in refEntity_t.
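The last step is what removes the copies: cgame sends only animation inputs and the engine blends the poses itself. A minimal sketch of such a blend, assuming Quake 3-style frame/oldFrame/backlerp inputs (backlerp 0 means fully at `frame`) and hypothetical `Quat`, `Vec3`, `BonePose` and `InterpolatePose` types. It uses normalized lerp for rotations, which is not necessarily what the engine's existing skeleton code does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Quat { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Hypothetical per-bone pose for one animation frame.
struct BonePose {
    Quat rot;
    Vec3 origin;
};

using Pose = std::vector<BonePose>;

// Normalized linear interpolation between two unit quaternions.
// Negating b when the dot product is negative keeps the blend on the short arc.
Quat Nlerp(const Quat& a, Quat b, float t) {
    float dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    if (dot < 0.0f) {
        b = {-b.x, -b.y, -b.z, -b.w};
    }
    Quat q{a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t,
           a.z + (b.z - a.z) * t, a.w + (b.w - a.w) * t};
    float len = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return {q.x / len, q.y / len, q.z / len, q.w / len};
}

Vec3 Lerp(const Vec3& a, const Vec3& b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// Blend two frames of the same animation:
// backlerp == 0 yields `frame`, backlerp == 1 yields `oldFrame`.
Pose InterpolatePose(const Pose& frame, const Pose& oldFrame, float backlerp) {
    assert(frame.size() == oldFrame.size());
    float t = 1.0f - backlerp;  // weight of `frame`
    Pose out(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i) {
        out[i].rot = Nlerp(oldFrame[i].rot, frame[i].rot, t);
        out[i].origin = Lerp(oldFrame[i].origin, frame[i].origin, t);
    }
    return out;
}
```

Slerp is the more exact alternative for rotations; nlerp is cheaper and is commonly used when frame-to-frame rotation deltas are small.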
I think that should be sufficient, but I might've missed some skeleton usage that precludes this.
> Temporarily copy skeleton stuff to cgame, like it was done with trap_R_BlendSkeleton() (the actual engine functionality for it was apparently never cleaned up though).
I tried this, but went down the rabbit hole of loading IQM files to get the animations, so I'm going to skip that and make the proper change on for-0.56 instead (though some or most of the functionality can probably go into master).
I tested this branch (with its game counterpart) on a slow device, on the plat23 map at viewpos 1920 1920 0 0 0 (which renders the whole alien base). Unfortunately, it doesn't improve performance in that scenario. I have not tested with the hundreds-of-models layout.
The hardware is an Intel X3100 (GMA 965, Gen 4) on a Core 2 Duo L7500 (dual core). This GPU doesn't implement the features required for model skinning, so the driver emulates it. Our own alternative code path (enabled with r_vboVertexSkinning 0) is faster than the driver emulation. Since this hardware puts strong pressure on the CPU, I was hoping to see a difference, because on such a system every cycle counts! Unfortunately, I haven't measured any difference:
| | cpu | cpu+smp | gpu | gpu+smp |
|---|---|---|---|---|
| old | 21 | 21 | 11 | 11 |
| new | 21 | 21 | 11 | 11 |
So the branch will likely benefit gameplay such as PvE games, but not the vanilla game.
I have started a branch that moves the skeleton stuff to the engine instead and skips the expensive copies. It already seems to run faster than this branch, although I still need to make it actually build the skeletons on the engine side from the values it receives.