Daemon icon indicating copy to clipboard operation
Daemon copied to clipboard

Move some skeleton building to cgame, batch IPC

Open VReaperV opened this issue 1 year ago • 9 comments

Complementary pr to: https://github.com/Unvanquished/Unvanquished/pull/3146.

I've profiled building-heavy scenes, and the most time was spent waiting for IPC when building the skeleton. The building of the skeletons itself can be moved completely to cgame I believe. I'm not sure if tr.animations can be moved there as well, however, since it's also used for other things (at least for culling). Additionally, I might implement GPU-driven skeleton building, which might be even faster since it would avoid uploading all the bones, doing more shader switches etc.

So for now I've copied the skeleton build functions over to cgame and added an IPC call to get a bunch of animation structs at once. There are definitely still things to clean up with that code, and there's probably a better more general solution, but this works already: this scene on plat23 with 176 eggs + om in view went from 50 to 100 FPS from this change: unvanquished_2024-10-23_043029_000

Profiling shows that most of the time is now actually being spent building the skeleton, in some SSE function, rather than waiting for IPC.

The layout I used to test this can be found here: https://users.unvanquished.net/~reaper/maps/layouts/plat23/test.dat.

VReaperV avatar Oct 23 '24 01:10 VReaperV

Which models use the pure-gamelogic implementation? If there are many of those it could be worth breaking out into a separate non-compat-breaking PR.

slipher avatar Oct 23 '24 02:10 slipher

Which models use the pure-gamelogic implementation? If there are many of those it could be worth breaking out into a separate non-compat-breaking PR.

All buildings. I'll see if I can make it not require IPC calls at all, but it would probably require duplicating tr.animations for the time being.

VReaperV avatar Oct 23 '24 02:10 VReaperV

It looks like there's also some really stupid back-and-forth with a ton of copying: getting the skeleton from engine, copying it into a refentity_t, then that entity gets sent through IPC to engine and engine copies it, copying the skeleton again... From what I can tell, cgame really only needs to know about the skeleton for adding weapons/upgrades, the rest can just be done on engine side, without sending the skeleton and copying it through IPC. Cgame can then load the relevant parts of the animations to get bone names for adding weapons and upgrades.

VReaperV avatar Oct 23 '24 11:10 VReaperV

I tried it and got CG_BuildAnimSkeleton: Can't build skeleton (the second one) at various moments. For example each placing a buildable.

slipher avatar Oct 23 '24 13:10 slipher

Yeah, I'm assuming it's because some value isn't valid on the first frame the entity appears.

VReaperV avatar Oct 23 '24 14:10 VReaperV

This seems to be the best course of action to me right now:

  • Temporarily copy skeleton stuff to cgame, like it was done with trap_R_BlendSkeleton() (the actual engine functionality for it was apparently never cleaned up though).
  • In for-0.56, remove most of these IPC calls and refSkeleton_t from entities being sent through IPC and copied. Either create a different IPC call for, or load in cgame, the required parts of the animations.
  • Engine would then build and interpolate the skeletons based on fields in refEntity_t.

I think that should be sufficient, but I might've missed some skeleton usage that precludes this.

VReaperV avatar Oct 23 '24 15:10 VReaperV

Temporarily copy skeleton stuff to cgame, like it was done with trap_R_BlendSkeleton() (the actual engine functionality for it was apparently never cleaned up though).

So I've tried this, but went down the rabbit hole of loading iqm to get the animations, so I'm just gonna skip that and make the proper change to for-0.56 instead (well, some/most of the functionality can probably go into master).

VReaperV avatar Oct 23 '24 21:10 VReaperV

I tested this branch (with game counterpart) on a slow device, on plat23 map, viewpos 1920 1920 0 0 0 (renders all alien base), unfortunately that doesn't bring more performance on such scenario. I have not tested with the hundreds-of-models layout.

The hardware is Intel X3100 (GMA 965, Gen 4), on a Core 2 Duo L7500 (dual core). This GPU doesn't implement the features required for model skinning so the driver emulates it. Our own alternate code (enabled with r_vboVertexSkinning 0) is faster than driver emulation. By using this hardware that puts strong pressure on CPU, I was hoping to see a difference, because with such system, every cycles count! Unfortunately I haven't measured any difference:

  cpu cpu+smp gpu gpu+smp
old 21 21 11 11
new 21 21 11 11

So the branch will likely benefit gameplays like PVE games, but not the vanilla game.

illwieckz avatar Nov 04 '24 18:11 illwieckz

I have started a branch that moves the skeleton stuff to engine instead and skips the expensive copies. This seems to be working faster than this branch, although I still need to make it actually build the skeletons on the engine side with the values it receives.

VReaperV avatar Nov 11 '24 00:11 VReaperV