Principia
High CPU usage for updating parts in active vessel
High part count vessels have a high performance impact while being loaded.
Unity profiler indicates Principia uses approximately 40% of the CPU cycles on a 265 part vessel.
Journal of 265 part vessel in a 200 km orbit doing nothing: link to Drive
I finally replayed this journal, and unfortunately it's not all that interesting and doesn't reveal any C++ path that would have a cost growing with the number of parts. This journal does mostly three things:
- It serializes and deserializes the Principia plugin, maybe because of scene changes.
- It recomputes the past trajectories of celestials after deserialization.
- It computes the future trajectories of vessels, mostly for the purposes of orbit analysis.
All of this has a cost proportional to the number of vessels/celestials, not to the number of parts. It is actually quite hard to find anything in the profile that talks about parts (that would be FreeVesselsAndPartsAndCollectPileUps) and even so its cost is around 0.3% of the total.
It's conceivable that the performance hit would come from the C# code and would not be visible on the journal, but to analyze that we would need a save obtained with the stock game with instructions to reproduce. Alternatively, it's possible that there is something wrong with the C++ code, but then we would need a much "cleaner" journal without scene changes, orbit analysis and the like.
I am noticing though that locking/unlocking in ContinuousTrajectory is somewhat expensive. I'll open a separate issue for this.
Short journal created with stock KSP 1.12.3 and Principia Ἵππαρχος. The journal is of a 1167-part craft in low Kerbin orbit doing nothing: link to drive. The journal box was checked in the tracking station before switching to the vessel. Craft: https://kerbalx.com/BullShootLatinName/SAAB-J35-Draken Save file: link to drive
I'll need access to these files in Google Drive.
Woops, sorry! Should have access now!
I just read the comment on the forums saying "We are aware of the problem and there is an issue for that but unfortunately, there is no easy solution." and decided to take a quick look. I can't help but strongly disagree with that statement.
The C#/managed side of Principia is in general quite inefficient because basic optimizations are missing: LINQ is used in hot paths, loop-invariant code is systematically called inside loops, there are tons of useless float -> double -> float casts, no attempt is made to avoid GC allocations, properties are accessed repeatedly instead of being cached in local variables (remember that most properties on Unity objects involve a managed <-> native round trip), etc.
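To make the kind of change concrete, here is a minimal sketch contrasting a LINQ pipeline with a plain loop that reads each property once. FakePart, its Mass property and the MassReads counter are hypothetical stand-ins for a Unity-backed object with a costly getter; none of this is taken from the Principia code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Unity-backed object whose getter is costly
// (in Unity, many getters cross the managed/native boundary).
public sealed class FakePart {
  public static int MassReads;  // Counts getter calls, for illustration only.
  private readonly float mass_;
  public FakePart(float mass) { mass_ = mass; }
  public float Mass { get { ++MassReads; return mass_; } }
}

public static class HotPath {
  // LINQ version: allocates iterator objects and a closure capturing `scale`
  // on every call, and reads Mass twice for each part that passes the filter.
  public static double TotalMassLinq(List<FakePart> parts, double scale) {
    return parts.Where(p => p.Mass > 0).Sum(p => p.Mass * scale);
  }

  // Loop version: no LINQ allocations, and Mass is read once per part.
  public static double TotalMassLoop(List<FakePart> parts, double scale) {
    double total = 0;
    int count = parts.Count;  // Loop-invariant, hoisted out of the loop.
    for (int i = 0; i < count; ++i) {
      float mass = parts[i].Mass;  // Cached in a local.
      if (mass > 0) {
        total += mass * scale;
      }
    }
    return total;
  }
}
```

Both methods return the same sum; the difference is only in the number of getter calls and in the garbage produced per call, which is what matters in code that runs every frame.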
Doing some quick tests in a stock install with a single 660 parts vessel in Kerbin orbit, Principia is allocating 430 KB (!) of GC objects and consuming about 34ms per frame (almost half the frame time), the bulk being in WaitedForFixedUpdate().
Outside of those general issues, and concerning the specific issue here, the main bottleneck is this : https://github.com/mockingbirdnest/Principia/blob/b9a621cea9508599c603e6130332a1070fe3bf44/ksp_plugin_adapter/ksp_plugin_adapter.cs#L1456-L1460
Rewriting that section to :
Vector3d q_correction_at_root_part = Vector3d.zero;
Vector3d v_correction_at_root_part = Vector3d.zero;
Physics.autoSyncTransforms = false;
Origin origin = new Origin
{
    reference_part_is_at_origin = FloatingOrigin.fetch.continuous,
    reference_part_is_unmoving = (krakensbane_.FrameVel != Vector3d.zero),
    main_body_centre_in_world = (XYZ)FlightGlobals.ActiveVessel.mainBody.position,
    reference_part_id = FlightGlobals.ActiveVessel.rootPart.flightID
};
Part activeVesselRootPart = FlightGlobals.ActiveVessel.rootPart;
foreach (Vessel vessel3 in FlightGlobals.Vessels)
{
    if (vessel3.packed) continue;
    if (!plugin_.HasVessel(vessel3.id.ToString())) continue;
    foreach (Part part2 in vessel3.parts)
    {
        if (!PartIsFaithful(part2)) continue;
        Rigidbody partRb = part2.rb;
        QPRW part_actual_motion = plugin_.PartGetActualRigidMotion(part2.flightID, origin);
        if (part2 == activeVesselRootPart)
        {
            QP part_actual_degrees_of_freedom = part_actual_motion.qp;
            q_correction_at_root_part = (Vector3d)part_actual_degrees_of_freedom.q - partRb.position;
            v_correction_at_root_part = (Vector3d)part_actual_degrees_of_freedom.p - partRb.velocity;
        }
        partRb.transform.SetPositionAndRotation(
            (Vector3)part_actual_motion.qp.q,
            (Quaternion)part_actual_motion.r);
        partRb.velocity = (Vector3)part_actual_motion.qp.p;
        partRb.angularVelocity = (Vector3)part_actual_motion.w;
    }
}
Physics.SyncTransforms();
Physics.autoSyncTransforms = true;
Results in the WaitedForFixedUpdate() time going from ~34ms to ~11ms (total frame time 72ms to 47ms).
I couldn't help doing a bit of general perf optimization, but the main factors are :
- Setting both the transform and the rb is redundant, as setting the transform will set the rb.
- Setting pos/rot in separate atomic operations is extremely wasteful.
- Disabling autoSyncTransforms and calling SyncTransforms() manually avoids even more per-part atomic operations.
I didn't do much testing to ensure such changes are free from any side effects, but in theory they shouldn't have any.
From a quick look, a similar optimization could be applied to the next loop (on FlightGlobals.physicalObjects), and on packed vessels in UpdateVessel(), although the overhead here is a lot less significant, likely because RBs are kinematic in the packed state.
I checked this using a decompiled/recompiled version of the plugin adapter, as I don't have the toolchain required to build Principia, and downloading it is a bit too demanding for me (I'm on very limited mobile data).
On a side note, I see a lot of vessel.id.ToString(), which I guess is for interop with the C++ side, but that seems to be a significant part of the remaining GC allocations and perf overhead. Guid is a struct and is designed for native interop, so it should be relatively easy to use it directly?
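Passing the Guid itself would require changing the interface with the C++ side. As a smaller, different mitigation (not what is proposed above), the strings could be memoized so that ToString() runs once per vessel instead of on every call. A minimal sketch, where GuidStringCache and its methods are hypothetical names that do not exist in Principia:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: memoizes Guid -> string so that the hot path does not
// allocate a new string with Guid.ToString() every frame.
public sealed class GuidStringCache {
  // Guid implements IEquatable<Guid>, so lookups do not box the key.
  private readonly Dictionary<Guid, string> cache_ =
      new Dictionary<Guid, string>();

  // Returns the cached string for `id`, formatting it on first use only.
  public string Get(Guid id) {
    string s;
    if (!cache_.TryGetValue(id, out s)) {
      s = id.ToString();
      cache_.Add(id, s);
    }
    return s;
  }

  // To be called when a vessel goes away, so that the cache does not grow
  // without bound.  Returns true if the entry existed.
  public bool Forget(Guid id) {
    return cache_.Remove(id);
  }
}
```

With something like this, a call such as plugin_.HasVessel(vessel.id.ToString()) would take the cached string instead. The dictionary lookup still costs something per call, so this only addresses the allocations, not the interop design.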
For reference, even after doing a decent pass to get rid of most of the GC allocations (430 KB/frame to 100 KB/frame), they still account for about 1/3 of the remaining 11ms frame time eaten by Principia.
Oh I am so glad that my misguided comment made you react because I was literally just reading your guide on profiling the C# side of the mod and to be honest it looked daunting.
I'll take a closer look at your findings. Thanks for the input.
Something else I noticed while profiling is that a significant fraction of the remaining frame time after micro-optimizing everything I could is occupied by the is_manageable and PartIsFaithful checks. Those checks are performed repeatedly on the same objects from multiple loops over the same collections in WaitedForFixedUpdate. In the case of is_manageable, the check itself could be vastly optimized by duplicating the UnmanageabilityReasons() logic without building a useless string array, but it seems to me that it should be possible to avoid having so many loops over the same collections in the first place.
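As an illustration of a check that does not build strings, here is a minimal sketch using a flags enum. FakeVessel, the Unmanageability enum and the two conditions are hypothetical stand-ins; the real conditions are whatever UnmanageabilityReasons() tests.

```csharp
using System;

// Hypothetical reasons, one bit each.
[Flags]
public enum Unmanageability {
  None = 0,
  Dead = 1 << 0,
  NoParts = 1 << 1,
}

// Hypothetical stand-in for the KSP Vessel class.
public sealed class FakeVessel {
  public bool IsDead;
  public int PartCount;
}

public static class Manageability {
  // Computes the reasons as a bit set: no list, array or string is allocated.
  public static Unmanageability Reasons(FakeVessel vessel) {
    Unmanageability reasons = Unmanageability.None;
    if (vessel.IsDead) {
      reasons |= Unmanageability.Dead;
    }
    if (vessel.PartCount == 0) {
      reasons |= Unmanageability.NoParts;
    }
    return reasons;
  }

  // The hot-path check is then a plain comparison.
  public static bool IsManageable(FakeVessel vessel) {
    return Reasons(vessel) == Unmanageability.None;
  }

  // Text is only built when it is actually shown to the user.
  public static string Describe(Unmanageability reasons) {
    return reasons.ToString();
  }
}
```

This keeps a single place where the conditions are listed, so the boolean check and the human-readable explanation cannot drift apart. It does nothing about the repeated loops; for that, the result would have to be computed once per vessel per frame and reused.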