opensim-creator icon indicating copy to clipboard operation
opensim-creator copied to clipboard

Investigate bug report from D. K. Smith

Open adamkewley opened this issue 1 year ago • 8 comments

Received in private email (search: "opensimcreator project Smith")

User reports that OSC is crashing whenever he loads his model file in it. And that the issue did not exist in v3.2 (assuming he means 0.3.2).

adamkewley avatar Sep 08 '23 05:09 adamkewley

Private (research) model requested and received. I'll hold this in private storage as issue_773_source_model.osim.

adamkewley avatar Sep 08 '23 05:09 adamkewley

The model was sucessfully loaded with no crash or error in the following configurations:

  • MacBook Air, Ventura 13.5, OpenSim Creator development version
  • MacBook Air, Ventura 13.5, OpenSim Creator 0.5.1
  • MacBook Air, Ventura 13.5, OpenSim Creator 0.5.0
  • MacBook Air, Ventura 13.5, OpenSim Creator 0.3.2
  • Windows 11, OpenSim Creator development version
  • Windows 11, OpenSim Creator 0.4.1
  • Windows 11, OpenSim Creator 0.3.2
  • Ubuntu 22, OpenSim Creator development version
  • Ubuntu 22, OpenSim Creator 0.5.0

So I cannot reproduce the error with the given model file - I'll ask him for more information.

adamkewley avatar Sep 08 '23 05:09 adamkewley

One part of his bug was fixed in 16f46e7 but it wasn't tagged with an issue number because I noticed it in passing.

adamkewley avatar Sep 12 '23 10:09 adamkewley

I have received a file that crashes from DKS. Details are:

  • Renaming anything in the model crashes
  • It appears to be related to the creation of circular _slave joints
  • Happens during .finalizeConnections
  • It appears to be another prod instance of bad socket lifetime management, which is going to require runtime lifetime checking (upstream issue: https://github.com/opensim-org/opensim-core/issues/3533) to fix

Tested against:

Version Result
main Broke
0.5.2 Broke
0.4.1 Broke
0.4.0 Broke
0.3.2 Broke
0.2.0 Broke
0.1.6 Working
0.1.0 Working

So there might be hint about how to fix it in the diff between 0.1.6 and 0,.2.0

adamkewley avatar Sep 17 '23 07:09 adamkewley

The reason why 0.1.6 works is because it doesn't go through the step of re-finalizing connections:

  • https://github.com/ComputationalBiomechanicsLab/opensim-creator/blob/0.1.6/src/Screens/ModelEditorScreen.cpp#L148

It subsequently crashes if a connection-finalizing operation is performed later in the editing cycle (e.g. change the name, no crash; then change a joint, crash - but the crash is related to finalizing connections).

adamkewley avatar Sep 17 '23 08:09 adamkewley

I created a simplified version of the model graph to visually inspect where exactly the cycle is in this particular model:

image

It looks like pectoral_girdle_b is attached to ground via two "directions". I.e. sometimes it's the parent, sometimes it's the child, in a mobilizer relationship. This is accepted by SimTK, but creates a graph cycle, which leads to OpenSim generating slave bodies, which leads to this particular bug.

adamkewley avatar Sep 28 '23 15:09 adamkewley

3b24088 adds an even simpler bug reproduction, which is a file containing only two bodies and three joints.

The simplest possible failing version would be a single body and two joints, but that is co-incidently caught by https://github.com/opensim-org/opensim-core/issues/3299


Now that I have delved into the segfault debug trace, reproduced the bug in a way that's independent of mesh data, muscles, etc. I think it is safe to conclude that this particular bug is related to how OpenSim's sockets retain pointers for longer than needed.

What's happening here is:

  • Connections are finalized
  • Which, because the model contains a cycle, automatically creates a slave body
  • And the necessary socket connections etc. to that slave are created
  • Then the connections are re-finalized in some cases by OSC, e.g. because re-naming a component in the model benefits from following socket pointers (rather than name strings) in order to get the new name
  • But re-finalizing connections does the whole graph traversal dance, which deletes the old slave and creates a new slave in a different memory location.
  • After doing that, it then goes through all the sockets, re-finalizing them
  • But re-finalizing sockets checks "does the socket currently have a pointer (it does), follow that pointer for socket-related maintenance and then write out the latest name (the behavior we want)
  • But the pointers held in the sockets of anything that was connected to the first version of slave are stale--the original slave has been deleted--segfault.

So the "hollistic" solutions to this (in OpenSim) are:

  • All sockets in components that point to slave elements are proactively cleared of pointers before graph traversal - this requires a change in OpenSim because only OpenSim internally knows which components are "adopted", etc.
  • OR: If graph traversal wants to make a slave, check if it has already been created first, rather than proactively wiping out all slaves and re-creating them (unsure how easy it is to track)
  • OR: Stop using pointers in sockets (might be slow)
  • OR: Add runtime lifetime checking to sockets, so that stale pointers are caught/reprocessed at runtime, rather than being assumed to always be correct (https://github.com/opensim-org/opensim-core/issues/3533)

adamkewley avatar Sep 28 '23 16:09 adamkewley

Non-"hollistic" solutions include:

  • Monkey-patching OSC to include tricks like having its own socket-correcting pass when renaming things in the model

But that comes with its own baggage.

adamkewley avatar Sep 28 '23 16:09 adamkewley

User reported that this problem appears to be fixed for their model, so closing this

adamkewley avatar Jun 13 '24 08:06 adamkewley