opensim-creator
opensim-creator copied to clipboard
Investigate bug report from D. K. Smith
Received in private email (search: "opensimcreator project Smith")
User reports that OSC is crashing whenever he loads his model file in it. And that the issue did not exist in v3.2 (assuming he means 0.3.2).
Private (research) model requested and received. I'll hold this in private storage as issue_773_source_model.osim
.
The model was sucessfully loaded with no crash or error in the following configurations:
- MacBook Air, Ventura 13.5, OpenSim Creator development version
- MacBook Air, Ventura 13.5, OpenSim Creator 0.5.1
- MacBook Air, Ventura 13.5, OpenSim Creator 0.5.0
- MacBook Air, Ventura 13.5, OpenSim Creator 0.3.2
- Windows 11, OpenSim Creator development version
- Windows 11, OpenSim Creator 0.4.1
- Windows 11, OpenSim Creator 0.3.2
- Ubuntu 22, OpenSim Creator development version
- Ubuntu 22, OpenSim Creator 0.5.0
So I cannot reproduce the error with the given model file - I'll ask him for more information.
One part of his bug was fixed in 16f46e7 but it wasn't tagged with an issue number because I noticed it in passing.
I have received a file that crashes from DKS. Details are:
- Renaming anything in the model crashes
- It appears to be related to the creation of circular
_slave
joints - Happens during
.finalizeConnections
- It appears to be another prod instance of bad socket lifetime management, which is going to require runtime lifetime checking (upstream issue: https://github.com/opensim-org/opensim-core/issues/3533) to fix
Tested against:
Version | Result |
---|---|
main |
Broke |
0.5.2 | Broke |
0.4.1 | Broke |
0.4.0 | Broke |
0.3.2 | Broke |
0.2.0 | Broke |
0.1.6 | Working |
0.1.0 | Working |
So there might be hint about how to fix it in the diff between 0.1.6 and 0,.2.0
The reason why 0.1.6 works is because it doesn't go through the step of re-finalizing connections:
- https://github.com/ComputationalBiomechanicsLab/opensim-creator/blob/0.1.6/src/Screens/ModelEditorScreen.cpp#L148
It subsequently crashes if a connection-finalizing operation is performed later in the editing cycle (e.g. change the name, no crash; then change a joint, crash - but the crash is related to finalizing connections).
I created a simplified version of the model graph to visually inspect where exactly the cycle is in this particular model:
It looks like pectoral_girdle_b
is attached to ground via two "directions". I.e. sometimes it's the parent, sometimes it's the child, in a mobilizer relationship. This is accepted by SimTK, but creates a graph cycle, which leads to OpenSim generating slave
bodies, which leads to this particular bug.
3b24088 adds an even simpler bug reproduction, which is a file containing only two bodies and three joints.
The simplest possible failing version would be a single body and two joints, but that is co-incidently caught by https://github.com/opensim-org/opensim-core/issues/3299
Now that I have delved into the segfault debug trace, reproduced the bug in a way that's independent of mesh data, muscles, etc. I think it is safe to conclude that this particular bug is related to how OpenSim's sockets retain pointers for longer than needed.
What's happening here is:
- Connections are finalized
- Which, because the model contains a cycle, automatically creates a
slave
body - And the necessary socket connections etc. to that
slave
are created - Then the connections are re-finalized in some cases by OSC, e.g. because re-naming a component in the model benefits from following socket pointers (rather than name strings) in order to get the new name
- But re-finalizing connections does the whole graph traversal dance, which deletes the old
slave
and creates a newslave
in a different memory location. - After doing that, it then goes through all the sockets, re-finalizing them
- But re-finalizing sockets checks "does the socket currently have a pointer (it does), follow that pointer for socket-related maintenance and then write out the latest name (the behavior we want)
- But the pointers held in the sockets of anything that was connected to the first version of
slave
are stale--the originalslave
has been deleted--segfault.
So the "hollistic" solutions to this (in OpenSim) are:
- All sockets in components that point to
slave
elements are proactively cleared of pointers before graph traversal - this requires a change in OpenSim because only OpenSim internally knows which components are "adopted", etc. - OR: If graph traversal wants to make a slave, check if it has already been created first, rather than proactively wiping out all slaves and re-creating them (unsure how easy it is to track)
- OR: Stop using pointers in sockets (might be slow)
- OR: Add runtime lifetime checking to sockets, so that stale pointers are caught/reprocessed at runtime, rather than being assumed to always be correct (https://github.com/opensim-org/opensim-core/issues/3533)
Non-"hollistic" solutions include:
- Monkey-patching OSC to include tricks like having its own socket-correcting pass when renaming things in the model
But that comes with its own baggage.
User reported that this problem appears to be fixed for their model, so closing this