gz-sim icon indicating copy to clipboard operation
gz-sim copied to clipboard

Remove systems if their parent entity is removed

Open arjo129 opened this issue 2 years ago β€’ 13 comments

🦟 Bug fix

Fixes #2217

Summary

In particular if a user despawns an entity, the associated plugin gets removed. This should prevent issues like #2165. TBH I'm not sure if this is the right way forward as a system should technically be able to access any entity in a traditional ECS.

no_crash

To better understand the changes and review this, the PR consists of the following:

  • Add a Drop function to Barrier.cc & Barrier.hh. This is similar to C++20's std::barrier::await_and_drop
  • Add a SystemContainer structure (and tests). This is a masked vector that allows systems to be removed safely while retaining their original addresses.
  • Add a function to SystemManager to facilitate removing systems from a SystemManager Interface (and tests).
  • Update SimulationRunner (and tests) to use the new barrier and removal processing routines.
  • Update PosePublisher (cause I found it crashing with this new PR due to an unrelated optimization).

Checklist

  • [x] Signed all commits for DCO
  • [x] Added tests
  • [ ] Updated documentation (as needed)
  • [ ] Updated migration guide (as needed)
  • [ ] Consider updating Python bindings (if the library has them)
  • [x] codecheck passed (See contributing)
  • [x] All tests passed (See test coverage)
  • [x] While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.

πŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”ΈπŸ”Έ

arjo129 avatar Nov 07 '23 09:11 arjo129

Codecov Report

Attention: Patch coverage is 92.55319% with 7 lines in your changes missing coverage. Please review.

Project coverage is 64.78%. Comparing base (0f27ce0) to head (89a0f7d). Report is 119 commits behind head on main.

:exclamation: Current head 89a0f7d differs from pull request most recent head f3c57d0

Please upload reports for the commit f3c57d0 to get more accurate results.

Files Patch % Lines
src/SystemManager.cc 89.83% 6 Missing :warning:
src/systems/pose_publisher/PosePublisher.cc 90.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2232      +/-   ##
==========================================
- Coverage   65.64%   64.78%   -0.87%     
==========================================
  Files         324      358      +34     
  Lines       30937    29213    -1724     
==========================================
- Hits        20308    18925    -1383     
+ Misses      10629    10288     -341     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Nov 08 '23 08:11 codecov[bot]

@arjo129

I thought of this possible alternative implementation:

  • Only the system plugin itself knows which entities are necessary for its functioning,
    • This could e.g. be the model entity it is attached to, but it could as well be any other entity,
    • Hence it's the system plugin's responsibility to identify those entities,
  • To identify these entities, the system plugin adds a component used_by_plugin to each entity it acts upon,
    • A used_by_plugin component of an entity identifies a plugin which acts on that entity,
  • Upon removal of entities, the ECM checks whether they have used_by_plugin components,
    • If so, the SystemManager is triggered to remove the corresponding plugin.

A possible extension could be to also have a kind of severity flag in the component, to allow for different reactions from the SystemManager depending on which entities are removed.

E.g.: consider a system attached to a model and which acts on some subentities of that model:

  • If the model entity is removed, the system should also be removed,
  • But if a subentity is removed, maybe the system should not be removed yet, but rather perform some 'reconfigure' action instead.

jrutgeer avatar Nov 09 '23 08:11 jrutgeer

I like the thought process behind your suggested API addition. However, in my mind that would be an added feature. We need to be able to remove systems safely. That is something that will take a fair deal of work right now and what this PR hopes to (eventually achieve). Your proposed API could be bolted on top in Ionic once we are able to perform safe removal. The current state of this PR is we can safely remove some plugins, but it causes weird breakages in the PosePublisher plugin which I need to investigate prior to making any changes. It is also entirely possible that I'm doing something wrong with the barriers. Either ways I need to revisit this another day when my head is clearer and I have more time.

image

arjo129 avatar Nov 09 '23 09:11 arjo129

I think this works now... Will have to write some unit tests to verify. But it does not go :boom: on the warehouse world.

arjo129 avatar Nov 24 '23 05:11 arjo129

I haven't had a chance to look into this deeply, but one thing to keep is how this interacts with the levels feature. If a model that contains a system is unloaded by the levels feature and gets loaded back again, what would be the behavior? Would the system get configured and start anew? If that works properly, that would cover most cases, but we might also want the ability for a system to stay loaded even when its parent entity is removed so that if the parent entity is loaded back in, the system might continue from where it left off.

azeey avatar Dec 01 '23 05:12 azeey

Thanks for pointing that out. I'll go check out levels and see how this PR interacts with it. A cursory reading of the levels code though makes it seem like we despawn entities and then respawn them which makes me wonder if holding a reference "entity" in a system is correct. I've not looked into this deeply enough to say anything. Will update this PR when I try it. I'm currently out on my anual firefighting reservist.

arjo129 avatar Dec 01 '23 06:12 arjo129

Is any work ongoing in this PR? I'm currently working on a simulation that requires frequent loading and unloading of models. The problem of unloaded plugins really worries me.

Bogdanov-am avatar Mar 04 '24 16:03 Bogdanov-am

I just need to write some tests for this.

arjo129 avatar Mar 06 '24 08:03 arjo129

Tests have been written. I think this is ready for feedback.

arjo129 avatar Mar 18 '24 08:03 arjo129

Also hoping to see this merged soon.

landersson avatar May 01 '24 13:05 landersson

If you'd like to get this checked in faster you could help with reviewing it. One thing to do is to checkout the relevant branch and run it on example worlds to see that things work as expected.

arjo129 avatar May 02 '24 01:05 arjo129

If you'd like to get this checked in faster you could help with reviewing it. One thing to do is to checkout the relevant branch and run it on example worlds to see that things work as expected.

Unfortunately, I don't have any experience with the gz-sim codebase and can't really comment on whether this is a suitable solution or not.

Just need a way to remove models without plugins like JointStatePublisher crashing.

landersson avatar May 02 '24 03:05 landersson

Unfortunately, I don't have any experience with the gz-sim codebase and can't really comment on whether this is a suitable solution or not.

You can try it out and tell me if it crashes! There may be plugins I've not tested on!

arjo129 avatar May 02 '24 04:05 arjo129

I'm now seeing a failure of INTEGRATION_reset_sensors. It's disabled on macOS, so let me try to reproduce on 24.04

scpeters avatar Jul 08 '24 16:07 scpeters

I've reworked and simplified things significantly. We don't use barriers any more. Instead just like during addition, during removal we simply destroy all threads in the PostUpdate thread pool. The end result is a lot less code.

arjo129 avatar Jul 09 '24 08:07 arjo129

Not sure why I'm getting do many dartsim related errors. Can't reproduce locally:

libgz-physics-dartsim-plugin.so: undefined symbol: _ZN3sdf3v158internal17SdfScopeDelimiterB5cxx11Ev

arjo129 avatar Jul 09 '24 08:07 arjo129

Not sure why I'm getting do many dartsim related errors. Can't reproduce locally:

libgz-physics-dartsim-plugin.so: undefined symbol: _ZN3sdf3v158internal17SdfScopeDelimiterB5cxx11Ev

I think that's my fault; I rebuilt the sdformat9 nightlies after removing some deprecated symbols but didn't rebuild gz-physics; I'll take a look

scpeters avatar Jul 09 '24 18:07 scpeters

Thanks @scpeters, I've addressed your feedback. Would like the :heavy_check_mark: from @azeey as well, then we can go ahead and merge this. Thanks for everyone's help on this!

arjo129 avatar Jul 10 '24 02:07 arjo129

See #2473

azeey avatar Jul 11 '24 20:07 azeey

I haven't had a chance to look into this deeply, but one thing to keep is how this interacts with the levels feature. If a model that contains a system is unloaded by the levels feature and gets loaded back again, what would be the behavior? Would the system get configured and start anew? If that works properly, that would cover most cases, but we might also want the ability for a system to stay loaded even when its parent entity is removed so that if the parent entity is loaded back in, the system might continue from where it left off.

I've confirmed that when a level that contains a system get unloaded and loaded back again, the associated systems are also unloaded and loaded properly πŸŽ‰ . Regarding the ability for a system to stay loaded and continue where it left off, I propose we open a feature request and not address it right now since it's probably not going to be widely used.

Since there are a number of bug reports around this issue and since levels will work fine with this, I'm in favor of backporting this to Harmonic and earlier.

azeey avatar Jul 15 '24 21:07 azeey

Thanks everyone for your efforts, I will be making use of this ASAP.

landersson avatar Jul 15 '24 22:07 landersson

Thanks everyone for your guidance! Seems like our homebrew build is blocking the merge.

arjo129 avatar Jul 15 '24 23:07 arjo129

Thanks everyone for your guidance! Seems like our homebrew build is blocking the merge.

I can reproduce the test failures on my macOS machine. I'm investigating, but it's a bit strange.

The test failures may have been introduced by https://github.com/gazebosim/gz-sim/pull/2473 or https://github.com/gazebosim/gz-sim/pull/2475, since macOS CI did not run for either of those pull requests.

scpeters avatar Jul 17 '24 00:07 scpeters

Thanks everyone for your guidance! Seems like our homebrew build is blocking the merge.

I can reproduce the test failures on my macOS machine. I'm investigating, but it's a bit strange.

The test failures may have been introduced by #2473 or #2475, since macOS CI did not run for either of those pull requests.

  • I can reproduce the seg-faults when building normally and running tests with ctest
  • I tried reverting https://github.com/gazebosim/gz-sim/pull/2232/commits/7026b5714f0f88764ad058920f766c07096f0fb7, and it has no effect
  • The tests pass when run with lldb
  • The tests pass when recompiled with -DCMAKE_BUILD_TYPE=Debug

scpeters avatar Jul 17 '24 00:07 scpeters

actually the tests have been failing on main since https://build.osrfoundation.org/job/gz_sim-ci-main-homebrew-amd64/68/

scpeters avatar Jul 17 '24 00:07 scpeters

rerunning homebrew CI now that https://github.com/gazebosim/gz-math/pull/606 and https://github.com/gazebosim/sdformat/pull/1458 have been reverted after seeing that the reverts fixed tests in https://github.com/gazebosim/gz-sim/pull/2482

scpeters avatar Jul 17 '24 16:07 scpeters