ros2_control icon indicating copy to clipboard operation
ros2_control copied to clipboard

Update spawner to have a timeout on service call

Open gennartan opened this issue 1 year ago • 3 comments

In some cases, the service call might never return. As a consequence, the call to spin_until_future_complete will loop infinitely.

When the error happens:

  • The spawner is blocked
  • The controller manager shows the following error: [controller_manager.rclcpp]: failed to send response to /controller_manager/list_controllers (timeout): client will not receive response, at ./src/rmw_response.cpp:154, at ./src/rcl/service.c:402

Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:

  1. Limited scope. Your PR should do one thing or one set of things. Avoid adding “random fixes” to PRs. Put those on separate PRs.
  2. Give your PR a descriptive title. Add a short summary, if required.
  3. Make sure the pipeline is green.
  4. Don’t be afraid to request reviews from maintainers.
  5. New code = new tests. If you are adding new functionality, always make sure to add some tests exercising the code and serving as live documentation of your original intention.

To send us a pull request, please:

  • [ ] Fork the repository.
  • [ ] Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
  • [ ] Ensure local tests pass. (colcon test and pre-commit run (requires you to install pre-commit by pip3 install pre-commit)
  • [ ] Commit to your fork using clear commit messages.
  • [ ] Send a pull request, answering any default questions in the pull request interface.
  • [ ] Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.

gennartan avatar Jun 07 '24 15:06 gennartan

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 86.64%. Comparing base (af4b48f) to head (e25efe8).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1566      +/-   ##
==========================================
- Coverage   86.67%   86.64%   -0.03%     
==========================================
  Files         115      115              
  Lines       10528    10529       +1     
  Branches      967      967              
==========================================
- Hits         9125     9123       -2     
- Misses       1056     1058       +2     
- Partials      347      348       +1     
Flag Coverage Δ
unittests 86.64% <100.00%> (-0.03%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../controller_manager/controller_manager_services.py 67.74% <100.00%> (ø)

... and 1 file with indirect coverage changes

codecov[bot] avatar Jun 08 '24 13:06 codecov[bot]

Related: #1501

fmauch avatar Jun 10 '24 06:06 fmauch

Related: #1501

Yes, PR #1501 seems to add the usage of timeout_sec of spin_until_future_complete as I did here. With two main differences:

  • PR #1501 uses the same service_timeout variable to wait that the service is ready and to call the actual service.
  • PR #1501 automatically retries to call the failed service

I don't know what would be the expected behavior from the maintainers. It is up to them to decide which behavior is prefered.

That said, I think merging one of those PR could be a priority since it is a bug in ros2_control that can prevent a robot to start. In my case, I am starting the spawner in a launchfile and in about 30% of the cases, I must manually restart the spawner because the node didn't start correctly. Other projects in my company had the same issues.

gennartan avatar Jun 11 '24 16:06 gennartan

Closing in favor of #1501 (decided on WG Meeting Aug. 14th)

destogl avatar Aug 14 '24 17:08 destogl