ros2_control icon indicating copy to clipboard operation
ros2_control copied to clipboard

Robustify spawner

Open fmauch opened this issue 1 year ago • 1 comments

With the current implementation it can happen that when using multiple controller spawners some of them fail or get stuck, see #1182. This fixes #1182.

While writing this, I realize this has great overlap with #1483, but I see no problem combining those two.

As briefly mentioned above, this change addresses two things:

  • The node discovery mechanism for the controller manager seems to be error-prone and not strictly needed.
  • It can happen that we don't get a response from the service server and hang in a deadlock.

Note: The test I implemented is a bit hacky, so we might also want to remove it again? Otherwise I think we would have to change the following things:

  • I installed the test folder in order to access the controller configuration file living in there. We should either install that file by hand rather than the complete test folder or move it somewhere else.
  • I've added a urdf file in ros_control_test_assets as I thought that might be the most useful place. I might be wrong.
  • I've added a sleep to my test before checking whether the controller_manager shows up all the expected controllers. That worked for me for an implementation, but since a timeout is very error-prone it would be better to have a proper waiting mechanism. With the changes I made the spawners should die eventually and not hang in a deadlock, so we could add event handlers to the launch description, but I'm not sure how to combine the exit events from all the spawners into one trigger.

I implemented and tested things on the rolling on jammy installation I currently have, but I know the problem definitively also arises for humble users.

fmauch avatar Apr 23 '24 10:04 fmauch

I'll have to have a look at the tests anyway, since they seem not to work on noble... I'll probably wait to properly address this until Friday when noble is released.

Thank you for the input @christophfroehlich. Moving both files to the test_assets didn't seem right to me, since the controller config file is a bit awkward (I mean, I spawn 50 joint_state_controllers for the same joints). I tried including things into the launchfile, but since the parameters aren't for the ros2_control_node this doesn't work quite straightforward. But maybe there is a way. However, I am currently not sure whether I have to install the python test, anyway, so it would also not completely harm to install the controller file, I think.

It looks like #1483 might get merged before this, though, in which case most of the changes here aren't necessarily required anymore and maybe also the test is questionable. We might hold this until #1483 is merged and it is decided how to proceed with humble and iron.

fmauch avatar Apr 23 '24 20:04 fmauch

I'll update this, now that #1562 is merged.

fmauch avatar Aug 14 '24 17:08 fmauch

Codecov Report

Attention: Patch coverage is 80.95238% with 4 lines in your changes missing coverage. Please review.

Project coverage is 86.64%. Comparing base (af4b48f) to head (6d627b5). Report is 2 commits behind head on master.

Files Patch % Lines
.../controller_manager/controller_manager_services.py 55.55% 2 Missing and 2 partials :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1501      +/-   ##
==========================================
- Coverage   86.67%   86.64%   -0.03%     
==========================================
  Files         115      115              
  Lines       10528    10544      +16     
  Branches      967      970       +3     
==========================================
+ Hits         9125     9136      +11     
- Misses       1056     1059       +3     
- Partials      347      349       +2     
Flag Coverage Δ
unittests 86.64% <80.95%> (-0.03%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
controller_manager/test/test_spawner_unspawner.cpp 99.05% <100.00%> (+0.05%) :arrow_up:
.../controller_manager/controller_manager_services.py 66.15% <55.55%> (-1.59%) :arrow_down:

... and 1 file with indirect coverage changes

codecov[bot] avatar Aug 15 '24 08:08 codecov[bot]

I'm happy with how that currently is, I'm just not sure about the docstring style. I don't have a lot of experience regarding python docstrings, but I do prefer using type hints in the signature and restructured text in the docstring:

def service_caller(                                                                                                                                                                                                                   
     node: 'Node',                                                                                                                                                                                                                     
     service_name: str,                                                                                                                                                                                                                
     service_type: str,                                                                                                                                                                                                                
     request: SrvTypeRequest,                                                                                                                                                                                                          
     service_timeout: Optional[float] = 0.0,                                                                                                                                                                                           
     call_timeout: Optional[float] = 10.0,                                                                                                                                                                                             
     max_attempts: Optional[int] = 3,                                                                                                                                                                                                  
   -> SrvTypeResponse:                                                                                                                                                                                                                 
     """                                                                                                                                                                                                                               
     Abstraction of a service call.                                                                                                                                                                                                    
                                                                                                                                                                                                                                       
     Has an optional timeout to find the service and a mechanism                                                                                                                                                                       
     to retry a call of no response is received.                                                                                                                                                                                       
                                                                                                                                                                                                                                       
     :param node: Node object to be associated with                                                                                                                                                                                    
     :param service_name: Service URL                                                                                                                                                                                                  
     :param request: The request to be sent                                                                                                                                                                                            
     :param service_timeout: Timeout (in seconds) to wait until the service is available. 0 means                                                                                                                                      
         waiting forever, retrying every 10 seconds.                                                                                                                                                                                   
     :param call_timeout: Timeout (in seconds) for getting a response                                                                                                                                                                  
     :param max_attempts: Number of attempts until a valid response is received. With some                                                                                                                                             
         middlewares it can happen, that the service response doesn't reach the client                                                                                                                                                 
         leaving it in a waiting state forever.                                                                                                                                                                                        
     :return: The service response                                                                                                                                                                                                     
                                                                                                                                                                                                                                       
     """ 

As there aren't too many python docstrings in this project and I couldn't find any specification on what to use, I would like to ask, whether there is a preference here.

fmauch avatar Aug 15 '24 08:08 fmauch

As there aren't too many python docstrings in this project and I couldn't find any specification on what to use, I would like to ask, whether there is a preference here.

This is great! Now we have a reference :)

destogl avatar Aug 15 '24 10:08 destogl