skiros2 icon indicating copy to clipboard operation
skiros2 copied to clipboard

Introduce error handling for broken service connections

Open matthias-mayr opened this issue 3 years ago • 0 comments

I am running a lot of tasks with SkiROS and received one of these recently:

1603701395.683933973 INFO /worker_manager [skiros_rl_client.py:215(SkirosRlClient.next_callback)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] 2: Ending task 65451
1603701395.701773881 INFO /worker_manager [skiros_rl_client.py:181(SkirosRlClient.next_callback)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] 4: Starting new task

1603701395.706203937 ERROR /worker_manager [tcpros_service.py:581(Service.error_handler)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] Error processing request: 32Broken pipe
['Traceback (most recent call last):\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 632, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/blackdrops/src/skiros_integration/skiros_rl_client.py", line 193, in next_callback\n    self.task_id = self.agent.tick_once(skill_list=[skill], param_list=params_des)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 113, in tick_once\n    return self.execute(execution_id, skill_list, srvs.SkillCommandRequest().TICK_ONCE, param_list)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 101, in execute\n    res = self.call(self._skill_exe_client, msg)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 199, in call\n    resp1 = service(msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 442, in __call__\n    return self.call(*args, **kwds)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 519, in call\n    transport.send_message(request, self.seq)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 673, in send_message\n    self.write_data(self.write_buff.getvalue())\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 698, in write_data\n    raise TransportTerminated(str(ioe_errno)+msg)\n', 'TransportTerminated: 32Broken pipe\n']

1603701395.707558000 ERROR /blackdrops_1603679732000701267 [/tmp/binarydeb/ros-melodic-roscpp-1.14.9/src/libros/service_server_link.cpp:378(ServiceServerLink::call)] [topics: /rosout] Service call failed: service [/skiros_worker_4] responded with an error: error processing request: 32Broken pipe

1603701395.708432912 ERROR /worker_manager [tcpros_service.py:581(Service.error_handler)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] Error processing request: 'NoneType' object has no attribute 'close'
['Traceback (most recent call last):\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 632, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/blackdrops/src/skiros_integration/skiros_rl_client.py", line 193, in next_callback\n    self.task_id = self.agent.tick_once(skill_list=[skill], param_list=params_des)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 113, in tick_once\n    return self.execute(execution_id, skill_list, srvs.SkillCommandRequest().TICK_ONCE, param_list)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 101, in execute\n    res = self.call(self._skill_exe_client, msg)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 199, in call\n    resp1 = service(msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 442, in __call__\n    return self.call(*args, **kwds)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 535, in call\n    transport.close()\n', '  File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 846, in close\n    self.socket.close()\n', "AttributeError: 'NoneType' object has no attribute 'close'\n"]
1603701395.709977804 ERROR /blackdrops_1603679732000701267 [/tmp/binarydeb/ros-melodic-roscpp-1.14.9/src/libros/service_server_link.cpp:378(ServiceServerLink::call)] [topics: /rosout] Service call failed: service [/skiros_worker_4] responded with an error: error processing request: 'NoneType' object has no attribute 'close'
1603701395.761034965 INFO /worker_manager [skiros_rl_client.py:181(SkirosRlClient.next_callback)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] 4: Starting new task

h_robot_1/tick_rate, /bh_robot_2/set_debug] 4: Starting new task
1603701395.771452903 ERROR /worker_manager [tcpros_service.py:581(Service.error_handler)] [topics: /skill_managers/discovery, /bh_robot_4/tick_rate, /bh_robot_4/set_debug, /rosout, /bh_robot_4/monitor, /skill_managers/description, /bh_robot_2/monitor, /bh_robot_3/monitor, /bh_robot_1/set_debug, /bh_robot_1/monitor, /bh_robot_3/tick_rate, /wm/monitor, /bh_robot_3/set_debug, /bh_robot_2/tick_rate, /bh_robot_1/tick_rate, /bh_robot_2/set_debug] Error processing request: 'NoneType' object has no attribute 'tell'
['Traceback (most recent call last):\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 632, in _handle_request\n    response = convert_return_to_response(self.handler(request), self.response_class)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/blackdrops/src/skiros_integration/skiros_rl_client.py", line 193, in next_callback\n    self.task_id = self.agent.tick_once(skill_list=[skill], param_list=params_des)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 113, in tick_once\n    return self.execute(execution_id, skill_list, srvs.SkillCommandRequest().TICK_ONCE, param_list)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 101, in execute\n    res = self.call(self._skill_exe_client, msg)\n', '  
    File "/home/ubuntu/Workspaces/blackdrops_ws/src/skiros2/skiros2/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py", line 199, in call\n    resp1 = service(msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 442, in __call__\n    return self.call(*args, **kwds)\n', '  File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 519, in call\n    transport.send_message(request, self.seq)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/impl/tcpros_base.py", line 672, in send_message\n    serialize_message(self.write_buff, seq, msg)\n', '  
    File "/opt/ros/melodic/lib/python2.7/dist-packages/rospy/msg.py", line 139, in serialize_message\n    start = b.tell()\n', "AttributeError: 'NoneType' object has no attribute 'tell'\n"]

My current workaround is this patch:

--- a/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py                                                                                                                                      
+++ b/skiros2_skill/src/skiros2_skill/ros/skill_manager_interface.py                                                                                                                                      
@@ -84,6 +84,16 @@ class SkillManagerInterface:                                                                                                                                                           
     def get_skill(self, name):                                                                                                                                                                           
         return self._skill_list[name]                                                                                                                                                                    
                                                                                                                                                                                                          
+    # Hack: This is a workaround because the connection breaks occasionally                                                                                                                              
+    def call_execute_client(self, msg):                                                                                                                                                                  
+        try:                                                                                                                                                                                             
+            resp1 = self._skill_exe_client(msg)                                                                                                                                                          
+            return resp1                                                                                                                                                                                 
+        except rospy.ServiceException as e:                                                                                                                                                              
+            log.error("[call]", "Service call failed: %s. Renewing connection."%e)                                                                                                                       
+            self._skill_exe_client = rospy.ServiceProxy(self._skill_mgr_name + '/command', srvs.SkillCommand)                                                                                            
+            return                                                                                                                                                                                       
+                                                                                                                                                                                                         
     def execute(self, execution_id=-1, skill_list=None, action=srvs.SkillCommandRequest().START, param_list=None):                                                                                       
         """                                                                                                                                                                                              
         @brief Execute a list of skills                                                                                                                                                                  
@@ -98,7 +108,7 @@ class SkillManagerInterface:                                                                                                                                                           
         if skill_list is not None:                                                                                                                                                                       
             for s in skill_list:                                                                                                                                                                         
                 msg.skills.append(s.toMsg())                                                                                                                                                             
-        res = self.call(self._skill_exe_client, msg)                                                                                                                                                     
+        res = self.call_execute_client(msg)                                                                                                                                                              
         if res is None:                                                                                                                                                                                  
             return -1                                                                                                                                                                                    
         if not res.ok:   

Since the call function does not easily have the information available to re-instantiate the connection.

matthias-mayr avatar Oct 26 '20 12:10 matthias-mayr