ros_control icon indicating copy to clipboard operation
ros_control copied to clipboard

Something is wrong with the controller switching (tests)

Open mathias-luedtke opened this issue 5 years ago • 7 comments
trafficstars

I noticed some spurious failures in the cron CI jobs of industrial_ci.

One test sometimes fails with

Full test results for 'build/controller_manager_tests/test_results/controller_manager_tests/rosunit-controller_manager_scripts_test.xml'

-------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<testsuite errors="0" failures="1" name="unittest.suite.TestSuite" tests="1" time="2.807"><testcase classname="__main__.TestUtils" name="test_scripts" time="2.8065"><failure type="AssertionError">'my_controller1 - hardware_interface::EffortJointInterface ( stopped )\n' != 'my_controller1 - hardware_interface::EffortJointInterface ( running )\n'
  File "/usr/lib/python2.7/unittest/case.py", line 329, in run
    testMethod()
  File "/root/target_ws/src/ros_control/controller_manager_tests/test/controller_manager_scripts.py", line 59, in test_scripts
    self.assertEqual(run_cm('list'), myc1_running)
  File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual
    raise self.failureException(msg)
</failure></testcase><system-out>&lt;![CDATA[
]]&gt;</system-out><system-err>&lt;![CDATA[
]]&gt;</system-err></testsuite>

In the logs further up I found

[31m[ERROR] [1579997360.906338590]: Could not start controller with name my_controller3 because no controller with this name exists[0m
[31m[ERROR] [1579997373.912803741]: Controller manager: Cannot reload controller libraries because there are still 1 controllers running[0m
[31m[ERROR] [1579997373.914896691]: A controller named 'my_controller1' was already loaded inside the controller manager[0m

Traceback (most recent call last):
  File "/root/target_ws/install/lib/controller_manager/spawner", line 207, in <module>
    if __name__ == '__main__': main()
  File "/root/target_ws/install/lib/controller_manager/spawner", line 199, in main
    resp = switch_controller(loaded, [], 2)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 435, in __call__
    return self.call(*args, **kwds)
  File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 525, in call
    raise ServiceException("transport error completing service call: %s"%(str(e)))
rospy.service.ServiceException: transport error completing service call: unable to receive data from sender, check sender's logs for details
[33m[WARN] [1579997381.977107]: Controller Spawner error while taking down controllers: service [/controller_manager/switch_controller] unavailable

I guess that this occurs because we run two test nodes at the same time: https://github.com/ros-controls/ros_control/blob/8d901e019a3bc70333afd69574d887afc244ad2c/controller_manager_tests/test/controller_manager_scripts.test#L14-L15

Furthermore, it looks like the (parallel) switching breaks the controller manager somehow. It is either a bug in our test node or in the switching code or both. #380 might be related.

I will keep monitoring this.

mathias-luedtke avatar Jan 26 '20 01:01 mathias-luedtke

Thanks for the heads up! I'd definitely suggest creating a new .test file with a single node and test against that too.

bmagyar avatar Jan 26 '20 10:01 bmagyar

Is it possible something broke ABI compatibility in ros-control recently? We just did updated our apt packages from the public ROS apt repo on melodic, and controller managers in our code stopped working. After recompile, they worked again. Here's the version we're on now:

iron@m2a:~$ apt-cache policy ros-melodic-controller-manager
ros-melodic-controller-manager:
  Installed: 0.16.0-1bionic.20200130.022304
  Candidate: 0.16.0-1bionic.20200130.022304
  Version table:
 *** 0.16.0-1bionic.20200130.022304 500
        500 http://packages.ros.org/ros/ubuntu bionic/main amd64 Packages
        100 /var/lib/dpkg/status

(timing makes me think it may be the same problem as in this issue)

jonbinney avatar Feb 07 '20 23:02 jonbinney

Is it possible something broke ABI compatibility in ros-control recently?

#395, almost definitely. It added new member functions and new member variables to Actuator, Joint, and Transmission handles. Looking back, that probably should have targetted Noetic...

matthew-reynolds avatar Feb 07 '20 23:02 matthew-reynolds

Indeed, those changes introduced ABI breakage. While the option of targeting Noetic for it was there, we opted for Melodic to deploy this functionality earlier to upstream as well as Noetic is still somewhat unclear and this setup allows us to incrementally improve upon the current solution with another set of ABI-breaking changes when Noetic comes.

@jonbinney Sorry for the inconvenience this caused you, I've just put out an announcement on discourse to let others know: https://discourse.ros.org/t/ros-control-abi-breakage-in-melodic-2020-02/12681

bmagyar avatar Feb 08 '20 17:02 bmagyar

No worries @bmagyar - I'm happy to have the improvements in melodic and the recompilation isn't a problem.

jonbinney avatar Feb 08 '20 21:02 jonbinney

Oh man, this bit us, because (gulp) we had a forked version of diff_drive_controller. I know we shouldn't have, but are doing some custom stuff that shouldn't be upstream. We build our own packages and had to version bump to rebuild and deploy to field units. Doh.

Thanks for the improvements, as always, onward and upward!

seebq avatar Mar 13 '20 17:03 seebq

How about subclassing? Others do a lot of that with the joint_trajectory_controller. Feel free to propose which functions should be virtual.

On Fri, 13 Mar 2020, 17:21 Charles Brian Quinn, [email protected] wrote:

Oh man, this bit us, because (gulp) we had a forked version of diff_drive_controller. I know we shouldn't have, but are doing some custom stuff that shouldn't be upstream. We build our own packages and had to version bump to rebuild and deploy to field units. Doh.

Thanks for the improvements, as always, onward and upward!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ros-controls/ros_control/issues/418#issuecomment-598830381, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA24PYOKME63FKZ5AK2X6D3RHJTSJANCNFSM4KLUDZLA .

bmagyar avatar Mar 13 '20 21:03 bmagyar