ros_control
ros_control copied to clipboard
Something is wrong with the controller switching (tests)
I noticed some spurious failures in the cron CI jobs of industrial_ci.
One test sometimes fails with
Full test results for 'build/controller_manager_tests/test_results/controller_manager_tests/rosunit-controller_manager_scripts_test.xml'
-------------------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<testsuite errors="0" failures="1" name="unittest.suite.TestSuite" tests="1" time="2.807"><testcase classname="__main__.TestUtils" name="test_scripts" time="2.8065"><failure type="AssertionError">'my_controller1 - hardware_interface::EffortJointInterface ( stopped )\n' != 'my_controller1 - hardware_interface::EffortJointInterface ( running )\n'
File "/usr/lib/python2.7/unittest/case.py", line 329, in run
testMethod()
File "/root/target_ws/src/ros_control/controller_manager_tests/test/controller_manager_scripts.py", line 59, in test_scripts
self.assertEqual(run_cm('list'), myc1_running)
File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual
assertion_func(first, second, msg=msg)
File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual
raise self.failureException(msg)
</failure></testcase><system-out><![CDATA[
]]></system-out><system-err><![CDATA[
]]></system-err></testsuite>
In the logs further up I found
[31m[ERROR] [1579997360.906338590]: Could not start controller with name my_controller3 because no controller with this name exists[0m
[31m[ERROR] [1579997373.912803741]: Controller manager: Cannot reload controller libraries because there are still 1 controllers running[0m
[31m[ERROR] [1579997373.914896691]: A controller named 'my_controller1' was already loaded inside the controller manager[0m
Traceback (most recent call last):
File "/root/target_ws/install/lib/controller_manager/spawner", line 207, in <module>
if __name__ == '__main__': main()
File "/root/target_ws/install/lib/controller_manager/spawner", line 199, in main
resp = switch_controller(loaded, [], 2)
File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 435, in __call__
return self.call(*args, **kwds)
File "/opt/ros/kinetic/lib/python2.7/dist-packages/rospy/impl/tcpros_service.py", line 525, in call
raise ServiceException("transport error completing service call: %s"%(str(e)))
rospy.service.ServiceException: transport error completing service call: unable to receive data from sender, check sender's logs for details
[33m[WARN] [1579997381.977107]: Controller Spawner error while taking down controllers: service [/controller_manager/switch_controller] unavailable
I guess that this occurs because we run two test nodes at the same time: https://github.com/ros-controls/ros_control/blob/8d901e019a3bc70333afd69574d887afc244ad2c/controller_manager_tests/test/controller_manager_scripts.test#L14-L15
Furthermore, it looks like the (parallel) switching breaks the controller manager somehow. It is either a bug in our test node or in the switching code or both. #380 might be related.
I will keep monitoring this.
Thanks for the heads up! I'd definitely suggest creating a new .test file with a single node and test against that too.
Is it possible something broke ABI compatibility in ros-control recently? We just did updated our apt packages from the public ROS apt repo on melodic, and controller managers in our code stopped working. After recompile, they worked again. Here's the version we're on now:
iron@m2a:~$ apt-cache policy ros-melodic-controller-manager
ros-melodic-controller-manager:
Installed: 0.16.0-1bionic.20200130.022304
Candidate: 0.16.0-1bionic.20200130.022304
Version table:
*** 0.16.0-1bionic.20200130.022304 500
500 http://packages.ros.org/ros/ubuntu bionic/main amd64 Packages
100 /var/lib/dpkg/status
(timing makes me think it may be the same problem as in this issue)
Is it possible something broke ABI compatibility in ros-control recently?
#395, almost definitely. It added new member functions and new member variables to Actuator, Joint, and Transmission handles. Looking back, that probably should have targetted Noetic...
Indeed, those changes introduced ABI breakage. While the option of targeting Noetic for it was there, we opted for Melodic to deploy this functionality earlier to upstream as well as Noetic is still somewhat unclear and this setup allows us to incrementally improve upon the current solution with another set of ABI-breaking changes when Noetic comes.
@jonbinney Sorry for the inconvenience this caused you, I've just put out an announcement on discourse to let others know: https://discourse.ros.org/t/ros-control-abi-breakage-in-melodic-2020-02/12681
No worries @bmagyar - I'm happy to have the improvements in melodic and the recompilation isn't a problem.
Oh man, this bit us, because (gulp) we had a forked version of diff_drive_controller. I know we shouldn't have, but are doing some custom stuff that shouldn't be upstream. We build our own packages and had to version bump to rebuild and deploy to field units. Doh.
Thanks for the improvements, as always, onward and upward!
How about subclassing? Others do a lot of that with the joint_trajectory_controller. Feel free to propose which functions should be virtual.
On Fri, 13 Mar 2020, 17:21 Charles Brian Quinn, [email protected] wrote:
Oh man, this bit us, because (gulp) we had a forked version of diff_drive_controller. I know we shouldn't have, but are doing some custom stuff that shouldn't be upstream. We build our own packages and had to version bump to rebuild and deploy to field units. Doh.
Thanks for the improvements, as always, onward and upward!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ros-controls/ros_control/issues/418#issuecomment-598830381, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA24PYOKME63FKZ5AK2X6D3RHJTSJANCNFSM4KLUDZLA .