Summary

I have written a ROS program that sends the pose of an ArUco marker to the robot, so that the robot positions itself directly above the marker at a defined distance in Z. Two nodes are used: one for the vision system, which publishes the ArUco’s pose, and one that receives this pose and executes the robot’s movement. Everything works correctly and the robot positions itself satisfactorily until, suddenly and for no reason I can identify, the robot stops accepting new poses and the connection is closed. If I run External Control again from the teach pendant, the program works again.

Versions

ROS Driver version: Melodic with Ubuntu 18.04
Affected Robot Software Version(s): URSoftware 5.9.1.1031110
Affected Robot Hardware Version(s): UR10e
URCaps Software version(s): External Control 1.0.5

Impact

Due to this problem, it is not possible to smoothly use external control for guidance tasks based on visual markers.

Issue details

I have written a ROS program that sends the pose of an ArUco marker to the robot, so that the robot positions itself directly above the marker at a defined distance in Z. Two nodes are used: one for the vision system, which publishes the ArUco’s pose, and one that receives this pose and executes the robot’s movement. Everything works correctly and the robot positions itself satisfactorily until, suddenly and for no reason I can identify, the robot stops accepting new poses and the connection is closed. If I run External Control again from the teach pendant, the program works again.

The controller used is the “forward_cartesian_traj_controller”. The driver was downloaded from: https://github.com/UniversalRobots/Universal_Robots_ROS_Driver

The message that appears on the console is attached as an image (MicrosoftTeams-image).

Extra details

In case you require further information, please do not hesitate to ask for it.

Thanks in advance!

Jan 31 '22 09:01 DanielBilbao12

First of all, thank you for posting this.

Basically, what I think is happening here:

The control node running on the ROS machine (or the communication between the two machines) is not fast / reliable enough. This is why we suggest using a ROS machine with a real-time patched kernel and a dedicated network connection. I can't see whether you've got that kind of setup running.
When the robot does not receive a command from the ROS machine every control cycle, it will end the program (hence, your robot stops).
From the screenshot you posted (please post output as text with syntax highlighting rather than as a screenshot), it looks like your teach pendant program tries to reconnect automatically, but that happens too fast for the current implementation. (That's a bug known to me, but I haven't found the time to address it yet.) A workaround is to add a sleep command at the end of your TP program (after the external_control node). Then your robot will automatically reconnect to your ROS machine. Unfortunately, this will still stop your robot.
As a workaround (I do see the problem here and this should be fixed properly), you could try modifying the driver to set a higher keepalive counter. This can be done by adding ur_driver_->setKeepaliveCount(5); after this line. You can change the number 5 to anything you like. The higher the number, the more control cycles without any feedback from the ROS machine the robot will allow. Note that I am only assuming this workaround will work, so please report back either way.
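
The effect of a higher keepalive count can be illustrated with a toy model. This is only a sketch of the idea described above, not the driver's or the robot program's actual implementation: the function name, the boolean "command received" sequence, and the exact semantics (the program stops once more than keepalive_count consecutive cycles pass without a command) are assumptions made for illustration.

```python
def first_stop_cycle(received, keepalive_count):
    """Return the index of the control cycle in which the (toy) robot program
    would stop, or None if it survives the whole sequence.

    received: iterable of booleans, True if a command arrived in that cycle.
    keepalive_count: number of consecutive cycles without a command that are
    tolerated. Hypothetical stand-in for the value given to setKeepaliveCount.
    """
    remaining = keepalive_count
    for cycle, got_command in enumerate(received):
        if got_command:
            remaining = keepalive_count  # a fresh command resets the budget
        else:
            remaining -= 1               # one more cycle without a command
            if remaining < 0:
                return cycle             # budget exhausted: program ends
    return None


if __name__ == "__main__":
    # Two short gaps in the command stream: 2 missed cycles, then 3.
    stream = [True, False, False, True, False, False, False, True]
    for count in (1, 2, 5):
        print(count, first_stop_cycle(stream, count))
```

In this model a larger count only hides gaps shorter than the count. If the ROS side stalls for longer than that, the program still ends, which would be consistent with a workaround that helps on some setups but not on others.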

I hope this explanation helps and that we can address this properly soon.

Feb 02 '22 16:02 fmauch

As mentioned on the forum, I also hit this issue when running the driver in a VM. My setup is Ubuntu 20.04 with Noetic over a wired gigabit switch: I cannot get a stable connection in a VM (Parallels on Mac), but the driver runs for days with zero hiccups on a real Linux machine. I have not investigated further; I suspect the VM is not responsive enough compared to native hardware. This prevents me from developing on my laptop while travelling, which is quite inconvenient, since I would ideally run ROS in a VM alongside URSim, and that is currently not possible (at least with my configuration).

I am not sure what the "maximum number of clients (1) already connected" error is about. Could it point instead to an issue with the network driver of the VM environment?

I have no time to dig further until the end of next week but will happily run more tests after to see if I can isolate the issue to something more specific.

Feb 02 '22 21:02 aatb-ch

One more note: our robots are UR10s with CB3.0 running 3.15, and the Linux machines don't have an RT kernel.

Feb 02 '22 21:02 aatb-ch

Hello again!

First of all, thank you for your quick response!

Regarding the first point, I am using a virtual machine with ROS Melodic and Ubuntu 18.04. The robot controller and the PC are connected by cable, and the real-time kernel is not enabled.
Regarding the solution proposed in the second point: with a wait statement of 0.5 s after the external_control node, the connection is re-established without having to re-run the program from the TP. The problem, as you say, is that the robot still stops (although it reconnects quickly).
Finally, we tried modifying the driver by adding the statement ur_driver_->setKeepaliveCount(5); at the indicated place. We also tried values higher than 5, but the problem persists.

I think the problem is related to the number of poses sent from the vision node to the control node, because when the frame rate of the vision system increases, the connection drops sooner. Could the topic queue be the cause? I have set the queue size to 1 so that the pose sent is always the last one detected by the vision system. That way the robot moves only to the final position and not through every position the ArUco passes while it is being moved around the workspace.

Thanks in advance, Daniel.
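
The "queue of size 1" idea can be sketched without ROS as a single-slot mailbox: the vision side overwrites the slot on every frame, and the control side picks up only the newest pose at its own pace. This is a minimal illustration, assuming the two sides run in different threads; the class and method names are hypothetical, and nothing here is taken from the driver or from the nodes described above. It shows how the number of goals handled can be decoupled from the camera frame rate, not that doing so resolves the dropped connection.

```python
import threading


class LatestValue:
    """Single-slot mailbox: writers overwrite, the reader takes the newest value."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None
        self._fresh = False

    def put(self, value):
        """Store a new value, discarding any value that was not read yet."""
        with self._lock:
            self._value = value
            self._fresh = True

    def take(self):
        """Return the newest unread value, or None if nothing new arrived."""
        with self._lock:
            if not self._fresh:
                return None
            self._fresh = False
            return self._value


if __name__ == "__main__":
    slot = LatestValue()
    # Thirty camera frames arrive before the control loop runs once.
    for frame in range(30):
        slot.put({"frame": frame, "z_offset_m": 0.3})
    print(slot.take())  # only the newest pose is handled
    print(slot.take())  # nothing new since the last read
```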

Feb 03 '22 10:02 DanielBilbao12

I would expect the problem to correlate with the actual load on the system. I'll be in the lab tomorrow, where I will have a look at a quick fix I did last year for a similar problem. I thought I did the setKeepaliveCount thing, but I might remember it wrong.

Feb 03 '22 11:02 fmauch

I have been doing more testing. We are currently deploying 2 UR10s, which we ideally want to drive from a single Jetson Nano for ease of deployment, instead of having multiple machines.

I am currently able to run 2 nodes of the ur_robot_driver on the Jetson, and it seems to work well, with both processes hovering around 15% CPU load, but I still see the occasional dropped connection, about once per hour:

[ INFO] [1644150281.232665034]: Robot requested program
[ INFO] [1644150281.232870502]: Sent program to robot
[ INFO] [1644150281.404626379]: Robot connected to reverse interface. Ready to receive control commands.
[ INFO] [1644150429.325002328]: Connection to reverse interface dropped.
[ERROR] [1644150429.325689976]: Sending data through socket failed.
[ INFO] [1644150430.317166557]: Robot connected to reverse interface. Ready to receive control commands.
[ERROR] [1644151231.750051161]: Unexpected error: No trajectory defined at current time. Please contact the package maintainer.
[ INFO] [1644151490.761816208]: Connection to reverse interface dropped.
[ INFO] [1644151491.754148939]: Robot connected to reverse interface. Ready to receive control commands.
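
To quantify how often the drops occur and how long each outage lasts, the timestamps in a log like the one above can be paired up. The following is a minimal sketch: the helper names are hypothetical, and the regular expression assumes the console format seen in these lines (level, then a Unix timestamp in brackets, then the message).

```python
import re

LOG = """\
[ INFO] [1644150281.232665034]: Robot requested program
[ INFO] [1644150281.232870502]: Sent program to robot
[ INFO] [1644150281.404626379]: Robot connected to reverse interface. Ready to receive control commands.
[ INFO] [1644150429.325002328]: Connection to reverse interface dropped.
[ERROR] [1644150429.325689976]: Sending data through socket failed.
[ INFO] [1644150430.317166557]: Robot connected to reverse interface. Ready to receive control commands.
[ERROR] [1644151231.750051161]: Unexpected error: No trajectory defined at current time. Please contact the package maintainer.
[ INFO] [1644151490.761816208]: Connection to reverse interface dropped.
[ INFO] [1644151491.754148939]: Robot connected to reverse interface. Ready to receive control commands.
"""

LINE_RE = re.compile(r"^\[\s*(\w+)\]\s+\[(\d+\.\d+)\]:\s+(.*)$")


def parse(log_text):
    """Turn log text into a list of (level, unix_time, message) tuples."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((match.group(1), float(match.group(2)), match.group(3)))
    return events


def drop_report(events):
    """Return one (uptime_s, downtime_s) tuple per drop that was followed by a reconnect."""
    report = []
    connected_at = None
    dropped_at = None
    uptime = None
    for _level, stamp, text in events:
        if text.startswith("Robot connected to reverse interface"):
            if dropped_at is not None:
                report.append((uptime, stamp - dropped_at))
                dropped_at = None
            connected_at = stamp
        elif text.startswith("Connection to reverse interface dropped"):
            if connected_at is not None:
                uptime = stamp - connected_at
                dropped_at = stamp
                connected_at = None
    return report


if __name__ == "__main__":
    for uptime, downtime in drop_report(parse(LOG)):
        print(f"connected for {uptime:8.1f} s, then offline for {downtime:.3f} s")
```

For this excerpt the sketch reports two drops, each followed by a reconnect roughly one second later.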

I have a hunch that this is indeed related to system load: if I am doing other things, like installing a package or loading a Chrome tab, I see it happen more often. What is the setKeepaliveCount trick you are mentioning? For my use case I can allow a pretty loose timeout, since what I care about most is stability over time.

Edit: this might be obvious, but I have avoided messing with RT kernels until now. Wouldn't that be the solution? I have little time left to fiddle with this, but if there is a dramatic difference in behavior with and without an RT kernel, I might invest the time needed to upgrade the machine.

Feb 06 '22 14:02 aatb-ch

Quick note, I went ahead and compiled a RT kernel for the Jetson Nano and have not seen any dropped connection since then, so it very likely is due to occasional jitter which just gets worse when system load goes up.
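
One rough way to check whether scheduling jitter is plausible on a given machine is to time a fixed-rate loop and record how late each wake-up is. The sketch below assumes a 2 ms period (the 500 Hz control rate of e-Series robots; CB3 robots run at 125 Hz, i.e. 8 ms). The function names are hypothetical, and Python's own overhead adds to the measured lateness, so the numbers are only an indicator and not a measurement of the driver's actual control thread.

```python
import time


def measure_jitter(period_s=0.002, cycles=500):
    """Run a fixed-rate loop and return how late each wake-up was, in seconds."""
    lateness = []
    next_deadline = time.monotonic() + period_s
    for _ in range(cycles):
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        # Time elapsed past the deadline when the loop actually woke up.
        lateness.append(time.monotonic() - next_deadline)
        next_deadline += period_s
    return lateness


def summarize(lateness, period_s):
    """Condense lateness samples into a few numbers."""
    return {
        "max_ms": max(lateness) * 1000.0,
        "mean_ms": sum(lateness) / len(lateness) * 1000.0,
        # Wake-ups that came more than one full period too late.
        "missed_cycles": sum(1 for value in lateness if value > period_s),
    }


if __name__ == "__main__":
    period = 0.002
    print(summarize(measure_jitter(period_s=period), period))
```

Running this once on an idle machine and once while it is under load (for example during a package installation) gives a feel for how much the wake-up times spread.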

I still have these Unexpected error: No trajectory defined at current time. Please contact the package maintainer. messages recurring but these are likely an unrelated issue, though I'd prefer to figure out what is causing them (I am publishing trajectory messages from Rosbridge websocket server to the Topic interface at about 10Hz).

Feb 06 '22 18:02 aatb-ch

Just wanted to add some further notes: running 20.04 with an RT kernel on a Raspberry Pi 4, connected to URSim, is also very stable, with only very occasional dropped connections, rare enough to be usable. So the issue is really about running the ur_robot_driver node in a VM.

Mar 07 '22 18:03 aatb-ch

Hi,

I'm facing the exact same problem with a UR3e right now. Has anyone found a good fix for this? I just pulled all the latest changes from the UR ROS driver package and updated PolyScope on the teach pendant to the most recent version.

I installed all the dependencies correctly using rosdep install -r --from-paths . --ignore-src --rosdistro $ROS_DISTRO -y -i --verbose, did an update and upgrade afterwards, recompiled my workspace, and rebooted both the robot and the PC.

I hadn't faced this issue until now; it suddenly started happening today. I'm not sure what changed. I checked that the robot is connected to the network on the teach pendant and that my ROS driver is connecting to the correct IP address.

Kindly help! @fmauch

Sep 20 '22 22:09 prasuchit

It might not be your case, but I had forgotten to update the Universal_Robots_Client_Library and had systematic connection drops. They were resolved after the update.

Sep 30 '22 20:09 captain-yoshi

As a workaround (I do see the problem here and this should be fixed properly), you could try modifying the driver by setting a higher keepalive counter. This can be done by adding ur_driver_->setKeepaliveCount(5); after this line

Since the file seems to have changed since this comment, could you please update the link or share after which line setKeepaliveCount is supposed to be called, @fmauch?

Or is there a new workaround for this issue?

Oct 14 '22 10:10 niomate

Sorry I didn't use a permalink. You can insert that code after this line: https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/blob/222d67aad0bfb4f35e7acc32d34b4fb74710c0bd/ur_robot_driver/src/hardware_interface.cpp#L314-L315

Oct 14 '22 14:10 fmauch

Thanks, much appreciated !

Oct 18 '22 08:10 niomate

And where would this line be added for the ROS 2 driver, @fmauch? I have the same problem with Foxy.

Nov 04 '22 14:11 alexandrosnic

@alexandrosnic Please see https://github.com/UniversalRobots/Universal_Robots_ROS2_Driver/issues/534

Nov 09 '22 05:11 fmauch

Sorry I didn't use a permalink. You can insert that code after this line:

https://github.com/UniversalRobots/Universal_Robots_ROS_Driver/blob/222d67aad0bfb4f35e7acc32d34b4fb74710c0bd/ur_robot_driver/src/hardware_interface.cpp#L314-L315

It makes the UR simulator (VMware / physical machine) work well! I suggest adding a parameter like simulator:=true or alive_count:=5 to the ROS driver launch file.

Nov 16 '22 09:11 jsbyysheng

@jsbyysheng A more descriptive name of what the parameter actually does, such as max_missed_packages or something like that would maybe be more suitable. The keepalive_count IMHO makes sense internally, but for driver users, this probably isn't self explanatory.

However, when we introduce such a parameter, this would have to be well-documented so users are aware of the implications following from that.

Nov 16 '22 12:11 fmauch

@jsbyysheng A more descriptive name of what the parameter actually does, such as max_missed_packages or something like that would maybe be more suitable. The keepalive_count IMHO makes sense internally, but for driver users, this probably isn't self explanatory.

However, when we introduce such a parameter, this would have to be well-documented so users are aware of the implications following from that.

In fact, this parameter makes more sense for URSim than for a real robot, since URSim does not need real-time communication. A simple launch parameter named using_ursim:=true, with ur_driver_->setKeepaliveCount(5); fixed to 5, is entirely sufficient. A count above 5 probably makes little sense, because at that point the network is too unstable for any communication.

Nov 16 '22 17:11 jsbyysheng

Hi All,

Since this is one of the first results that came up when I personally encountered this issue when I was trying to control a UR5e with ROS for my undergraduate thesis, I thought I would bring up my solution to the problem.

The previous answers helped me greatly in understanding the root cause of the issue regarding the network stability. The software solutions provided are excellent, but I wanted to see if a hardware solution could be reached.

I'll preface this by saying that I am taking no responsibility for the consequences of you following these steps, I am in no way an authority on what I am talking about and I'm purely talking about my personal experience. This is not technical advice.

For reference, I am using VMware Workstation 17 with Ubuntu 20.04 as my guest OS, but I believe the steps I took should work with versions of these published in the last 10 or so years. I was already bridging my Ethernet connection to the UR5e directly to my virtual machine, but even that did not provide enough network stability to prevent me from losing my connection every ~30 seconds or so.

1. I looked at this article by VMware discussing some possible causes of network issues.
2. I realized my virtual network adapters were running e1000 instead of vmxnet3, which is supposed to improve performance a lot. This cannot be changed through the GUI on Workstation, so I had to edit the .vmx file and replace e1000 with vmxnet3 on the lines where it appears. It is prudent to back up the .vmx file before doing this.
3. I lost my network GUI on my guest! I got it back by running sudo nmcli networking off followed by sudo nmcli networking on. Some further reading about adding the network interface back into the /etc/network/interfaces file might be helpful if this does not fix the issue.
4. I increased the resources allocated to the VM from 4GB ram/2 (core * processors) to 6GB ram/4 (core * processors) (not sure if this helped all that much, but my setup can afford it).
5. I disabled side channel mitigation (which apparently opened my computer up to some potentially very nasty attacks; worth researching before doing).
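
The .vmx edit from the steps above can be scripted so that the backup is never forgotten. This is a sketch under assumptions: it expects the adapter type to be stored on lines of the form ethernet0.virtualDev = "e1000", and the helper names are hypothetical. The VM should be powered off while the file is edited.

```python
import re
import shutil
from pathlib import Path

# Assumed line format in the .vmx file:  ethernet0.virtualDev = "e1000"
ADAPTER_RE = re.compile(
    r'^([ \t]*ethernet\d+\.virtualDev[ \t]*=[ \t]*")e1000(")[ \t]*$',
    re.MULTILINE | re.IGNORECASE,
)


def switch_to_vmxnet3(vmx_text):
    """Return (new_text, number_of_adapters_changed) for the given .vmx content."""
    return ADAPTER_RE.subn(r"\g<1>vmxnet3\g<2>", vmx_text)


def patch_vmx_file(path):
    """Back up the .vmx file next to the original, then rewrite it if needed."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the untouched original
    new_text, changed = switch_to_vmxnet3(path.read_text())
    if changed:
        path.write_text(new_text)
    return changed


if __name__ == "__main__":
    sample = (
        'memsize = "4096"\n'
        'ethernet0.virtualDev = "e1000"\n'
        'ethernet0.connectionType = "bridged"\n'
    )
    print(switch_to_vmxnet3(sample)[0])
```

Only lines whose value is exactly e1000 are touched; other adapter types and all other settings are left unchanged.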

All these steps took me from dropping the connection every 30 seconds to no dropped connections so far (drops may still be possible; I haven't kept it running for over 1 hour at a time, but it looks promising). I suspect the most impactful step was changing the NIC driver. I also suspect that this, combined with the software solution that enables auto-reconnecting after a disconnect as insurance, would be sufficient for most use cases.

I hope this is helpful to someone!

Feb 22 '23 03:02 OltanS

One of the problems might be the hardware used to connect the system.

Jan 12 '24 09:01 JaisonJose241


Connection to reverse interface dropped - UR10e
