Latency in /joint_states topic without a real-time kernel?
I'm currently porting a code base over from ROS 1 (which used the ROS 1 version of the driver). I have a node that generates joint angles, and another node that runs a PID controller and converts joint angles to joint velocities, which are sent to the robot via URScript over a socket (the speedj command). I'm using a general-purpose Ubuntu machine, so I don't have the real-time kernel installed.
In ROS 1, this setup worked fine - despite not having the real time kernel, messages to the /joint_states topic seemed to arrive with little enough latency to make the PID controller (which runs at the ~120Hz rate of these messages) effective. In ROS 2, the robot motion is very jerky for small / slow motions, and for any other motions it oscillates and diverges until I have to issue an emergency stop. I'm not certain that this is due to latency in /joint_states, but it is consistent with what we would expect from higher latency, and I have ruled out many other potential sources.
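One way to test the latency hypothesis without touching the controller is to record `time.monotonic()` on every /joint_states callback and look at the inter-arrival statistics. A minimal, ROS-agnostic sketch (the helper name and the synthetic stall are made up for illustration):

```python
def inter_arrival_stats(arrival_times):
    """Return (mean_period, max_gap) in seconds for a list of arrival times."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return sum(gaps) / len(gaps), max(gaps)


if __name__ == "__main__":
    # Synthetic example: a steady 120 Hz stream with one 50 ms stall.
    times = [i / 120.0 for i in range(100)]
    times[50:] = [t + 0.05 for t in times[50:]]
    mean_period, max_gap = inter_arrival_stats(times)
    print(f"mean period: {mean_period * 1e3:.2f} ms, max gap: {max_gap * 1e3:.2f} ms")
```

At ~120 Hz the mean period should sit near 8.3 ms; a max gap well above that points at communication jitter rather than at the controller itself.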
Is it a known issue that there is more latency in the /joint_states topic in ROS 2 than in ROS 1, or is there a mechanism to explain why this could be the case? #498 indicates to me that there is no known reason this would be the case.
Any ideas on ways to circumvent the problem are also welcome.
Thank you.
I am not aware of any delay being relevant there, but I've personally encountered "issues" around ROS 2 communication depending on the specific setup. However, the use case you describe sounds like it should be implemented as a ros2_control controller rather than a separate node that uses topic communication. That way, you will be able to generate your motion commands in the same control cycle in which you read the current joint values. Implementing a custom controller isn't too hard and might definitely be worth it in terms of reliability.
Thanks for the response @fmauch.
Can you explain what you mean by "that way, you will be able to generate your motion commands in the same control cycle as you read the current joint values"? Our current controller runs at the refresh rate of /joint_states (~120Hz), and every cycle it reads in joint angles and generates a motion command. How would a custom ros2_control controller improve upon this?
Also, do you have any ideas what changes to the setup might improve ROS 2 communication? Might switching to a realtime kernel help? Or is that irrelevant to this issue? (I don't completely understand what effects the realtime kernel has).
When you run your controller in a separate node, there is ROS communication in between, which adds latency unless specific measures are taken.
If you implement your controller as a ros2_control controller, it runs in the same process (even the same thread) where the joint values are actually read from the hardware. With this, latencies can be expected to be close to zero. And yes, in that case using a real-time (or also a low-latency) kernel might improve your experience.
What I meant is basically this: If you have it running in a separate node the following will happen:
- There is a control loop running inside the driver. This is running with the control frequency configured. In a control cycle it reads values from the hardware, updates controllers and writes back values to the hardware.
- The controller update (in the case of the joint state broadcaster) copies the values read from the hardware and notifies a publisher to publish those values. The publisher itself runs in another thread, so the latency between reading from the hardware and the data being published is not necessarily deterministic.
- Your controller subscribes on that published message, generates a control command and publishes that back.
- This published message is read by the ros2_control controller and stored in a buffer for later use (in the case of a forwarding controller).
- In the next control cycle after that buffer has been written the controller reads that buffer and sends the command to the robot.
As you can see, the dataflow above involves two non-deterministic communication steps.
When implementing this as a ros2_control controller the dataflow would be as follows:
- There is a control loop running inside the driver. This is running with the control frequency configured. In a control cycle it reads values from the hardware, updates controllers and writes back values to the hardware.
- During this controller update, your controller generates motion commands and writes them back to the hardware directly (in fact, it writes to a buffer that is sent to the hardware right after the update step, but for simplicity we can view this as writing directly to the hardware). No threads or ROS communication are involved.
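The single-process dataflow described above can be sketched in plain Python. This is only an illustration of the read/update/write pattern, not the actual ros2_control API (which is C++); `DummyHardware`, the gains, and the rate are all made up:

```python
class DummyHardware:
    """Stand-in for a hardware interface (hypothetical, illustration only)."""

    def __init__(self):
        self.positions = [0.0] * 6

    def read(self):
        return list(self.positions)

    def write(self, velocities, dt):
        # Integrate the commanded velocities so the sketch has a visible effect.
        self.positions = [p + v * dt for p, v in zip(self.positions, velocities)]


def controller_update(positions, goal, gain=2.0):
    """One controller update per control cycle (a plain P controller here)."""
    return [gain * (g - p) for g, p in zip(goal, positions)]


def control_loop(hw, goal, rate_hz=125.0, cycles=500):
    """Read -> update -> write in one loop: no topics, no extra threads."""
    dt = 1.0 / rate_hz
    for _ in range(cycles):
        positions = hw.read()                         # read joint values
        command = controller_update(positions, goal)  # generate command
        hw.write(command, dt)                         # write back, same cycle
    return hw.positions
```

Because the command is generated in the same cycle that read the joint values, there is no topic hop between measurement and command.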
That is very helpful, thank you. I'll try writing my own ROS 2 controller, and maybe try switching to a real-time kernel. I will report back on how it goes.
I was able to fix the problem by getting the joint states directly from a socket connection to the robot in my PID loop node. It seems like there is some delay introduced by the UR Robot Driver publishing to the /joint_states topic which was not present in ROS 1. I did not try a real-time kernel or implementing a ros2_control controller.
If anybody is having a similar problem, I'm happy to share my code for getting the joint states directly from the robot.
It would be nice if this can be fixed in the driver in later versions, but I understand it might just be some latency that's intrinsic to ROS 2, not something that the driver can address.
Hi @cpsiff! I am trying to set up teleoperation using MoveIt Servo. For small movements I get strange behaviour (the shoulder lift joint moves jerkily) which I don't get by directly commanding the robot with the forward_position controller. Could the problem be related to what you were experiencing with the /joint_states delay? Would you be able to share your solution?
I'm not sure if it's related. It's possible that MoveIt relies on the ROS2 topics to get joint velocities, but I'd have to look into the MoveIt code to be sure.
My solution is to get the joint positions directly from the robot, via a socket. Here is the function that I've written to do that. It is called on a 0.008 s timer.
```python
def timer_cb(self):
    # Get the current joint angles from a socket connection to the robot's
    # real-time interface (port 30003) and store them as a list in "positions".
    # Requires `import socket`, `import struct` and `import numpy as np`.
    recv = self.joint_sock.recv(1220)
    while len(recv) != 1220:
        self.get_logger().info(f"Received {len(recv)} bytes, expected 1220 bytes. Trying again...")
        self.joint_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.joint_sock.connect((self.robot_ip, 30003))
        recv = self.joint_sock.recv(1220)
    recv = recv[:1060]  # everything after 1060 is always 0? and I don't know how to decode it
    # https://docs.python.org/3/library/struct.html
    fmt = "!i" + "d" * 132  # 4-byte message size followed by 132 big-endian doubles
    line = struct.unpack(fmt, recv)
    positions = list(line[32:38])  # q_actual
    self.joint_position_filter.next(positions)
    self.current_joints = self.joint_position_filter.get()
    if len(self.current_joints) == 0:
        return
    if self.pause_speedj:
        return
    q = np.array(self.joint_goal_filter.get())  # goal joint angles
    if len(q) == 0:
        return
    # PID on the joint-space error
    p_error = q - self.current_joints
    self.i_error += self.dt * p_error
    self.i_error = np.clip(self.i_error, -1, 1)  # anti-windup
    if self.last_p_error is not None:
        d_error = (p_error - self.last_p_error) / self.dt
    else:
        d_error = 0.0
    self.last_p_error = p_error
    v = p_error * self.p_gain + self.i_error * self.i_gain + d_error * self.d_gain
    # Scale down so that no joint velocity exceeds 11.0 rad/s
    if np.max(np.abs(v)) > 11.0:
        v = v / np.max(np.abs(v)) * 11.0
    a = np.max(np.abs(v))  # acceleration argument for speedj
    command = "speedj([{},{},{},{},{},{}],a={},t={})\n".format(
        v[0], v[1], v[2], v[3], v[4], v[5], a, 0.3)
    self.sock.send(command.encode("utf8"))
    self.last_joint_state = positions
```
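For anyone decoding the same stream: the `line[32:38]` slice above matches a layout where the packet begins with a 4-byte message size followed by 8-byte big-endian doubles, with the actual joint positions in doubles 31-36. A self-contained demo against a fabricated packet (all values made up):

```python
import struct

PACKET_LEN = 1060               # 4-byte size field + 132 doubles
Q_ACTUAL_SLICE = slice(32, 38)  # unpacked index 0 is the int size field

def decode_joint_positions(packet):
    """Unpack a 1060-byte real-time-interface packet and return q_actual."""
    fields = struct.unpack("!i" + "d" * 132, packet)
    return list(fields[Q_ACTUAL_SLICE])

if __name__ == "__main__":
    # Fabricate a packet: size field, then 132 doubles with known values
    # in the q_actual slots (doubles 31..36).
    doubles = [0.0] * 132
    doubles[31:37] = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    packet = struct.pack("!i" + "d" * 132, PACKET_LEN, *doubles)
    print(decode_joint_positions(packet))
```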
I think this issue has been discussed to a point where it can be closed.