studio
studio copied to clipboard
Reconnection to Rosbridge does not reliably re-advertise topics
This comes from a report from a user of my extension, not sure what version they're using but it's fairly recent and on macOS.
My panel has a button that publishes a /bladescript/execute message. The panel is written as a React functional component like so:
function BladescriptPanel(
{ context }: { context: PanelExtensionContext }
): JSX.Element {
...
// Set up rendering for the panel
useLayoutEffect(() => {
...
// Advertise that we publish scripts
context.advertise?.("/bladescript/execute", "std_msgs/String");
}, [context]);
...
}
Occasionally the user gets a complaint from Rosbridge when pressing the button that it was not expecting the publication of the /bladescript/execute message. My theory is that this has to do with when the connection is lost and then the connection gets re-established.
Am I misusing the API here?

We at least try to re-advertise when reconnecting, but maybe something about it is not working properly:
https://github.com/foxglove/studio/blob/788ce4bdd8dba470f4227996bb8061f1ca392c76/packages/studio-base/src/players/RosbridgePlayer.ts#L664-L685
Is there a way we can help debug this, since it's reproducing fairly reliably? We'd be happy to run a build with additional logging. I don't have time myself to wire that up though.
Related issues: https://github.com/RobotWebTools/rosbridge_suite/issues/138 https://github.com/ros/ros_comm/issues/111
You can mitigate this issue by setting a high enough unregister_timeout. The default value is 10 seconds, meaning that the publisher will be unregistered after 10 seconds when there are no more clients publishing to this topic.
You can set the unregister_timeout e.g. as a launch file argument (assuming ros1):
roslaunch rosbridge_server rosbridge_websocket.launch unregister_timeout:=3600
Also see https://github.com/RobotWebTools/rosbridge_suite/pull/322 where the timeout was made configurable.
Thanks @achim-k. I don't fully understand the issue, though. Since you are familiar with it, would you please summarize it? In particular, I don't understand what it means to unregister a publisher, or who is unregistering it from where, and why. What problem does the timeout solve, rather than unregistering immediately? What is the consequence of not unregistering?
In order for rosbridge clients to publish to ROS topics, the rosbridge server has to internally create a ROS publisher which is used to publish to the actual ROS topic. This is done when a client sends an advertise operation to the rosbridge server. Future messages sent from the rosbridge client (via the publish operation) will be published to the ROS topic using the previously created ROS publisher. Note that only a single ROS publisher is created per topic. E.g. when there are more then one rosbridge clients publishing to the same topic, the same ROS publisher is used on the server side.
After all clients that previously published to this topic disconnected (or called the unadvertise operation), the internal ROS publisher is unregistered after a certain timeout (unregister_timeout). This leads to the warning messages you mentioned above being logged, which is a consequence of https://github.com/ros/ros_comm/issues/111. So in order to mitigate this, your best option is to avoid unregistering of internal ROS publishers which can be done by setting the unregister_timeout to a high enough value.
@achim-k Is there something Studio can do to re-establish the publisher when it is connected? Maybe it already does this and the warning shown on rosbridge is separate from that happening? The bug report here describes a situation where Studio has re-connected to rosbridge but not able to publish. Even with a 10 second timeout (the default) for publisher teardown we should be able to re-establish the publisher from Studio and rosbridge would publish the message.
From the end-user perspective (If I've read the issue correctly), we are in a situation where Studio has re-connected and everything looks ok but publishing doesn't work because we've not handled this in a way to get rosbridge to re-establish the publisher.
Maybe it already does this and the warning shown on rosbridge is separate from that happening?
From Jacob's comment it looks like that Studio does everything right. But I will investigate if this is really the case.
Thanks for investigating and for the explanation.
After all clients that previously published to this topic disconnected[,] the internal ROS publisher is unregistered after a certain timeout (
unregister_timeout).
I assume the timeout gives the chance for the rosbridge client-publisher to return within a brief window after quitting (the page refresh scenario discussed in https://github.com/RobotWebTools/rosbridge_suite/issues/138), without any apparent change to the ROS topic graph that causes a subscriber to think the topic disappeared.
Then, if I understand these bugs correctly, the rosbridge has said "my last /foo client-publisher is gone, and it's been a few seconds and they never came back, I am going to cease publishing /foo". But something gets out of sync: some other node thinks rosbridge is still publishing /foo but when it actually makes the socket connection to rosbridge and asks for /foo, that's not on the internal list of published topics.
If we apply the workaround to use a long timeout, then we keep the external and internal view of what topics are published in sync by never allowing rosbridge to really unregister the topic.
I assume the timeout gives the chance for the rosbridge client-publisher to return within a brief window after quitting (the page refresh scenario discussed in RobotWebTools/rosbridge_suite#138), without any apparent change to the ROS topic graph that causes a subscriber to think the topic disappeared.
Correct.
Then, if I understand these bugs correctly, the rosbridge has said "my last
/fooclient-publisher is gone, and it's been a few seconds and they never came back, I am going to cease publishing/foo". But something gets out of sync: some other node thinks rosbridge is still publishing/foobut when it actually makes the socket connection to rosbridge and asks for/foo, that's not on the internal list of published topics.
This is also how I understand it.
If we apply the workaround to use a long timeout, then we keep the external and internal view of what topics are published in sync by never allowing rosbridge to really unregister the topic.
Correct. I see no better alternative at the moment as https://github.com/ros/ros_comm/issues/111 is unlikely to get fixed in the future. There should be no issues keeping the ROS publishers instances alive, even when your rosbridge clients create 100+ publishers.
We at least try to re-advertise when reconnecting, but maybe something about it is not working properly:
https://github.com/foxglove/studio/blob/788ce4bdd8dba470f4227996bb8061f1ca392c76/packages/studio-base/src/players/RosbridgePlayer.ts#L664-L685
We do however not call Roslib.Topic.advertise() so the topics are only re-advertised when a message is being published. I will submit a patch that actively calls advertise().
Edit: When doing that, I noticed some undesired behavior: We sent way too many advertise operations e.g. while setting up the publish panel configuration (see screencast). @jtbandes do you have an idea on how to solve this? Maybe adding some debounce to the publish panel?
We could add a debounce if you like. But it also seems fine to send all these advertise/unadvertise operations — is it causing some problem?
It is not a big problem, but these messages cause unnecessary changes to the ros graph so it would be good if we could avoid them.
In #4653, I parameterized the _setupPublishers method. This avoids the unnecessary websocket messages and also doesn't require adding a debounce.