common_msgs

Add diagnostic_msgs/Heartbeat message

Open peci1 opened this issue 3 years ago • 5 comments

A deep analysis of the current state in ROS 1, along with discussion, can be found here: https://discourse.ros.org/t/add-heartbeat-message-type/24162

A short summary for why this message should be added:

The usage I’m most interested in is being able to figure out (both live and from a recording) the publication rate and the delay (at publish time) of messages on a single topic that are too big to have multiple subscribers or too big to be recorded at full frequency. I also want to be able to check the rate/delay-related properties of individual published messages (i.e. by reading their Header). As this need has shown up several times in different places during my professional life, I’d like to converge on something that is easily and generically usable in many places and that has a good chance of being supported by generic diagnostics tools.

A longer summary of the ROS 1 analysis:

  • no suitable Heartbeat message type is available in "official" ROS repos
  • runtime checks
    • Topic statistics can be used to check frequency/delay of messages
      • this is a global option - either all topics or no topics (it can be selectively turned on/off for nodes by changing the value of /enable_statistics parameter when launching nodes, but it is impractical as it requires synchronizing the launch sequence)
      • topic statistics do not provide information about which exact message was missing or late
      • all statistics are published to a single topic (/statistics) so finding data relevant for a single topic needs matching the topic name against all messages in the topic
      • it requires no change to existing code, all subscribers implicitly support it
      • the statistics are available only when at least one subscriber is present (but it is not necessary to subscribe to the topic from the statistics-analyzing node)
    • FrequencyStatus and TimeStampStatus diagnostic tasks can be used to check frequency/delay of published messages
      • setup of these diagnostic tasks is sometimes simple and sometimes more complicated (switching to AsyncSpinner and so on)
      • the tasks do not provide information about which exact message was missing or late
      • getting machine-readable information from the tasks requires string-parsing, which is error-prone
      • the statistics are provided even when there are no subscribers of the topic
    • Heartbeat diagnostic task is not suitable as it reports a plain timer-based heartbeat telling that the process of the node is running
    • Using Bond messages would break the ROS principle of using semantically-fitting message types (bond is inherently point-to-point, publishers are point-to-multipoint)
  • recordable (post-mission debug) checks
    • rate/delay of recorded topics can be found out pretty easily by playing them back and running rostopic hz, looking at rqt_bag visualization etc.
    • non-recorded topic statistics can be found in bag files if topic statistics were enabled and the /statistics topic was recorded
      • to debug a single topic, you need to filter the messages on /statistics by topic name
      • there is no way of figuring out which exact message was missing/late
      • statistics are available only for publishers that had at least one subscriber
      • on a system with a lot of connections, recording /statistics can have noticeable performance impact (observed in SubT Virtual challenge)
    • FrequencyStatus and TimeStampStatus tasks can be recorded and used in playback to give an idea how the publications were working
      • to get exact frequencies/delays, string-parsing is needed
      • there is no way of figuring out which exact message was missing/late
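To illustrate the gap that recurs throughout the list above ("which exact message was missing or late"), here is a minimal sketch of what a per-message heartbeat stream would allow. It does not use ROS; `Beat` is a hypothetical stand-in that carries only the `seq` and `stamp` fields of a ROS 1 Header plus an assumed `observed` time (e.g. the receive or bag-record time of the heartbeat). The function name and the `max_delay` threshold are also made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for one heartbeat sample: Header.seq, Header.stamp
# (seconds) and the time at which the heartbeat was observed (assumption).
Beat = namedtuple("Beat", ["seq", "stamp", "observed"])


def find_missing_and_late(beats, max_delay):
    """Return (missing_seqs, late_seqs) for heartbeats sorted by seq."""
    missing, late = [], []
    prev_seq = None
    for beat in beats:
        # A jump in the sequence number names the exact messages that were lost.
        if prev_seq is not None and beat.seq > prev_seq + 1:
            missing.extend(range(prev_seq + 1, beat.seq))
        # Delay of this particular message relative to its header stamp.
        if beat.observed - beat.stamp > max_delay:
            late.append(beat.seq)
        prev_seq = beat.seq
    return missing, late


beats = [
    Beat(seq=1, stamp=0.0, observed=0.01),
    Beat(seq=2, stamp=0.1, observed=0.11),
    Beat(seq=4, stamp=0.3, observed=0.55),  # seq 3 never arrived, seq 4 is late
    Beat(seq=5, stamp=0.4, observed=0.41),
]
missing, late = find_missing_and_late(beats, max_delay=0.1)
print(missing, late)  # [3] [4]
```

Aggregated statistics or diagnostic tasks could report a lower rate or a higher mean delay for the same data, but not the individual sequence numbers.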

peci1 avatar Apr 28 '22 07:04 peci1

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/add-heartbeat-message-type/24162/27

ros-discourse avatar Apr 28 '22 07:04 ros-discourse

Thank you for the review, Tully. I'll try to make a set of packages utilizing this concept and report back here when it is done.

The approach I used to correlate the heartbeats to the parallel stream was by using a naming convention - I appended /heartbeat to the name of the parallel topic. So we e.g. have os_cloud_node/points with pointclouds and os_cloud_node/points/heartbeat with their heartbeat. I think this follows the same logic as e.g. camera_info.
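A minimal sketch of that naming convention, in plain Python with no ROS dependency. The helper names `heartbeat_topic` and `data_topic_of` are hypothetical and only illustrate the mapping in both directions; they are not part of any existing package.

```python
def heartbeat_topic(data_topic, suffix="heartbeat"):
    """Name of the heartbeat topic that runs in parallel to data_topic."""
    return data_topic.rstrip("/") + "/" + suffix


def data_topic_of(hb_topic, suffix="heartbeat"):
    """Inverse mapping: recover the data topic from its heartbeat topic."""
    tail = "/" + suffix
    if not hb_topic.endswith(tail):
        raise ValueError("not a heartbeat topic: " + hb_topic)
    return hb_topic[: -len(tail)]


print(heartbeat_topic("os_cloud_node/points"))
# os_cloud_node/points/heartbeat
print(data_topic_of("os_cloud_node/points/heartbeat"))
# os_cloud_node/points
```

Because the pairing is purely name-based, remapping only one of the two topics would silently break it.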

peci1 avatar May 05 '22 01:05 peci1

Thanks, a set of packages with a prototype would be great.

Yeah, that sort of naming convention makes sense generally. That should be documented clearly so people understand it. And it might be good to play out some of the cases that might be a problem, such as remapping, muxing, recording and playback etc.

I'm going to bump this to be a draft PR so it's not in our review queue for now.

tfoote avatar May 05 '22 22:05 tfoote

Slowly getting there:

Message: https://github.com/ctu-vras/cras_msgs/blob/master/msg/Heartbeat.msg

topic_tools-like library publishing heartbeat of any messages with header: https://github.com/ctu-vras/ros-utils/blob/master/cras_topic_tools/src/heartbeat.cpp (usage).

peci1 avatar Apr 09 '23 22:04 peci1