Why is the /diagnostics_toplevel_state ERROR when one of the diagnostics is STALE

Open Rayman opened this issue 2 years ago • 7 comments

In the current code the toplevel state is only STALE when ALL the diagnostics are STALE. Example:

group_analyzer

What I think is more logical is that the state is STALE when one of the diagnostics is STALE and none of them ERROR: group_analyzer2

What do you think about this logic? I've implemented this in my fork, but it is a breaking change.

Rayman avatar Apr 18 '23 14:04 Rayman

Hi @Rayman and thank you for your comment. I agree with you that this makes more sense but, as you stated, this is a breaking change so for Noetic I do not believe it can be merged as is. I am not sure how the ros2 maintainers would see this change for the ros2 version...

g-gemignani avatar Apr 18 '23 15:04 g-gemignani

Hi @Rayman. Thanks for your suggestion. Just to be clear: the stale state is set by the generic analyzer if no message was received within a given timeout: https://github.com/ros/diagnostics/blob/a80bd1c33786e5e5642f91b2b6016048f32fbf0e/diagnostic_aggregator/include/diagnostic_aggregator/generic_analyzer_base.hpp#L196 This can be useful information about how current a state is.

If I understand correctly, your suggestion is to treat it in aggregation like the other levels and aggregate it in the group. There, I would honestly have a hard time ranking its severity against the other levels. Currently, it is level 3, which reads as the highest priority. This means you would only see stale on the highest level, even if another item in that group is in the error state. This cannot be the intended behavior. Changing these levels would be a SERIOUS breaking change.
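To illustrate the concern, here is a minimal Python sketch. It assumes the numeric values of the `diagnostic_msgs/DiagnosticStatus` constants (OK=0, WARN=1, ERROR=2, STALE=3). The `Level` enum and the `naive_group_level` helper are hypothetical names used only for illustration; they are not part of diagnostic_aggregator.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed to mirror the diagnostic_msgs/DiagnosticStatus level constants.
    OK = 0
    WARN = 1
    ERROR = 2
    STALE = 3


def naive_group_level(levels):
    """Hypothetical aggregation: simply take the numerically highest level."""
    return max(levels)


# One item is in ERROR, another is STALE. Because STALE has the highest
# numeric value, a plain maximum reports STALE and hides the ERROR.
masked = naive_group_level([Level.ERROR, Level.STALE])
print(masked.name)
```

With a plain maximum, the group would report STALE here even though one item is in the error state, which is the masking effect described above.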

What is your take on that?

ct2034 avatar Apr 24 '23 11:04 ct2034

The toplevel state is not just the maximum of all the levels. It's calculated with the following algorithm:

if maximum_level > ERROR and minimum_level <= ERROR:
	# one or more STALE, but not all of them
	level = ERROR
else:
	level = maximum_level

I would propose to change this to the following, because I think it's more logical.

if maximum_level == STALE and maximum_level_without_stale < ERROR:
	# one or more STALE, but no errors
	level = STALE
else:
	level = maximum_level_without_stale

This will be the difference between the two algorithms:

diagnostic1   diagnostic2   current   proposed
stale         ok            error     stale
stale         warn          error     stale
stale         error         error     error
stale         stale         stale     stale
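The two algorithms can be compared side by side with a small Python sketch. It is not the aggregator's actual code: the `Level` enum and both function names are hypothetical, the numeric values are assumed to match `diagnostic_msgs/DiagnosticStatus`, and treating `maximum_level_without_stale` as OK when every item is STALE is an assumption that the pseudocode does not spell out.

```python
from enum import IntEnum


class Level(IntEnum):
    # Assumed to mirror the diagnostic_msgs/DiagnosticStatus level constants.
    OK = 0
    WARN = 1
    ERROR = 2
    STALE = 3


def current_toplevel(levels):
    """Current behavior: some (but not all) items STALE yields ERROR."""
    maximum_level = max(levels)
    minimum_level = min(levels)
    if maximum_level > Level.ERROR and minimum_level <= Level.ERROR:
        # one or more STALE, but not all of them
        return Level.ERROR
    return maximum_level


def proposed_toplevel(levels):
    """Proposed behavior: STALE is reported unless an ERROR is present."""
    maximum_level = max(levels)
    non_stale = [lvl for lvl in levels if lvl != Level.STALE]
    # Assumption: with no non-stale items, fall back to OK.
    maximum_level_without_stale = max(non_stale, default=Level.OK)
    if maximum_level == Level.STALE and maximum_level_without_stale < Level.ERROR:
        # one or more STALE, but no errors
        return Level.STALE
    return maximum_level_without_stale


rows = [
    (Level.STALE, Level.OK),
    (Level.STALE, Level.WARN),
    (Level.STALE, Level.ERROR),
    (Level.STALE, Level.STALE),
]
for first, second in rows:
    pair = [first, second]
    print(first.name, second.name,
          current_toplevel(pair).name, proposed_toplevel(pair).name)
```

The loop prints one line per row of the table, so the output can be checked against it directly.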

Rayman avatar Apr 24 '23 12:04 Rayman

What I find counter-intuitive about the current behavior is that if you have three leaf diagnostics rolled up into a group, the discard_stale doesn't seem to have an impact on the parent status. For example, if bar and baz in the example below go stale, but foo is OK, I would intuitively think that the part group should also be OK. However, what I'm seeing is that foo currently gets marked as ERROR.

diagnostics_aggregator:
  ros__parameters:
    pub_rate: 1.0
    path: 'robot'
    analyzers:
      part:
        type: 'diagnostic_aggregator/AnalyzerGroup'
        path: 'part'
        foo:
          type: 'diagnostic_aggregator/GenericAnalyzer'
          path: 'foo'
          find_and_remove_prefix: ['/foo:']
          num_items: 1
        bar:
          type: 'diagnostic_aggregator/GenericAnalyzer'
          path: 'bar'
          find_and_remove_prefix: ['/foo:']
          discard_stale: true
        baz:
          type: 'diagnostic_aggregator/GenericAnalyzer'
          path: 'baz'
          find_and_remove_prefix: ['/baz:']
          discard_stale: true

asymingt avatar Sep 20 '23 23:09 asymingt

I did not want to propose merging a breaking change in Noetic, so I've implemented my proposed change in our fork: https://github.com/nobleo/diagnostics. Feel free to use it.

I've implemented the change for the toplevel diagnostics and for the AnalyzerGroup.

Rayman avatar Sep 21 '23 07:09 Rayman

I also added my implementation here: https://github.com/ros/diagnostics/pull/315

asymingt avatar Sep 21 '23 20:09 asymingt

Since this was a breaking change for Noetic, it probably also is for Humble and Iron?

Does it make sense to put some effort into this before ROS Jazzy?

Timple avatar Jan 19 '24 06:01 Timple