icub-main icon indicating copy to clipboard operation
icub-main copied to clipboard

Make that RX-DO-TX warnings are printed once every X s in order to not flood the log

Open Nicogene opened this issue 6 months ago • 20 comments

Is your feature request related to a problem?

Description

Related to:

  • #1018

When we run the walking @ 1khz every board print the waning that the time RX-DO-TX has exceeded, but this is something useful for debugging but it can make unreadable the log for the end user.

Definition of done

We have a PR that improve this situation

The solution you would like to have available

We can think about hide it behind a parameter, or print it only once every X seconds

Alternatives you have considered

No response

Additional context

No response

Nicogene avatar Jun 05 '25 10:06 Nicogene

Here are some functions that may be useful for this:

  • https://yarp.it/latest/Log_8h.html#a244e6ef02b9d0bc2ff9503c49980054a

cc @martinaxgloria @MSECode

Nicogene avatar Jun 09 '25 08:06 Nicogene

Once tackle this, we could check if this mitigates, or solves this problem:

  • https://github.com/robotology/icub-tech-support/issues/1543

Nicogene avatar Jun 10 '25 09:06 Nicogene

From a first analysis of the SysErrorParser (in particular, the case of eoerror_value_SYS_ctrloop_execoverflowRX error, but it should be the same for the TX and DO phases), the error messages are printed immediately when the function printMessage() is called, regardless of their content or type. The only difference in how they are handled is the logging function used (yInfo, yDebug, yWarning, yError), which is selected based on the severity field.

If we want to add a counter to print the message once every X seconds, instead, I think that we could add this functionality in the firmware, for example in the function that handles the diagnostic when an overflow of the RX-DO-TX phases is detected.

I don't know if it makes sense, what do you think? cc @Nicogene @valegagge @marcoaccame

martinaxgloria avatar Jun 10 '25 09:06 martinaxgloria

Hi @martinaxgloria, in here some comments.

  • icub-main + firmware already support the way the execution overflow messages are emitted by the ETH boards (see https://github.com/robotology/icub-main/pull/971) in two modes:

    • enable / disable the messages as soon as the execution overflow is detected.
    • enable / disable the periodic emission of messages w/ statistics of execution.
  • the above feature can be used in many modes and each board can be configured differently.

    • mode 1, tuning. we can measure the execution times (min, average, max) of RX-DO-TX for each ETH board and then tune the xml files w/ the correct values. In this mode the diagnostics messages can be greatly reduced.
    • mode 2, mitigating. we just send messages that bear statistics. we do that w/ LOGGING.IMMEDIATE.emitRXDOTXoverflow = false and w/ LOGGING.PERIODIC.emitRXDOTXstatistics = true and LOGGING.PERIODIC.period = 60 (for 60 seconds).
    • mode 3, ignoring. we just ignore every message related to execution overflow. we do that w/ LOGGING.IMMEDIATE.emitRXDOTXoverflow = false and w/ LOGGING.PERIODIC.emitRXDOTXstatistics = false
  • i developed last month and i will soon test a new FW w/ a modified approach of the RX-DO-TX threads in which they are not executed at exactly a precise synchronous time but in best effort one after the other. What will be lost is the synchronous behavior (the application of the PWM to the motors that is done in the TX phase will have a jitter inside the millisecond that depends on previous phases) but it will be possible to reduce the annoying messages. So as long as the sum(RX, DO, TX) is < 1000 us the ETH board will not emit any message. And the idea is to maintain the xml configuration of the emission of such messages.

marcoaccame avatar Jun 10 '25 10:06 marcoaccame

Thanks @marcoaccame for the clarification!

I think we don't want to completely ignore those messages, but the idea was to reduce the number of printings to avoid the log flooding. If this feature is already implemented, we just need to add those parameters in the XML of the interested boards and then test the functionality on the robot, maybe with the help of AMI people in order to trigger a case like:

  • https://github.com/robotology/icub-firmware/issues/577#issuecomment-2742813777

martinaxgloria avatar Jun 10 '25 12:06 martinaxgloria

Thanks @marcoaccame for the clarification!

I think we don't want to completely ignore those messages, but the idea was to reduce the number of printings to avoid the log flooding. If this feature is already implemented, we just need to add those parameters in the XML of the interested boards and then test the functionality on the robot, maybe with the help of AMI people in order to trigger a case like:

* [RX phase of the control loop last more than wanted icub-firmware#577 (comment)](https://github.com/robotology/icub-firmware/issues/577#issuecomment-2742813777)

In any case, both with positive and negative outcome we could write some documentation about this feature in order to let the user know its existence

Nicogene avatar Jun 10 '25 12:06 Nicogene

Can it be worth updating the iCubTemplates to add this info there?

martinaxgloria avatar Jun 10 '25 12:06 martinaxgloria

Can it be worth updating the iCubTemplates to add this info there?

Yes!

Nicogene avatar Jun 10 '25 12:06 Nicogene

In

  • https://github.com/robotology/icub-firmware/issues/577#issuecomment-2743574230

@marcoaccame wrote down the following information:

<group name="LOGGING">

    <!-- Contains enabling / disabling of logging that must be emitted immediately as soon as the condition appears --> 
    <group name="IMMEDIATE">
        <!-- enable / disable the emission of diagnostics related to the measured execution time of RX higher than the configured
             maxTimeOfRXactivity and similarly for DO and tx
          -->
        <param name="emitRXDOTXoverflow">       false </param>
    </group>
    
    <!-- Contains enabling / disabling of logging that must be emitted periodically -->
    <group name="PERIODIC">
        <!-- the period expressed in seconds in the interval [1.0, 600.0] -->
        <param name="period">                   10          </param>
        <!-- enable / disable the periodic emission of diagnostics about to the statistics of teh execution time of the RX, DO and TX phase
             in the form of minimul, average and maximum values.
          -->                        
        <param name="emitRXDOTXstatistics">     true        </param>
    </group>

</group>

If he confirms that they are accurate enough, I will proceed with the PR

martinaxgloria avatar Jun 10 '25 12:06 martinaxgloria

Here the PR:

  • https://github.com/robotology/robots-configuration/pull/737

martinaxgloria avatar Jun 10 '25 13:06 martinaxgloria

As @marcoaccame said, with emitRXDOTXoverflow, it's possible to enable/disable the diagnostics about execution overflow of the RX, DO and TX phases. The default value is True if this parameter (and the entire LOGGING group) is not defined in the configuration files, so the diagnostic is now enabled on the robot.

About:

Once tackle this, we could check if this mitigates, or solves this problem:

* [[ergoCubSN000] Robot not starting from yarpmanager icub-tech-support#1543](https://github.com/robotology/icub-tech-support/issues/1543)

I booked ergoCubSN002, I'll try to add this parameter in the XML of the boards that trigger the flooding and set it to False to disable the diagnostic. Then, I'll try to launch the yri from the yarpmanager to understand if the two problems are related to each other.

martinaxgloria avatar Jun 11 '25 08:06 martinaxgloria

Hi @martinaxgloria , Just to be super thorough, wanna try leaving the emitRXDOTXstatistics on for a minute and see what happens? What do you think?

valegagge avatar Jun 11 '25 08:06 valegagge

Hi @valegagge, sure! I have a reservation for tomorrow morning, I'll write here the outcomes

martinaxgloria avatar Jun 11 '25 08:06 martinaxgloria

yarprobotinterface from yarpmanager

About:

  • https://github.com/robotology/icub-tech-support/issues/1543

Before doing any change, on ergoCubSN002, I tried to launch the yri from the yarpmanager using the 1-ami-ergoCubStartup (basically, the yarprobotinterface is launched on ergocub-torso) and it started properly. So, this robot doesn't have the problem reported in that issue; it could be related to ergoCubSN000 only.

RX-DO-TX

Before adding the LOGGING group to handle the diagnostic of the RX-DO-TX phases, I saved a yarprobotinterface log running for a couple of minutes:

to understand how many times the warning message SYS: the RX phase of the control loop has last more than wanted (and similar for the TX and DO phases) is triggered and which boards trigger this error. Based on a simple find in the log I attached above, I retrieved like ~280 occurrences from 3 boards: right_leg-eb6, left_leg-eb8 and torso-eb5-j0_2.

In those boards, I added the LOGGING group as follows:

<group name="LOGGING">
    <group name="IMMEDIATE">
        <param name="emitRXDOTXoverflow">       false </param>
    </group>
    <group name="PERIODIC">
        <param name="period">                   10.0          </param>                     
        <param name="emitRXDOTXstatistics">     true        </param>
    </group>
</group>

I kept the yarprobotinterface up for ~2 min and here is the log with the statistics printed every 10s:

Relevant data: this time the error is triggered 37 times from different boards in which the LOGGING group is not added, so the diagnostic is enabled by default. Moreover, the statistics are 3 messages (one per each phase):

[INFO] from BOARD 10.0.1.8 (left_leg-eb8-j0_3) time=805s 67m 869u :  SYS: execution time statistics (min, average, max) us in a period: runner.RX() -> (41, 86, 728) us over 10.000000 sec
[INFO] from BOARD 10.0.1.8 (left_leg-eb8-j0_3) time=805s 68m 526u :  SYS: execution time statistics (min, average, max) us in a period: runner.DO() -> (413, 427, 580) us over 10.000000 sec
[INFO] from BOARD 10.0.1.8 (left_leg-eb8-j0_3) time=805s 69m 734u :  SYS: execution time statistics (min, average, max) us in a period: runner.TX() -> (123, 126, 156) us over 10.000000 sec

cc @valegagge @Nicogene

martinaxgloria avatar Jun 12 '25 09:06 martinaxgloria

With emitRXDOTXoverflow = false the warnings are not displayed as I expected and that is OK.

From the measured stats however we see that we cannot properly tune maxTimeOfRXactivity, maxTimeOfDOactivity and maxTimeOfTXactivity in the xml so that the overflow never happens. To remove the overflows we should assign to them the worst case

728 580 156

whose sum is 1464 > 1000

Also, we don't know if and how many times inside a period the sum of RX-DO-TX is > 1000 because we don't report this statistics

We just know that in average we have an execution time = 639 which seems OK.

Image

marcoaccame avatar Jun 12 '25 12:06 marcoaccame

  • i developed last month and i will soon test a new FW w/ a modified approach of the RX-DO-TX threads in which they are not executed at exactly a precise synchronous time but in best effort one after the other. What will be lost is the synchronous behavior (the application of the PWM to the motors that is done in the TX phase will have a jitter inside the millisecond that depends on previous phases) but it will be possible to reduce the annoying messages. So as long as the sum(RX, DO, TX) is < 1000 us the ETH board will not emit any message. And the idea is to maintain the xml configuration of the emission of such messages.

I have tested it on a setup and it is working fine.

I am now working on the xml settings so that we can enable either the legacy mode or the new one, we can enable or disable diagnostics w/ more statistics on total execution time.

In this way we shall surely remove the warnings the flood the log but also we can have a definitive answer to the questions: does the ETH board executes below or beyond the period (so far = 1 ms)? if beyond, how many times?

stay tuned.

marcoaccame avatar Jun 12 '25 13:06 marcoaccame

This comment can be of interest of @S-Dafarra:

  • https://github.com/robotology/icub-main/issues/1024#issuecomment-2965931677

maybe it makes sense what is the combination of factors that makes the yarprobotinterface not run via yarpmanager

Nicogene avatar Jun 12 '25 14:06 Nicogene

Hi @marcoaccame , I have only a doubt:

Image Will it be a problem if we lose this synchronization?

valegagge avatar Jun 17 '25 12:06 valegagge

Hi @marcoaccame , I have only a doubt:

Image Will it be a problem if we lose this synchronization?

I wrote that just because originally the slotted mode was designed for that reason. But I talked w/ @maggia80 who asked me to implement this best effort feature and it seems to be OK.

marcoaccame avatar Jun 17 '25 13:06 marcoaccame

For concluding this issue we may open PRs for adding this section in all the ergocubs:

  • https://github.com/robotology/icub-main/issues/1024#issuecomment-2965931677

Nicogene avatar Jun 20 '25 12:06 Nicogene

PRs opened:

  • https://github.com/icub-tech-iit/robots-configuration/pull/8
  • https://github.com/icub-tech-iit/robots-configuration/pull/7
  • https://github.com/icub-tech-iit/robots-configuration/pull/6

martinaxgloria avatar Jun 30 '25 09:06 martinaxgloria

We can close this issue

Nicogene avatar Jun 30 '25 15:06 Nicogene