[ergoCubSN000] Robot not starting from yarpmanager

Open lrapetti opened this issue 1 year ago • 30 comments

Device name 🤖

ergoCubSN000

Request/Failure description

When starting the robot from the yarpmanager we are getting the error:

hostTransceiver()::parse() detected an ERROR in sequence number from IP = 10.0.1.4. Expected: 4508, Received: 4888, Missing: 380, Prev Frame TX at 234388298 us, This Frame TX at 235150298 us

This has also been happening over the last few days.
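For anyone triaging several of these lines in a log, a minimal Python sketch like the one below can pull the fields out of the message. The field layout is inferred from the single error line quoted in this report, and the interpretation of the two `TX` values as microsecond timestamps is an assumption based on the `us` suffix; `parse_seq_error` is a hypothetical helper, not part of YARP or icub-main.

```python
import re

# Pattern for the hostTransceiver sequence-number error quoted in this issue.
# The layout is inferred from that one line, so treat it as an assumption.
SEQ_ERROR = re.compile(
    r"sequence number from IP = (?P<ip>\d+\.\d+\.\d+\.\d+)\. "
    r"Expected: (?P<expected>\d+), Received: (?P<received>\d+), "
    r"Missing: (?P<missing>\d+), "
    r"Prev Frame TX at (?P<prev_us>\d+) us, "
    r"This Frame TX at (?P<this_us>\d+) us"
)


def parse_seq_error(line):
    """Return the fields of a sequence-number error, or None if the line does not match."""
    match = SEQ_ERROR.search(line)
    if match is None:
        return None
    info = {k: (v if k == "ip" else int(v)) for k, v in match.groupdict().items()}
    # Gap between the two TX values, converted from microseconds to milliseconds.
    info["tx_gap_ms"] = (info["this_us"] - info["prev_us"]) / 1000.0
    # Sanity check: the reported "Missing" count should equal Received - Expected.
    info["consistent"] = info["received"] - info["expected"] == info["missing"]
    return info


line = (
    "hostTransceiver()::parse() detected an ERROR in sequence number from IP = 10.0.1.4. "
    "Expected: 4508, Received: 4888, Missing: 380, "
    "Prev Frame TX at 234388298 us, This Frame TX at 235150298 us"
)
print(parse_seq_error(line))
```

For the line reported here, this gives 380 missing sequence numbers from 10.0.1.4 and a gap of 762 ms between the two TX values.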

Detailed context

Here is a log saved from yarplogger

yarprunlog_20_04_2023_16_49_52.log

Additional context

No response

How does it affect you?

No response

cc @CarlottaSartore @DanielePucci

lrapetti avatar Apr 20 '23 16:04 lrapetti

This was observed also by @AntonioConsilvio

lrapetti avatar Apr 20 '23 16:04 lrapetti

It may be a network issue. We'll check that asap. Stay tuned 📢

sgiraz avatar Apr 20 '23 16:04 sgiraz

Hi @lrapetti, @sgiraz and I have updated all the robot boards and the robot software. Now the robot seems to start from yarpmanager without any problems.

Please give us feedback on how the robot behaves!

AntonioConsilvio avatar Apr 21 '23 15:04 AntonioConsilvio

As per https://github.com/robotology/icub-tech-support/issues/1544#issuecomment-1518057636

lrapetti avatar Apr 21 '23 16:04 lrapetti

Closing!

sgiraz avatar Apr 23 '23 14:04 sgiraz

Reopening since the issue happened again today.

Here are a few logs:

  • yarprunlog_26_04_2023_16_11_29.log
  • yarprunlog_26_04_2023_16_17_59.log
  • yarprunlog_26_04_2023_16_15_34.log

Again, when running from the terminal the problem did not occur.

Note that this time we did not experience anything like https://github.com/robotology/icub-tech-support/issues/1544

lrapetti avatar Apr 26 '23 15:04 lrapetti

cc @AntonioConsilvio

sgiraz avatar Apr 26 '23 15:04 sgiraz

The problem happened again just now.

Here is another log: log_ergocub-torso_yarprobotinterface_2764.txt

GiulioRomualdi avatar May 02 '23 09:05 GiulioRomualdi

Hi @GiulioRomualdi @lrapetti

When launching yarprobotinterface from the console, do you spot any error/warning that can be related somehow?

@traversaro @maggia80 and I remember that something similar was occurring on iCub3 in L.A. (or even a bit earlier), although we haven't tracked it down in the proper issue yet. Perhaps @S-Dafarra has a reference to share.

pattacini avatar May 02 '23 09:05 pattacini

Hi @pattacini, we noticed that when the interface is started from the manager, we get a lot of host transceiver errors. This does not happen if it is started from the terminal.

GiulioRomualdi avatar May 02 '23 09:05 GiulioRomualdi

> Hi @pattacini, we noticed that when the interface is started from the manager, we get a lot of host transceiver errors. This does not happen if it is started from the terminal.

Hi @GiulioRomualdi @lrapetti,

Is yarplogger running when you try to run yarprobotinterface?

An interesting test could be disabling the log from yarprun:

[image]

because basically, the only difference between running it with yarpmanager and from the terminal should be the streaming of the log.

Nicogene avatar May 02 '23 10:05 Nicogene

It seems that yarprun on the head is also not able to start from the yarpmanager. However, I am able to start it from the terminal. Not sure if this is related.

mebbaid avatar May 04 '23 08:05 mebbaid

> It seems that yarprun on the head is also not able to start from the yarpmanager. However, I am able to start it from the terminal. Not sure if this is related.

Today I was able to start yarprun from the yarpmanager. Not sure if the issue was related to multiple yarprun processes running on the head.

mebbaid avatar May 05 '23 12:05 mebbaid

OK @mebbaid let us know if the problem persists.

> Not sure if the issue was related to multiple yarprun processes running on the head.

To check this, you may try to open multiple yarprun instances from the terminal on the same SERVERPORT used by the yarpmanager and see what happens.

sgiraz avatar May 10 '23 08:05 sgiraz

Hold on, we are discussing two different issues. The issue @mebbaid mentioned is related to launching yarprun on the Xavier. I think this is due to the long time it takes to log in via ssh, which sometimes causes yarpmanager to think that yarprun did not start. This is unrelated to the initial problem, where the yarprobotinterface refuses to start when launched from yarpmanager on the torso (hence a different machine).

I tried launching it from the terminal with YARP_LOG_FORWARD_ENABLE and it works. I guess the only difference is that, when launched from the terminal, the yarprobotinterface is slowed down a bit by the terminal output. It seems fishy to me: it is as if the initial communication to some boards were slower than usual.
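To make this terminal test repeatable, a minimal Python sketch could prepare the environment before starting the process. The variable name is copied as written in this comment and the value `"1"` is an assumption; both should be checked against the YARP logging documentation for the installed version. `forwarded_log_env` and `launch` are hypothetical helpers, and `dry_run` defaults to `True` so that nothing is started by accident.

```python
import os
import subprocess


def forwarded_log_env(var_name="YARP_LOG_FORWARD_ENABLE", base_env=None):
    """Copy the environment and set the log-forwarding variable to "1".

    The variable name is taken from the comment in this thread (assumption);
    verify it against the YARP documentation before relying on it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[var_name] = "1"
    return env


def launch(cmd=("yarprobotinterface",), dry_run=True):
    """Start cmd with log forwarding enabled; with dry_run only report what would run."""
    env = forwarded_log_env()
    if dry_run:
        return list(cmd), env
    return subprocess.Popen(list(cmd), env=env)


cmd, env = launch()
print(cmd, env["YARP_LOG_FORWARD_ENABLE"])
```

Calling `launch(dry_run=False)` on the torso machine would start the process with the variable set, which is the same condition as the manual test described here.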

S-Dafarra avatar May 10 '23 08:05 S-Dafarra

Hi guys,

@AntonioConsilvio spotted that it happens when you try to run both yarprobotinterface and yarplogger at the same time. It works fine if you run the yarprobotinterface first, then run the yarplogger after a while. As suggested (💡) by @davidelasagna we may try to use the wired ETH connection and check if the problem persists when they are launched together. If yes, it may be either a bandwidth or router configuration issue.

Notes:

  • yarpmanager has been launched from the laptop.
  • ⚠️ The workaround of running the yarprobotinterface first and the yarplogger later (e.g. after the calibration) doesn't allow us to catch all the possible errors/warnings that happen during the startup of the robot.

cc @Nicogene @S-Dafarra @lrapetti

sgiraz avatar May 17 '23 12:05 sgiraz

This issue has been open for a long time without any recent activity, and it appears that a solution has been found (or at least identified). Therefore, I will proceed to close it. However, please feel free to reopen it if necessary.

sgiraz avatar Jul 05 '23 07:07 sgiraz

I would avoid closing it, since we have to run the interface from the terminal in order to have the robot running, and this is not the standard way to use the robot. If you think this is not the right place to open the issue, we can move it somewhere else.

GiulioRomualdi avatar Jul 06 '23 07:07 GiulioRomualdi

> I would avoid closing it, since we have to run the interface from the terminal in order to have the robot running, and this is not the standard way to use the robot. If you think this is not the right place to open the issue, we can move it somewhere else.

I agree with @GiulioRomualdi, the issue does not seem to be solved. Let us know if you want to track the problem in a different location.

lrapetti avatar Jul 10 '23 13:07 lrapetti

This issue has been automatically marked as stale because it did not have recent activity. It will be closed if no further activity occurs.

github-actions[bot] avatar Sep 19 '23 08:09 github-actions[bot]

This is still happening, I guess.

S-Dafarra avatar Sep 19 '23 08:09 S-Dafarra

Added https://github.com/robotology/icub-tech-support/labels/pinned

pattacini avatar Sep 19 '23 08:09 pattacini

Now the issue seems to happen from time to time also when running the robot from the terminal. We experienced this during the IIT20y demo together with @AntonioConsilvio. Unfortunately, we do not have any log due to https://github.com/robotology/icub-tech-support/issues/1645

S-Dafarra avatar Sep 22 '23 13:09 S-Dafarra

@lrapetti @GiulioRomualdi @mebbaid @CarlottaSartore please try to summarize the cases in which this happens

S-Dafarra avatar Oct 02 '23 08:10 S-Dafarra

Until last week the situation was the following (it is documented in the comments above, up to https://github.com/robotology/icub-tech-support/issues/1543#issuecomment-1551333714):

  • from yarpmanager: if you launched the yarplogger and then started the robot's yarprobotinterface, it always failed with the error above.
  • from yarpmanager: if you launched the yarprobotinterface, waited for the robot to start the calibration, and then started the yarplogger, it was always able to start.
  • from terminal: if you launched the yarprobotinterface from the terminal, it was always able to start.

Since last week a new behaviour has been observed, as documented in https://github.com/robotology/icub-tech-support/issues/1543#issuecomment-1731457962. Basically, sometimes the robot is not able to start even when launched from the terminal (I don't know if the failure is due to the same error).

lrapetti avatar Oct 02 '23 15:10 lrapetti

Related or at least similar to:

  • https://github.com/robotology/icub-tech-support/issues/1118
  • https://github.com/robotology/icub-tech-support/issues/1195

It would be interesting to see whether it also happens on SN001.

cc @pattacini @marcoaccame

Nicogene avatar Nov 07 '23 15:11 Nicogene

> It would be interesting to see whether it also happens on SN001.

Today I had the chance to test this issue on both SN000 and SN001, and on SN001 it does not happen. In theory, the only difference between the two robots is that SN001 is missing the forearm. Talking w/ @maggia80 and @marcoaccame, it emerged that it is unlikely that this is due to a damaged ethernet connector in the chain, because otherwise we should have this problem also when running the yarprobotinterface from the terminal.

Running it from yarpmanager w/ yarplogger requires that the COM express handle not only the eth traffic to/from the boards but also the traffic of the log to the laptop. It is possible that the COM express is running at the edge of its CPU capabilities w/ the BIOS power consumption setting we chose to solve the overheating problem.

We should check with htop the CPU usage of the COM express on both robots, as well as the BIOS configuration.

What looks suspicious is that this problem appeared after
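As a scripted alternative to watching htop, a minimal Python sketch could compute the overall CPU usage from two readings of `/proc/stat`, so that the numbers from the two robots can be logged and compared. This assumes the COM express runs Linux; `read_cpu_times`, `cpu_usage_percent` and `sample` are hypothetical helper names.

```python
import time


def read_cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into (idle, total) jiffies."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # idle time = idle + iowait (iowait may be absent on very old kernels)
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest and guest_nice are already counted in user and nice,
            # so only the first eight fields are summed.
            total = sum(values[:8])
            return idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_percent(before, after):
    """CPU usage in percent between two /proc/stat snapshots."""
    idle_0, total_0 = read_cpu_times(before)
    idle_1, total_1 = read_cpu_times(after)
    elapsed = total_1 - total_0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle_1 - idle_0) / elapsed)


def sample(interval_s=1.0, path="/proc/stat"):
    """Read /proc/stat twice, interval_s apart, and return the usage in percent."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return cpu_usage_percent(before, after)


# Synthetic snapshots, so the example runs on any machine.
before = "cpu  100 0 100 700 100 0 0 0 0 0\n"
after = "cpu  200 0 200 1300 100 0 0 0 0 0\n"
print(cpu_usage_percent(before, after))
```

On the robot, calling `sample()` in a loop while the yarprobotinterface starts (with and without yarplogger) would give a usage trace covering the calibration phase.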

  • https://github.com/robotology/robots-configuration/pull/445

This could put a lot of stress on the eth traffic and, in turn, on the CPU; the calibration phase is critical in this sense.

Note that without the forearm we miss several boards, so maybe this is the reason why ergoCub SN001 starts fine

cc @GiulioRomualdi @lrapetti @Fabrizio69 @pattacini @sgiraz @AntonioConsilvio

Nicogene avatar Nov 08 '23 13:11 Nicogene

> Running it from yarpmanager w/ yarplogger requires that the COM express handle not only the eth traffic to/from the boards but also the traffic of the log to the laptop.

Note that we used to run the yarprobotinterface with YARP_LOG_FORWARD_ENABLE. See the comment above. Hence, there should be no difference on this side. What about the opposite instead? Printing in a terminal takes time, and there are a ton of messages. This might slow down the yarprobotinterface process when running it from the terminal. Is it possible that there is a concurrency issue when launching the different devices?

S-Dafarra avatar Nov 08 '23 14:11 S-Dafarra

> Note that we used to run the yarprobotinterface with YARP_LOG_FORWARD_ENABLE. See https://github.com/robotology/icub-tech-support/issues/1543#issuecomment-1541616702. Hence, there should be no difference on this side. What about the opposite instead? Printing in a terminal takes time, and there are a ton of messages. This might slow down the yarprobotinterface process when running it from the terminal. Is it possible that there is a concurrency issue when launching the different devices?

I am not sure that running yarprobotinterface w/ YARP_LOG_FORWARD_ENABLE and running it via yarprun follow the same code paths inside YARP. Maybe one is more efficient than the other?

@davidelasagna noticed that we have this setting in the documentation

[image]

Since the image of ergoCub SN000 was created starting from an iCub one, these settings were still set for the user icub. We should change them to ergocub, reboot, and see if the problem persists.

Maybe the buffer size also has to be revised; it is quite old and maybe obsolete.

Nicogene avatar Nov 08 '23 14:11 Nicogene

We tried both adding this configuration and setting the RXRate in pc104.xml to 1 ms, but unfortunately the behaviour is the same.

Nicogene avatar Nov 08 '23 17:11 Nicogene