yarp
yarp copied to clipboard
Yarplogger may not report all entries
Bug description
I happened to spot an unexpected behavior concerning the tracing of operations such as yInfo
, yWarning
, yError
etc. within the yarplogger
.
If two operations are very close each other in time, I can observe that the yarplogger
may not keep track of their invocations. In particular, the lower the time interval the higher the probability of a missed trace.
To Reproduce
In order to reproduce the behavior with the smallest code snippet, I crafted a yInfo-test
repository where you can find a simple case study (fully documented).
The yInfo-test
application contains a simple for-loop that performs 10k iterations with just one call to the yInfo
function per cycle. Note that after each iteration a yarp::os::Time::delay is also called so we can vary the elapsed time between the yInfo
calls.
The test scenario foresees the use of yarplogger
without the yarprun
broker, thus relying on the YARP_FORWARD_LOG_ENABLE=1
env var.
After some quick tests, it came out that there exists a threshold of 0.001s. Below this value (included), the number of missing yInfo
logged by the yarplogger
dramatically increases.
Please, notice that the for-loop is only aimed to replicate the behavior, whereas two very close logging operations may occur wherever in the user code.
Below, you can see some results comparing what was produced by the yarplogger
with the expected results:
🔘 Click to see the result of Test 1
./check_log_file.sh build/yarprunlog_04_07_2021_23_46_12.log
Parsing log file...
Missing value 1707, previous was at [ Time delay ]: 0.001 Counter value: 1706
Done.
🔘 Click to see the result of Test 2
./check_log_file.sh build/yarprunlog_04_07_2021_23_41_24.log
Parsing log file...
Missing value 3224, previous was at [ Time delay ]: 0.0001 Counter value: 3223
Missing value 3512, previous was at [ Time delay ]: 0.0001 Counter value: 3511
Missing value 3513, previous was at
Missing value 3514, previous was at
Missing value 3515, previous was at
Missing value 8594, previous was at [ Time delay ]: 0.0001 Counter value: 8593
Missing value 8595, previous was at
Missing value 8596, previous was at
Done.
🔘 Click to see the result of Test 3
./check_log_file.sh build/yarprunlog_05_07_2021_00_18_19.log
Parsing log file...
Missing value 7, previous was at [ Time delay ]: 1e-05 Counter value: 6
Missing value 9, previous was at [ Time delay ]: 1e-05 Counter value: 8
Missing value 145, previous was at [ Time delay ]: 1e-05 Counter value: 144
Missing value 197, previous was at [ Time delay ]: 1e-05 Counter value: 196
Missing value 199, previous was at [ Time delay ]: 1e-05 Counter value: 198
Missing value 549, previous was at [ Time delay ]: 1e-05 Counter value: 548
Missing value 687, previous was at [ Time delay ]: 1e-05 Counter value: 686
Missing value 1163, previous was at [ Time delay ]: 1e-05 Counter value: 1162
Missing value 1230, previous was at [ Time delay ]: 1e-05 Counter value: 1229
Missing value 1335, previous was at [ Time delay ]: 1e-05 Counter value: 1334
Missing value 1659, previous was at [ Time delay ]: 1e-05 Counter value: 1658
Missing value 1891, previous was at [ Time delay ]: 1e-05 Counter value: 1890
Missing value 2071, previous was at [ Time delay ]: 1e-05 Counter value: 2070
Missing value 2206, previous was at [ Time delay ]: 1e-05 Counter value: 2205
Missing value 3038, previous was at [ Time delay ]: 1e-05 Counter value: 3037
Missing value 3306, previous was at [ Time delay ]: 1e-05 Counter value: 3305
Missing value 3523, previous was at [ Time delay ]: 1e-05 Counter value: 3522
Missing value 3648, previous was at [ Time delay ]: 1e-05 Counter value: 3647
Missing value 3880, previous was at [ Time delay ]: 1e-05 Counter value: 3879
Missing value 3904, previous was at [ Time delay ]: 1e-05 Counter value: 3903
Missing value 4372, previous was at [ Time delay ]: 1e-05 Counter value: 4371
Missing value 4558, previous was at [ Time delay ]: 1e-05 Counter value: 4557
Missing value 4559, previous was at
Missing value 4561, previous was at [ Time delay ]: 1e-05 Counter value: 4560
Missing value 4811, previous was at [ Time delay ]: 1e-05 Counter value: 4810
Missing value 4974, previous was at [ Time delay ]: 1e-05 Counter value: 4973
Missing value 5115, previous was at [ Time delay ]: 1e-05 Counter value: 5114
Missing value 5415, previous was at [ Time delay ]: 1e-05 Counter value: 5414
Missing value 5633, previous was at [ Time delay ]: 1e-05 Counter value: 5632
Missing value 5711, previous was at [ Time delay ]: 1e-05 Counter value: 5710
Missing value 5730, previous was at [ Time delay ]: 1e-05 Counter value: 5729
Missing value 5733, previous was at [ Time delay ]: 1e-05 Counter value: 5732
Missing value 5883, previous was at [ Time delay ]: 1e-05 Counter value: 5882
Missing value 6125, previous was at [ Time delay ]: 1e-05 Counter value: 6124
Missing value 6131, previous was at [ Time delay ]: 1e-05 Counter value: 6130
Missing value 6134, previous was at [ Time delay ]: 1e-05 Counter value: 6133
Missing value 6140, previous was at [ Time delay ]: 1e-05 Counter value: 6139
Missing value 6166, previous was at [ Time delay ]: 1e-05 Counter value: 6165
Missing value 6169, previous was at [ Time delay ]: 1e-05 Counter value: 6168
Missing value 6262, previous was at [ Time delay ]: 1e-05 Counter value: 6261
Missing value 6473, previous was at [ Time delay ]: 1e-05 Counter value: 6472
Missing value 6604, previous was at [ Time delay ]: 1e-05 Counter value: 6603
Missing value 6620, previous was at [ Time delay ]: 1e-05 Counter value: 6619
Missing value 6785, previous was at [ Time delay ]: 1e-05 Counter value: 6784
Missing value 6786, previous was at
Missing value 6787, previous was at
Missing value 6789, previous was at [ Time delay ]: 1e-05 Counter value: 6788
Missing value 6790, previous was at
Missing value 6791, previous was at
Missing value 6792, previous was at
Missing value 6793, previous was at
Missing value 6811, previous was at [ Time delay ]: 1e-05 Counter value: 6810
Missing value 6814, previous was at [ Time delay ]: 1e-05 Counter value: 6813
Missing value 7026, previous was at [ Time delay ]: 1e-05 Counter value: 7025
Missing value 7322, previous was at [ Time delay ]: 1e-05 Counter value: 7321
Missing value 7502, previous was at [ Time delay ]: 1e-05 Counter value: 7501
Missing value 7674, previous was at [ Time delay ]: 1e-05 Counter value: 7673
Missing value 7916, previous was at [ Time delay ]: 1e-05 Counter value: 7915
Missing value 8061, previous was at [ Time delay ]: 1e-05 Counter value: 8060
Missing value 8130, previous was at [ Time delay ]: 1e-05 Counter value: 8129
Missing value 8340, previous was at [ Time delay ]: 1e-05 Counter value: 8339
Missing value 8347, previous was at [ Time delay ]: 1e-05 Counter value: 8346
Missing value 8465, previous was at [ Time delay ]: 1e-05 Counter value: 8464
Missing value 8478, previous was at [ Time delay ]: 1e-05 Counter value: 8477
Missing value 8738, previous was at [ Time delay ]: 1e-05 Counter value: 8737
Missing value 8809, previous was at [ Time delay ]: 1e-05 Counter value: 8808
Missing value 8814, previous was at [ Time delay ]: 1e-05 Counter value: 8813
Missing value 8831, previous was at [ Time delay ]: 1e-05 Counter value: 8830
Missing value 8886, previous was at [ Time delay ]: 1e-05 Counter value: 8885
Missing value 9276, previous was at [ Time delay ]: 1e-05 Counter value: 9275
Missing value 9288, previous was at [ Time delay ]: 1e-05 Counter value: 9287
Missing value 9309, previous was at [ Time delay ]: 1e-05 Counter value: 9308
Missing value 9569, previous was at [ Time delay ]: 1e-05 Counter value: 9568
Missing value 9980, previous was at [ Time delay ]: 1e-05 Counter value: 9979
Done.
Expected behavior
When we call the 2 or more yInfo
functions that are near in time, we expect to see their output as many times as they have been called. Unfortunately, if we compare the console output with that shown by the yarplogger
, we can observe that in the latter case some calls are missing.
Screenshots
In the following screenshot we can see that yarplogger
missed to capture the yInfo
with counter value 298.
Configuration (please complete the following information):
- OS: Windows 10 + WLS2
- yarp version: 3.4.5 (installed with robotology-superbuild)
- compiler: GNU Make 4.2.1
Additional context
While I was providing a solution for the following assignments:
I was trying to debug the application using the yInfo
functions to inspect the coverage of my code. Putting the yInfo
calls before and after some instructions or code blocks, and looking at yarplogger
, it seemed that some code blocks were never executed. However, after I had tried to run the code using only the console, I realized that the yarplogger
was not showing all the information as it should have done.
As suggested by @pattacini and @traversaro, we did the test also in the configuration where we employ the yarprun
as a broker in place of the localhost.
In this case, the results seem better (less traces missed), but the problem persists as the delay time decreases. I also noticed the yWarning
and yError
seem more rugged and reliable than the yInfo
; maybe, this is due to a possible priority mechanism, I suppose.
I suppose this might be caused by the fact that the logger is using a BufferedPort...
https://github.com/robotology/yarp/blob/97723811b5e89806feed21a9892cc19242f27251/src/libYARP_logger/src/yarp/logger/YarpLogger.h#L220
CC @randaz81
I suppose this might be caused by the fact that the logger is using a BufferedPort...
I was thinking the same, but this buffered port is configured as strict (on read):
https://github.com/robotology/yarp/blob/efc07a766fe70a293e391a4f95411bd564a2c4d8/src/libYARP_logger/src/yarp/logger/YarpLogger.cpp#L521
So perhaps the forwarding port (on write) is to be blamed, not the yarplogger's one. The log forwarder enables background writes:
https://github.com/robotology/yarp/blob/efc07a766fe70a293e391a4f95411bd564a2c4d8/src/libYARP_os/src/yarp/os/impl/LogForwarder.cpp#L43
According to the following inline doc, it does not wait for writes to complete immediately (false
), hence I think some of them may be lost in the process:
https://github.com/robotology/yarp/blob/5313b2defd8be7237567ea56eca8dde93dfc8121/src/libYARP_os/src/yarp/os/impl/PortCore.h#L542
Any comment on this @randaz81 ?
I believe that @PeterBowman is right, the problem is probably this static bottle:
https://github.com/robotology/yarp/blob/efc07a766fe70a293e391a4f95411bd564a2c4d8/src/libYARP_os/src/yarp/os/impl/LogForwarder.cpp#L49-L59
When the forward
method is called twice, the Bottle::clear()
method could be called before the message is sent, therefore, if I understand correctly, the empty bottle contains no data and, when serialized as binary (not as string), nothing is sent (which is probably a different bug to be further investigated.
I opened #2667 which fixes the issue for me (at least according to @sgiraz very useful check_log_file.sh
script). Does anyone else want to try it before I merge it?
if I understand correctly, the empty bottle contains no data and, when serialized as binary (not as string), nothing is sent (which is probably a different bug to be further investigated.
Just for reference, I was wrong on this, empty Bottles are sent correctly both in binary and text form
Hi guys! I hope you are well 🙂
I'm reaching you because I would like to know if you have found any solution to this logging issue. If yes, please, could provide a brief description about how you solved it?
Thanks! 🙏🏻
Nope, sorry. You can try #2667 that fixes the issue, but blocks the sender thread when there are too many messages, and it is therefore dangerous, hence it cannot be merged in YARP (at least for now). I don't see a solution, unless we implement a "background strict" mode in the port, that makes the buffer bigger when there are messages in queue.