uftrace
uftrace copied to clipboard
Random failure in ./runtest 222
Reproduced random failure for ./runtest -vdik 222
on x64 WSL2 Ubuntu 20.04
The info below is from one instance of failure where the external data comment is missing
uftrace replay -f time
output
# TIMESTAMP FUNCTION
426883.003490100 | main() {
426883.003490500 | a() {
426883.003490600 | b() {
426883.003490700 | c() {
426883.003491400 | getpid();
426883.003493200 | } /* c */
426883.003493500 | } /* b */
426883.003493600 | } /* a */
426883.003493700 | } /* main */
Expected output
# TIMESTAMP FUNCTION
426883.003490100 | main() {
426883.003490101 | /* external-data: user message */
426883.003490500 | a() {
426883.003490600 | b() {
426883.003490700 | c() {
426883.003491400 | getpid();
426883.003493200 | } /* c */
426883.003493500 | } /* b */
426883.003493600 | } /* a */
426883.003493700 | } /* main */
extern.dat
content: 426883.3490101 user message
- Editing the file to add 2 leading zeros after the decimal does not solve the problem
Above failure might be due to parsing error when timestamp has leading zero (unverified claim)
Originally posted by @JHH20 in https://github.com/namhyung/uftrace/issues/1063#issuecomment-1206474208
So far, this test seems to fail in 2 different ways:
- The external-data message is missing entirely
- The message is misplaced between 2 functions rather than directly below main()
I originally thought it might be related to buffering differences between tty and non-tty, but passing a pseudo-tty in ./runtest.py via pty.openpty did not solve the issue.
Adding output when external-data message is misplaced between 2 functions.
Test ran as ./runtest.py -dk 222
and the following was the 4th test
# TIMESTAMP FUNCTION
35775.262034200 | __monstartup();
35775.262035200 | __cxa_atexit();
35775.262035800 | main() {
35775.262035800 | a() {
35775.262035801 | /* external-data: user message */
35775.262035900 | b() {
35775.262035900 | c() {
35775.262036000 | getpid();
35775.262036400 | } /* c */
35775.262036500 | } /* b */
35775.262036600 | } /* a */
35775.262036600 | } /* main */
Note that main()
and a()
have the exact same timestamp, so the current method of adding 1 from main()
to create the external-data timestamp does not guarantee the intended ordering.
If runtest.py always sort the lines using stable sorting algorithm, then it would be better to make external-data timestamp identical to main() and insert it in the correct location.
It looks like a clock precision issue on WSL and nothing much we can do about it. Please fix the problem with the leading 0s.
The following unit test shows the problem.
diff --git a/utils/utils.c b/utils/utils.c
index aa0ef2f4..66267350 100644
--- a/utils/utils.c
+++ b/utils/utils.c
@@ -1065,4 +1065,21 @@ TEST_CASE(utils_strv)
return TEST_OK;
}
+
+TEST_CASE(parse_timestamp)
+{
+ uint64_t ts;
+
+ ts = parse_timestamp("8960070.725168293");
+ TEST_EQ(ts, 8960070725168293);
+
+ ts = parse_timestamp("8959832.082682731");
+ TEST_EQ(ts, 8959832082682731); // fails here
+
+ ts = parse_timestamp("426883.003490100");
+ TEST_EQ(ts, 426883003490100); // fails here
+
+ return TEST_OK;
+}
+
#endif /* UNIT_TEST */
It fails as follows.
$ make TESTARG="-v parse_timestamp" unittest
TEST test_unit
Running 1 test cases
======================
Testing parse_timestamp...
test failed at /home/honggyu/work/uftrace/utils/utils.c:1077: ts == 8959832082682731
ts = 8959832000000000
value_2 = 8959832082682731
[001] parse_timestamp : FAIL
This means parse_timestamp("8959832.082682731")
returns incorrect result. It looks there is an error when handling leading 0
in the timestamp.
125 report_range
also has the same issue as @jonghyeon-an reported.
$ uftrace replay -f +time
# DURATION TID TIMESTAMP FUNCTION
1.534 us [ 50740] 8959832.082671765 | __monstartup();
0.906 us [ 50740] 8959832.082677104 | __cxa_atexit();
[ 50740] 8959832.082678806 | main() {
[ 50740] 8959832.082679093 | a() {
[ 50740] 8959832.082679299 | b() {
[ 50740] 8959832.082679493 | c() {
2.119 us [ 50740] 8959832.082679756 | getpid();
3.238 us [ 50740] 8959832.082682731 | } /* c */
3.877 us [ 50740] 8959832.082683176 | } /* b */
4.332 us [ 50740] 8959832.082683425 | } /* a */
4.830 us [ 50740] 8959832.082683636 | } /* main */
$ uftrace report -F main -r ~8959832.082682731
Total time Self time Calls Function
========== ========== ========== ====================
We have to fix the logic in parse_timestamp
. https://github.com/namhyung/uftrace/blob/4d6fe0d68e47aec5b70d91cf29732e1027817993/utils/utils.c#L529-L564
parse_timestamp
When I checked the code, it seems that the base should be changed from 0 to 10 in the strtoull function. After the change, the test case you shared was passed.
diff --git a/utils/utils.c b/utils/utils.c
index aa0ef2f4..e7f6cb2a 100644
--- a/utils/utils.c
+++ b/utils/utils.c
@@ -532,12 +532,12 @@ uint64_t parse_timestamp(char *arg)
uint64_t ts, tmp;
int len;
tmp = strtoull(arg, &sep, 0);
ts = tmp * NSEC_PER_SEC;
if (*sep == '.') {
arg = sep + 1;
- tmp = strtoull(arg, &sep, 0);
+ tmp = strtoull(arg, &sep, 10);
len = 0;
while (isdigit(*arg)) {
After changes
$ make TESTARG="-v parse_timestamp" unittest
TEST test_unit
Running 1 test cases
======================
Testing parse_timestamp...
[001] parse_timestamp : PASS
-------------
unit test stats
====================
1 ran successfully
0 failed
0 skipped
0 signal caught
0 unknown result
It also passed the 222 and 125 tests several times.
After a few more tests, NG occurs when there are 8 decimal places like 1544510.78979574 in test222. The comparison syntax seems to indicate that 1544510078979574 is correct, not 1544510789795740. When removing the code as below, 8 decimal places pass through the case.
diff --git a/utils/utils.c b/utils/utils.c
index aa0ef2f4..ede7db40 100644
--- a/utils/utils.c
+++ b/utils/utils.c
@@ -537,7 +537,7 @@ uint64_t parse_timestamp(char *arg)
if (*sep == '.') {
arg = sep + 1;
- tmp = strtoull(arg, &sep, 0);
+ tmp = strtoull(arg, &sep, 10);
len = 0;
while (isdigit(*arg)) {
@@ -546,10 +546,10 @@ uint64_t parse_timestamp(char *arg)
}
/* if resolution is lower than nsec */
- while (len < 9) {
- tmp *= 10;
- len++;
- }
+ // while (len < 9) {
+ // tmp *= 10;
+ // len++;
+ // }
/* if resolution is higher than nsec */
while (len > 9) {
Thanks for the analysis. Please fix it with a test case!
I suggest also setting the strtoull base to 10 @ L535, which parses the number before the decimal point, for consistency. Octal in a timestamp doesn't make much sense anyways.
https://github.com/namhyung/uftrace/blob/9b05004b3ede81579b3aaa05008a425e8e287fa0/utils/utils.c#L535-L540
I thought the implementor of the parse_timestamp function coded the argument value of line 535 as 0 to cover multiple bases. Is it right to fix?
Also, in the parse_time function, the base value of strtoull was coded as 0, so it would be nice to discuss whether it is necessary to fix it together.
https://github.com/namhyung/uftrace/blob/9b05004b3ede81579b3aaa05008a425e8e287fa0/utils/utils.c#L473-L484