probe icon indicating copy to clipboard operation
probe copied to clipboard

all: incorrect progress estimates in percents in log

Open kotenok2000 opened this issue 2 years ago • 8 comments

Describe the bug

Progress estimates in percents are incorrect To Reproduce

Steps to reproduce the behavior.

Expected behavior

A clear and concise description of what you expected to happen.

Screenshots

Running circumvention tests iplookup: using avast sessionresolver: https://dns.google/dns-query... sessionresolver: https://dns.google/dns-query... sessionresolver: https://dns.google/dns-query... session: using probe services: {Address:https://ps1.ooni.io Type:https Front:} 0.0016666666666666668% - psiphon experiment running 0.0033333333333333335% - psiphon experiment running 0.005% - psiphon experiment running 0.006666666666666667% - psiphon experiment running 0.008333333333333333% - psiphon experiment running 0.01% - psiphon experiment running 0.011666666666666667% - psiphon experiment running 0.013333333333333334% - psiphon experiment running 0.015000000000000001% - psiphon experiment running 0.016666666666666666% - psiphon experiment running 0.018333333333333333% - psiphon experiment running 0.02% - psiphon experiment running 0.021666666666666667% - psiphon experiment running 0.023333333333333334% - psiphon experiment running 0.025% - psiphon experiment running 0.02666666666666667% - psiphon experiment running 0.028333333333333335% - psiphon experiment running 0.030000000000000002% - psiphon experiment running 0.03166666666666667% - psiphon experiment running 0.03333333333333333% - psiphon experiment running 0.034999999999999996% - psiphon experiment running 0.03666666666666666% - psiphon experiment running 0.03833333333333332% - psiphon experiment running 0.03999999999999999% - psiphon experiment running 0.04166666666666665% - psiphon experiment running 0.043333333333333314% - psiphon experiment running 0.04499999999999998% - psiphon experiment running 0.04666666666666664% - psiphon experiment running 0.048333333333333305% - psiphon experiment running 0.04999999999999997% - psiphon experiment running 0.05166666666666663% - psiphon experiment running 0.053333333333333295% - psiphon experiment running 0.05499999999999996% - psiphon experiment running 0.05666666666666662% - psiphon experiment running 0.058333333333333286% - psiphon experiment running 0.05999999999999995% - psiphon experiment running 0.06166666666666661% - psiphon experiment running 0.06333333333333328% - psiphon experiment running 0.06499999999999995% - psiphon experiment running 0.06666666666666661% - psiphon experiment running 0.06833333333333327% - psiphon experiment running 0.06999999999999994% - psiphon experiment running 0.0716666666666666% - psiphon experiment running 0.07333333333333326% - psiphon experiment running 0.07499999999999993% - psiphon experiment running 0.07666666666666659% - psiphon experiment running 0.07833333333333325% - psiphon experiment running 0.07999999999999992% - psiphon experiment running 0.08166666666666658% - psiphon experiment running 0.08333333333333325% - psiphon experiment running 0.08499999999999991% - psiphon experiment running 0.08666666666666657% - psiphon experiment running 0.08833333333333324% - psiphon experiment running 0.0899999999999999% - psiphon experiment running 0.09166666666666656% - psiphon experiment running 0.09333333333333323% - psiphon experiment running 0.09499999999999989% - psiphon experiment running 0.09666666666666655% - psiphon experiment running 0.09833333333333322% - psiphon experiment running 0.09999999999999988% - psiphon experiment running 0.10166666666666654% - psiphon experiment running 0.10333333333333321% - psiphon experiment running 0.10499999999999987% - psiphon experiment running 0.10666666666666653% - psiphon experiment running 0.1083333333333332% - psiphon experiment running 0.5% - psiphon experiment complete TCPConnect 199.58.81.140:443... in progress 0.5142857142857142% - tor: access 66.111.2.131:9030/dir_port: generic_timeout_error TCPConnect 199.58.81.140:443... generic_timeout_error 0.5285714285714286% - tor: access 199.58.81.140:443/or_port_dirauth: generic_timeout_error TCPConnect 131.188.40.189:443... in progress TCPConnect 86.59.21.38:443... in progress TCPConnect 86.59.21.38:443... generic_timeout_error 0.5428571428571428% - tor: access 86.59.21.38:443/or_port_dirauth: generic_timeout_error TCPConnect 131.188.40.189:443... generic_timeout_error 0.5571428571428572% - tor: access 131.188.40.189:443/or_port_dirauth: generic_timeout_error TCPConnect 193.23.244.244:443... in progress TCPConnect 193.23.244.244:443... generic_timeout_error 0.5714285714285714% - tor: access 193.23.244.244:443/or_port_dirauth: generic_timeout_error 0.5857142857142857% - tor: access 193.23.244.244:80/dir_port: generic_timeout_error TCPConnect 209.148.46.65:443... in progress TCPConnect 209.148.46.65:443... generic_timeout_error 0.6% - tor: access 209.148.46.65:443/obfs4: generic_timeout_error 0.6142857142857143% - tor: access 154.35.175.225:80/dir_port: generic_timeout_error TCPConnect 51.222.13.177:80... in progress TCPConnect 66.111.2.131:9001... in progress TCPConnect 66.111.2.131:9001... generic_timeout_error 0.6285714285714286% - tor: access 66.111.2.131:9001/or_port_dirauth: generic_timeout_error TCPConnect 51.222.13.177:80... generic_timeout_error 0.6428571428571428% - tor: access 51.222.13.177:80/obfs4: generic_timeout_error TCPConnect 85.31.186.26:443... in progress TCPConnect 193.11.166.194:27015... in progress TCPConnect 193.11.166.194:27015... generic_timeout_error TCPConnect 85.31.186.26:443... generic_timeout_error 0.6571428571428571% - tor: access 193.11.166.194:27015/obfs4: generic_timeout_error 0.6714285714285715% - tor: access 85.31.186.26:443/obfs4: generic_timeout_error TCPConnect 45.145.95.6:27015... in progress TCPConnect 128.31.0.39:9101... in progress TCPConnect 45.145.95.6:27015... generic_timeout_error 0.6857142857142857% - tor: access 45.145.95.6:27015/obfs4: generic_timeout_error TCPConnect 128.31.0.39:9101... generic_timeout_error 0.7% - tor: access 128.31.0.39:9101/or_port_dirauth: generic_timeout_error 0.7142857142857143% - tor: access 131.188.40.189:80/dir_port: generic_timeout_error 0.7285714285714285% - tor: access 128.31.0.39:9131/dir_port: generic_timeout_error TCPConnect 192.95.36.142:443... in progress 0.7428571428571429% - tor: access 199.58.81.140:80/dir_port: generic_timeout_error TCPConnect 192.95.36.142:443... generic_timeout_error 0.7571428571428571% - tor: access 192.95.36.142:443/obfs4: generic_timeout_error TCPConnect 171.25.193.9:80... in progress 0.7714285714285714% - tor: access 86.59.21.38:80/dir_port: generic_timeout_error TCPConnect 171.25.193.9:80... generic_timeout_error 0.7857142857142857% - tor: access 171.25.193.9:80/or_port_dirauth: generic_timeout_error TCPConnect 144.217.20.138:80... in progress TCPConnect 193.11.166.194:27020... in progress System information (please complete the following information):

  • Device: pc
  • OS: windows 11
  • OONI Probe version: 3.7 Additional context

Add any other context about the problem here.

kotenok2000 avatar Mar 18 '22 16:03 kotenok2000

Logs depicting behavior.

2022-05-30 19:42:31.517619+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "processing https://doh-fi.blahdns.com/dns-query";
        percentage = "0.7580645161290323";
    };
}
2022-05-30 19:42:33.057562+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "dnscheck: measure https://95.216.212.177/dns-query: ok";
        percentage = "0.7";
    };
}

........

2022-05-30 19:42:40.358453+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "processing dot://doh-fi.blahdns.com/dns-query";
        percentage = "0.7629032258064516";
    };
}
2022-05-30 19:42:41.968269+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "dnscheck: measure dot://95.216.212.177/dns-query: ok";
        percentage = "0.7";
    };
}

........

2022-05-30 19:50:19.040735+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "processing https://dnsnl-noads.alekberg.net/dns-query";
        percentage = "0.9854838709677419";
    };
}
2022-05-30 19:50:22.647250+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "dnscheck: measure https://51.15.124.208/dns-query: ok";
        percentage = "0.7";
    };
}

........

2022-05-30 19:50:29.237998+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "processing https://doh.tiarap.org/dns-query";
        percentage = "0.9903225806451613";
    };
}
2022-05-30 19:50:32.285116+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "dnscheck: measure https://104.21.30.162/dns-query: ok";
        percentage = "0.55";
    };
}
2022-05-30 19:50:32.632051+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "dnscheck: measure https://172.67.173.59/dns-query: ok";
        percentage = "0.7";
    };
}
2022-05-30 19:50:37.963476+0100 ooniprobe[7108:130841] Got event: {
    key = "status.progress";
    value =     {
        message = "dnscheck: measure https://[2606:4700:3030::ac43:ad3b]/dns-query: network_unreachable";
        percentage = "0.85";
    };
}

aanorbel avatar May 31 '22 03:05 aanorbel

It seems it's better to systematically test all platforms and check their behavior. It seems not all platforms behave in the same way and this is because we have different code for handling this kind of events.

product group monotonic progress
cli websites ✔️
cli im ✔️
cli performance ✔️
cli circumvention ✔️
cli middlebox ✔️
cli experimental ✔️
android websites ✔️
android im ✔️
android circumvention ✔️
android performance ✔️
android experimental ❌ stunreachability
ios websites ✔️
ios im ✔️
ios circumvention ❌ tor
ios performance ❌ seems completely broken
ios experimental
deskop websites ✔️
desktop im ✔️
desktop circumvention ✔️
desktop performance ✔️
desktop middleboxes ✔️
desktop experimental ✔️

Table 1. Checking whether progress is monotonic by platform

Once I have finished testing all platforms and have completed the table, I can then understand the real issue and then plan on how to fix it. The https://github.com/ooni/probe-cli/pull/785 diff apparently fixed dnscheck for cli and android.

So, the bottom line here seems that we have issues with mobile only. That's certainly a bit surprising but it may be this is happening because we're using a different codepath for running mobile experiments that mimics the old MK API.

bassosimone avatar Jun 03 '22 10:06 bassosimone

Having implemented fixes, it's now time to test again!

product group monotonic progress
cli websites ✔️
cli im ✔️
cli performance ✔️
cli circumvention ✔️
cli middlebox ✔️
cli experimental ✔️
android websites ✔️
android im ✔️
android circumvention ✔️
android performance ✔️
android experimental ✔️
ios websites N/A
ios im N/A
ios circumvention N/A
ios performance N/A
ios experimental N/A
desktop websites ✔️
desktop im ✔️
desktop circumvention ✔️
desktop performance ✔️
desktop middleboxes ✔️
desktop experimental ✔️

I cannot test iOS without releasing, but at least it seems all other platforms have been fixed. I'll bless a new release and then test whether the issue still persist with iOS.

bassosimone avatar Jun 07 '22 10:06 bassosimone

Release v3.15.1, now let's test iOS:

product group monotonic progress
ios websites ✔️
ios im ❌ jumps back and forth (why?)
ios circumvention ❌ jumps back and forth (why?)
ios performance ❌ jumps back and forth (why?)
ios experimental N/A

I don't know what to do 😕. I have confirmed these tests are using ooni/probe-cli v3.15.1, which has fixed the issue completely for Android. Since the engine's codebase is the same, I think the problem is outside my domain. I will ask @aanorbel to run tests with Android and iOS version to confirm he also sees the progress bar jumping on iOS and everything seems WAI for Android.

bassosimone avatar Jun 07 '22 13:06 bassosimone

I feel like a did what was possible for me to do to improve the situation for v3.15.x. We'll see what is the root cause on iOS and perhaps we need to apply further changes to ooni/probe-cli (but I am not sure). So, I am going to move this issue to v3.16.x for now.

bassosimone avatar Jun 07 '22 13:06 bassosimone

We can get similar results on the iOS application as with the Android application without much change. I am looking into the iOS implementations right now.

aanorbel avatar Jun 07 '22 13:06 aanorbel

So, to summarize, @aanorbel confirmed to me that we have fixed Android. We still have issues with iOS and it is yet unclear to us why we have such issues. On a positive note, iOS is the only product we still need to fix.

bassosimone avatar Jun 15 '22 12:06 bassosimone

I am going to add @aanorbel as an assignee so that he can investigate the iOS related aspects of this issue.

bassosimone avatar Jul 25 '22 12:07 bassosimone

@bassosimone , can we revisit this issue to determine the current state and do something to close the issue?

aanorbel avatar Apr 11 '23 08:04 aanorbel

@aanorbel the progress bar is still misbehaving on iOS as I am about to document on https://github.com/ooni/probe/issues/2509. I'll send you a video in private that shows the progress bar misbehaving.

bassosimone avatar Jul 25 '23 10:07 bassosimone