Probe failure: no input provided

Open lorenzoPrimi opened this issue 4 years ago • 18 comments

We have a lot of probes which fail the WebConnectivity test due to:

org.openobservatory.ooniprobe.common.MKException: {"downloaded_kb":0.0,"failure":"no input provided","idx":0,"key":0.0,"percentage":0.0,"uploaded_kb":0.0}

This might be due to the API returning an empty array or to some communication problem between the app and the API. We need to investigate this, and it is high priority given the increasing number of bug reports.

Ref: https://countly.ooni.io/dashboard#/5ea6cf6ad7b7950499a822db/crashes/65e8257909c0ec7ed2e41684a044f2cc80d4a1ac

lorenzoPrimi avatar Jan 25 '21 11:01 lorenzoPrimi

We are debugging this issue with @FedericoCeratto. It seems that the root cause is a timeout on the probe side when calling the API. The mobile app does not handle the timeout as an error, so on timeout the returned list is empty. The empty list then triggers the user-visible error.
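
To illustrate the distinction described above, here is a minimal sketch (not the app's actual code) of a fetch wrapper that reports a timeout as a failure instead of letting it degrade into an empty list. The names `UrlListFetch`, `Result`, and `fetch` are hypothetical and exist only for this example; the real API call is represented by a plain `Callable`.

```java
import java.net.SocketTimeoutException;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;

public class UrlListFetch {
    /** Outcome of a fetch: a list of URLs plus a failure label (null on success). */
    public static final class Result {
        public final List<String> urls;
        public final String failure;

        private Result(List<String> urls, String failure) {
            this.urls = urls;
            this.failure = failure;
        }

        public boolean isSuccess() {
            return failure == null;
        }
    }

    /**
     * Runs the (hypothetical) API call and keeps "timeout", "other error"
     * and "server returned an empty list" as three separate outcomes.
     */
    public static Result fetch(Callable<List<String>> apiCall) {
        try {
            List<String> urls = apiCall.call();
            if (urls == null || urls.isEmpty()) {
                return new Result(Collections.<String>emptyList(), "empty url list");
            }
            return new Result(urls, null);
        } catch (SocketTimeoutException e) {
            // The timeout is surfaced explicitly instead of becoming an empty list.
            return new Result(Collections.<String>emptyList(), "timeout");
        } catch (Exception e) {
            return new Result(Collections.<String>emptyList(),
                    "error: " + e.getClass().getSimpleName());
        }
    }
}
```

With this shape, the caller can show or log "timeout" rather than the generic "no input provided" failure.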

We have been able to make excellent progress after this first debugging session! We're confident we're gonna fix this soon!

bassosimone avatar Jan 27 '21 18:01 bassosimone

We are going to continue monitoring and testing this fix in Sprint 32.

bassosimone avatar Feb 01 '21 09:02 bassosimone

We are inspecting failures with @lorenzoPrimi. This issue is still present. We need to continue investigating with @FedericoCeratto.

bassosimone avatar Feb 08 '21 11:02 bassosimone

Added to the agenda topics for the weekly backend meeting.

hellais avatar Feb 12 '21 14:02 hellais

Added again to the agenda for the backend meeting.

bassosimone avatar Mar 01 '21 09:03 bassosimone

Changes have been made to the backend to log the case where an empty test list is returned, and further logging information has been added; see: https://github.com/ooni/api/pull/231.

@lorenzoPrimi should anything else be done on the mobile app front? Have we added logging on the mobile app as well to record why and how it's failing in this way?

hellais avatar Apr 26 '21 12:04 hellais

We have had the logging implemented for months; this is how we discovered this problem.

lorenzoPrimi avatar Apr 26 '21 12:04 lorenzoPrimi

Actually we still have about 50 cases per day https://sentry.io/organizations/ooni/issues/2201734929/?project=5619989&query=is%3Aunresolved

lorenzoPrimi avatar May 07 '21 13:05 lorenzoPrimi

Sentry issue: PROBE-ANDROID-8

sentry-io[bot] avatar May 07 '21 13:05 sentry-io[bot]

The number is as low as 10-20 per day, should we close this? @hellais @FedericoCeratto

lorenzoPrimi avatar Jul 20 '21 10:07 lorenzoPrimi

I think we should try to better understand why it's happening, because it's still a non-trivial number of exceptions in the last period. According to Sentry it's 454 crashes in the last 14 days.

hellais avatar Oct 25 '21 12:10 hellais

What we should do is add more logging closer to when the request for obtaining the test list is done: https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/suite/AbstractSuite.java#L140.

If the returned test list is empty we should log it as an error in sentry and attach to the error also the full content of the response from the backend.
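
A minimal sketch of that check, under stated assumptions: `ErrorReporter` is a hypothetical stand-in for whatever crash reporter the app uses (it is not the Sentry SDK API), and `checkAndReport` is an illustrative name, not an existing method in the app.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EmptyListReporter {
    /** Hypothetical stand-in for a crash reporter; not a real SDK interface. */
    public interface ErrorReporter {
        void report(String message, Map<String, String> extras);
    }

    /**
     * Returns true when the list is usable. Otherwise reports an error that
     * carries the HTTP status and the full raw response body, then returns false.
     */
    public static boolean checkAndReport(List<String> urls, int httpStatus,
                                         String rawBody, ErrorReporter reporter) {
        if (urls != null && !urls.isEmpty()) {
            return true;
        }
        Map<String, String> extras = new LinkedHashMap<>();
        extras.put("http_status", String.valueOf(httpStatus));
        extras.put("response_body", rawBody == null ? "<null>" : rawBody);
        reporter.report("empty list returned by backend", extras);
        return false;
    }
}
```

Attaching the raw body makes it possible to tell a well-formed empty array from a truncated or malformed response.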

hellais avatar Nov 09 '21 16:11 hellais

At line 140 the test list can never be empty. This is not the URL list but the test list (web_connectivity, dash, etc.).

The url list is downloaded here https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/TestAsyncTask.java#L146

But we already log exceptions and show a modal.

https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/TestAsyncTask.java#L165
https://github.com/ooni/probe-android/blob/master/app/src/main/java/org/openobservatory/ooniprobe/test/TestAsyncTask.java#L141

I'm gonna add one more exception when the list returned is empty

lorenzoPrimi avatar Nov 15 '21 13:11 lorenzoPrimi

Pretty sure this is not a timeout but rather an empty array returned by the server

lorenzoPrimi avatar Nov 15 '21 13:11 lorenzoPrimi

So, here's what I think may be happening. We have seen in https://github.com/ooni/probe/issues/1922 that there are issues with the HTTP library the app is using around the connection being closed. This should not be treated as an error, but for some reason it's treated as such. What I think may be happening in the case of this bug is that there is a race between the connection being closed and the right callback "data ready" being called. If the connection being closed wins, what happens is that we have an error instead of a successful result, so we get an empty list. I remember we added support for logging most exceptions (to be double checked), but it may be that here we're actually not witnessing an exception but rather an ordinary errback is called instead of a callback. (This is why exceptions are bad and combining exceptions with callbacks for error handling in networking code is not a good idea in the grand scheme of things.)

Regardless of whether my theory above is correct or not, we should get to the bottom of this. One starting point would actually be to refactor the mobile app to always use the engine for making network I/O.
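
The suspected race can be illustrated with a small sketch. This is only a model of the theory, not the behaviour of the HTTP library the app uses: `firstEventWins` and `TolerantHandler` are hypothetical names. The first method shows how a one-shot result turns into an error when the "closed" event is delivered before the "data ready" event; the second class decides only at the end, so the order of the two events does not matter.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

public class CloseVsDataRace {
    /** One-shot result: whichever event is delivered first decides the outcome. */
    public static CompletableFuture<String> firstEventWins(boolean closeFirst, String body) {
        CompletableFuture<String> result = new CompletableFuture<>();
        if (closeFirst) {
            result.completeExceptionally(new IOException("connection closed"));
            result.complete(body); // ignored: the future is already completed
        } else {
            result.complete(body);
            result.completeExceptionally(new IOException("connection closed")); // ignored
        }
        return result;
    }

    /** Order-insensitive handler: a close is an error only if no data ever arrived. */
    public static final class TolerantHandler {
        private String body;
        private boolean closed;

        public synchronized void onData(String data) {
            body = data;
        }

        public synchronized void onClosed() {
            closed = true;
        }

        public synchronized String finish() {
            if (body != null) {
                return body;
            }
            throw new IllegalStateException(
                    closed ? "connection closed before any data" : "no data received");
        }
    }
}
```

In the first variant the caller sees a failure even though the body was available, which would match an empty list with no exception logged.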

bassosimone avatar Jan 27 '22 19:01 bassosimone

We have added additional logging to the new release of the app. We are going to analyse the logs and iterate further once we have more information.

hellais avatar Feb 14 '22 09:02 hellais

Has this been fixed? Is this no longer an issue?

hellais avatar Dec 07 '23 09:12 hellais

@aanorbel do you know if this is still an issue?

jbonisteel avatar Aug 05 '24 13:08 jbonisteel