envoy-mobile icon indicating copy to clipboard operation
envoy-mobile copied to clipboard

Cronvoy: Map Envoy Mobile errors to Cronet API errors

Open carloseltuerto opened this issue 3 years ago • 10 comments

The Cronet API exposes an official list of error codes that must have a one-to-one correspondence with Envoy Mobile errors. These errors also have a one-to-one relation with a Cronet Internal Error Code, except for the last one.

  // Error code indicating the host being sent the request could not be resolved to an IP address.
  ERROR_HOSTNAME_NOT_RESOLVED = 1;      // Internal error: NAME_NOT_RESOLVED

  // Error code indicating the device was not connected to any network.
  ERROR_INTERNET_DISCONNECTED = 2;      // Internal error: INTERNET_DISCONNECTED

  // Error code indicating that as the request was processed the network configuration changed.
  // When {@link #getErrorCode} returns this code, this exception may be cast to a
  // {@link QuicException}.
  ERROR_NETWORK_CHANGED = 3;            // Internal error: NETWORK_CHANGED

  // Error code indicating a timeout expired. Timeouts expiring while attempting to connect will be
  // reported as the more specific {@link #ERROR_CONNECTION_TIMED_OUT}.
  ERROR_TIMED_OUT = 4;                  // Internal error: TIMED_OUT

  // Error code indicating the connection was closed unexpectedly.
  ERROR_CONNECTION_CLOSED = 5;          // Internal error: CONNECTION_CLOSED

  // Error code indicating the connection attempt timed out.
  ERROR_CONNECTION_TIMED_OUT = 6;       // Internal error: CONNECTION_TIMED_OUT

  // Error code indicating the connection attempt was refused.
  ERROR_CONNECTION_REFUSED = 7;         // Internal error: CONNECTION_REFUSED

  // Error code indicating the connection was unexpectedly reset.
  ERROR_CONNECTION_RESET = 8;           // Internal error: CONNECTION_RESET

  // Error code indicating the IP address being contacted is unreachable, meaning there is no route
  // to the specified host or network.
  ERROR_ADDRESS_UNREACHABLE = 9;        // Internal error: ADDRESS_UNREACHABLE

  // Error code indicating an error related to the <a href="https://www.chromium.org/quic">QUIC</a>
  // protocol. When {@link #getErrorCode} returns this code, this exception can be cast to {@link
  // QuicException} for more information.
  ERROR_QUIC_PROTOCOL_FAILED = 10;      // Internal error: QUIC_PROTOCOL_ERROR

  // Error code indicating another type of error was encountered. {@link
  // #getCronetInternalErrorCode} can be consulted to get a more specific cause.
  ERROR_OTHER = 11;                     // Catch all for all other internal errors.

Requires changes in Envoy.

carloseltuerto avatar Jul 15 '21 12:07 carloseltuerto

@carloseltuerto is this issue not a duplicate of #1550 ?

DavidSchinazi avatar Jul 19 '21 21:07 DavidSchinazi

They overlap, but this issue is more important: those are the Public error codes - there is no way around to have them all implemented.

Issue 1550 exposes all the internal codes that are presumably needed - that includes these plus other that are driving some business logic in the Java Cronet, and that are used in some tests, often to test a regression.

carloseltuerto avatar Jul 19 '21 21:07 carloseltuerto

I don't think "there is no way around to have them all implemented" is correct, we can return a subset of error codes right?

DavidSchinazi avatar Jul 19 '21 21:07 DavidSchinazi

The best deal is to implement the public ones, plus the few extra ones in Issue 1550. We don't need to match all of the 235 internal codes :-)

carloseltuerto avatar Jul 19 '21 22:07 carloseltuerto

My point was that we can get away with not implementing some of the public ones: if we return one error in some scenarios instead of another it shouldn't be a huge compatibility problem

DavidSchinazi avatar Jul 19 '21 22:07 DavidSchinazi

Well, for sure, one test won't pass. It injects an error condition, and the correct public error code must be returned:

https://github.com/envoyproxy/envoy-mobile/blob/main/test/java/org/chromium/net/CronetUrlRequestTest.java#L2042

carloseltuerto avatar Jul 19 '21 22:07 carloseltuerto

OK, I'm working on a PR to wire up the error flags. Once we do that I think we can track most of these.

@RyanTheOptimist I'd like your thoughts on ERROR_INTERNET_DISCONNECTED / ERROR_NETWORK_CHANGED, if we can detect these from Java because I don't think we can in C++ yet. Also I'm not sure what ERROR_QUIC_PROTOCOL_FAILED maps to.

We don't differentiate upstream connection failures in Envoy so I think for now we just pick one chrome error code and use it for all 3 errors.

ERROR_HOSTNAME_NOT_RESOLVED DnsResolutionFailed once https://github.com/envoyproxy/envoy/pull/19588/files lands ERROR_INTERNET_DISCONNECTED = 2;
ERROR_NETWORK_CHANGED = 3;
ERROR_TIMED_OUT = StreamIdleTimeout or DurationTimeout ERROR_CONNECTION_CLOSED => UpstreamConnectionTermination ERROR_CONNECTION_TIMED_OUT = n/a (falls under UpstreamConnectionFailure) ERROR_CONNECTION_REFUSED = UpstreamConnectionFailure (refused == unreachable)
ERROR_CONNECTION_RESET UpstreamRemoteReset
ERROR_ADDRESS_UNREACHABLE = UpstreamConnectionFailure (refused == unreachable) ERROR_QUIC_PROTOCOL_FAILED = 10;
ERROR_OTHER = (switch statement default)

alyssawilk avatar Jan 20 '22 14:01 alyssawilk

For INTERNET_DISCONNECTED and NETWORK_CHANGED, Chrome gets this info from the Network Change Notifier, which hook into Android notifications via the Java API https://developer.android.com/reference/android/net/ConnectivityManager.NetworkCallback so Envoy Mobile Java should be able to do something similar. And at some point we'll need to wire these signals down to Envoy.

ERROR_QUIC_PROTOCOL_FAILED is the Cronet layer exposing the network error code ERR_QUIC_PROTOCOL_ERROR: https://source.chromium.org/chromium/chromium/src/+/main:components/cronet/native/url_request.cc;l=122?q=ERROR_QUIC_PROTOCOL_FAILED&ss=chromium Basically this is what most errors map to when QUIC is used as the transport. That being said, this bucket is way too broadly applied in Chrome but we've never fixed this.

I'm slightly surprised that Envoy does not differentiate between refused and unreachable, but c'est la vie. Other than debugging, I doubt this difference is important.

RyanTheOptimist avatar Jan 20 '22 15:01 RyanTheOptimist

For ERROR_QUIC_PROTOCOL_FAILED would it make sense to either convert known errors (connection failure etc) to ERROR_QUIC_PROTOCOL_FAILED when we've negotiated HTTP/3, or use it instead of ERROR_OTHER when HTTP/3 is in use?

alyssawilk avatar Jan 20 '22 15:01 alyssawilk

Converting known errors to ERROR_QUIC_PROTOCOL_FAILED when HTTP/3 was negotiated is probably most similar to Chrome. On the otherhand using ERROR_QUIC_PROTOCOL_FAILED instead of ERROR_OTHER is probably more accurate. I'd be inclined to either do the latter or just skip ERROR_QUIC_PROTOCOL_FAILED completely. Your first suggestion might be safest, if anyone has logic wired up to that error code, though I'm a bit skeptical that anyone does.

RyanTheOptimist avatar Jan 20 '22 15:01 RyanTheOptimist

For ERROR_QUIC_PROTOCOL_FAILED would it make sense to either convert known errors (connection failure etc) to ERROR_QUIC_PROTOCOL_FAILED when we've negotiated HTTP/3, or use it instead of ERROR_OTHER when HTTP/3 is in use?

@alyssawilk I need a flag in finalStreamIntel that would tell me if HTTP/3 was used or not. I plan to reverse map the responseFlags to the quic errors in java when quic is used, but there's no way to know quic was used when the onError callback is called.

colibie avatar Oct 18 '22 13:10 colibie

@colibie SG, on it.

alyssawilk avatar Oct 18 '22 15:10 alyssawilk

And done, thanks for your patience :-)

alyssawilk avatar Oct 20 '22 16:10 alyssawilk

@carloseltuerto you may close this issue. The last commit mapped all the errors.

colibie avatar Nov 03 '22 10:11 colibie

Wohoo! Thanks so much for all your work on this!

alyssawilk avatar Nov 03 '22 14:11 alyssawilk