mapbox-navigation-android improve service start

Follow-up from https://github.com/mapbox/mapbox-navigation-android/pull/6405#discussion_r983447779.

Sep 29 '22 15:09 dzinad

To recover, developers would have to call MapboxNavigation#stopTripSession (to reset flags in MapboxTripSession) and then MapboxNavigation#startTripSession again.

I don't think so. All flags in MapboxTripService will be reset after this PR. If you mean MapboxTripSession, it will be in a valid started state just without the service. We support this kind of thing: we have the boolean withForegroundService flag in the API.

I still don't fully understand why would we catch this exception. This is a clear misuse of the Android SDK

When we are talking about the case that started this PR, then yes, it's a misuse of Android SDK (and now our SDK as well). However, I try to be very cautious with the system APIs like startService and many others. They may throw if just "something goes wrong", especially when you consider a variety of devices with different firmwares and custom restrictions. Invocations that make the components interact are much more prone to unexpected failures than other APIs (in case you'd ask why I don't suggest try-catching everything system-related). Moreover, the docs declare that startService method can throw exceptions and they don't give a exhaustive list of cases when that's possible (some of the statements are quite abstract). Not having the service while driving is less of a pain in the neck than the app suddenly crashing leaving you wondering where you should turn next. That's why I'm in favour of try-catching things like that. It's just a suggestion, I can revert it but I'd like this suggestion to be considered.

To truly support this state and allow developers to recover we'd need to add a boolean (or better Expected) return type to MapboxNavigation#startTripSession, and clear the MapboxTripSession flags and state on failure.

As mentioned above, the session will be started, just without the service. If you're sure that's necessary, I can add the return value, just not sure if that would reflect the situation well enough, considering there is a session started.

Oct 03 '22 16:10 dzinad

Not having the service while driving is less of a pain in the neck than the app suddenly crashing leaving you wondering where you should turn next.

I agree with @dzinad

Adding the proposed catch seems to me like making the original problem even tougher to discover for apps that do make a mistake of trying to start the service from the background.

@LukasPaczos, that makes sense. What if we expose "not fatal error" callback for external developers so they can notice that something went wrong in their production app, something like this: https://github.com/mapbox/mapbox-navigation-android/compare/main...vv-on-error-listener. What do you think about it?

Oct 26 '22 07:10 VysotskiVadim

I still don't fully understand why would we catch this exception. This is a clear misuse of the Android SDK

I agree with @LukasPaczos here. We shouldn't catch "startService" exceptions here (especially not all Throwables). Instead, we should log them into the console and re-throw them upwards for the application code to handle.

These exceptions are a clear sign of Android SDK misuse and are likely caused by developer error. Allowing exceptions to crash the system early on can help developers detect those issues early on during development instead of in production.

Dec 09 '22 21:12 tomaszrybakiewicz

@LukasPaczos @tomaszrybakiewicz I've revisited your comments. I understand why you want to crash the app when the developer misuses the Android SDK. And I agree with that. My concern is about cases when everything is used correctly but the system goes nuts. How about the following? We rethrow the exceptions that are thrown for known background cases (IllegalStateException for startService and ForegroundServiceStartNotAllowedException for startForegroundService) but catch everything else? If a user misuses the SDK, they'll get these exceptions. In all other weird cases we don't crash. WDYT?

Jan 04 '23 10:01 dzinad

... How about the following? We rethrow the exceptions that are thrown for known background cases (IllegalStateException for startService and ForegroundServiceStartNotAllowedException for startForegroundService) but catch everything else? If a user misuses the SDK, they'll get these exceptions. In all other weird cases we don't crash. WDYT?

That sounds more reasonable.

Jan 04 '23 16:01 tomaszrybakiewicz

... How about the following? We rethrow the exceptions that are thrown for known background cases (IllegalStateException for startService and ForegroundServiceStartNotAllowedException for startForegroundService) but catch everything else? If a user misuses the SDK, they'll get these exceptions. In all other weird cases we don't crash. WDYT?

I honestly still don't see why we'd want to catch any of the exceptions at this time. In these situations we're either left with a "started" session that has no service, with a developer that thinks a session is started but it actually isn't, or if we add a failure callback with a developer that has no clear way of recovering anyway.

Are there any examples of us not catching an exception being a problem? Beyond a misuse of the Android SDK like ForegroundServiceStartNotAllowedException of course.

Jan 04 '23 16:01 LukasPaczos

In these situations we're either left with a "started" session that has no service, with a developer that thinks a session is started but it actually isn't, or if we add a failure callback with a developer that has no clear way of recovering anyway.

I still don't see why you find this state "bad" or inconsistent. We'll have a perfectly valid session, the only thing that's different is the absence of a foreground service, isn't it? We support this state even through our API (the user can just pass a boolean flag saying they don't want a service). And we'll work and everything will be consistent.

Are there any examples of us not catching an exception being a problem?

I have not encountered any. However, I'm trying to think ahead and I've come to not trust the system especially considering a wide range of available firmwares that may have their own specific bugs. When I was working on an Analytics SDK project that was widely spread across different devices I've seen weird stuff like JSONException did not extend Exception (on some specific devices). That's why I try to be very cautious with system invocations.

Of course we should not sacrifice consistency in our state for an off chance that something like that would happen. But I don't think that's what I'm doing here.

Jan 05 '23 09:01 dzinad

I have not encountered any. However, I'm trying to think ahead and I've come to not trust the system especially considering a wide range of available firmwares that may have their own specific bugs.

I feel that we're trying to solve a problem that doesn't exist and even if it did occur on some specific hardware/firmware combo (or for any other reason) we'd only end up obfuscating the root cause. Following this approach we should be considering try-catching every interaction with an Android SDK or any other third-party library.

We'll have a perfectly valid session, the only thing that's different is the absence of a foreground service, isn't it? We support this state even through our API (the user can just pass a boolean flag saying they don't want a service). And we'll work and everything will be consistent.

We'll have a working state, but it's a state that's unexpected for a developer who requested for something else. They requested for service to be running but it wouldn't.

In my opinion catching and obfuscating errors without recovery path, like a lack of notification and lack of location updates when app is minimized, should only be considered if there's a clear known issue or a reason why it would be necessary. If an issue:

Cannot feasibly be fixed upstream where the root cause is.
Cannot be fixed upstream in a time-frame required.

that's when I think a workaround should be considered.

Defensive programming definitely has its place, for example, if we're looking at a component like network stack we might want to catch all the errors and turn them into failed responses because there's a a potential for mangled data or parsing errors. However, if we're encountering a component that has no known potential issues, I don't see a need to be defensive about it and add that complexity to our code.

Jan 05 '23 10:01 LukasPaczos

Following this approach we should be considering try-catching every interaction with an Android SDK or any other third-party library.

No, not every one, but starting a service is not something trivial. I'm not suggesting using a try-catch for context.resources or sth.

We'll have a working state, but it's a state that's unexpected for a developer who requested for something else. They requested for service to be running but it wouldn't.

Well, yes, but consider a situation a situation where crash does happen. Then the developer asked for a trip session but got his app killed. Also not exactly the thing they asked for.

I think the main problem here is that you think it's never gonna happen. I don't see, however, how it's ideologically different from catching parsing errors. If a user requests a route via our API and there is a parsing error although the request was successful, there's no way of recovery here. Maybe it's some error on our backend or sth but the user won't be able to do anything with it. However, we still use try-catch there.

Example: https://github.com/mapbox/mapbox-navigation-android/blob/1419bb15f2913cb7582191918cb7c87327bcd045/libnavigation-base/src/main/java/com/mapbox/navigation/base/internal/utils/DirectionsResponseUtils.kt#L34

Other examples of defensive try-catches: https://github.com/mapbox/mapbox-navigation-android/blob/1419bb15f2913cb7582191918cb7c87327bcd045/libnavui-dropin/src/main/java/com/mapbox/navigation/dropin/util/BitmapMemoryCache.kt#L30 https://github.com/mapbox/mapbox-navigation-android/blob/1419bb15f2913cb7582191918cb7c87327bcd045/libnavigation-util/src/main/java/com/mapbox/navigation/utils/internal/ThreadController.kt#L26

But. It occurred to me that if there is a crash during startTripSession it will be before the user starts driving, not while. So it won't be that much of a problem. I mean if the user was driving when the crash occurred, that would be highly undesirable. So I won't be 100% against leaving it as-is (just maybe 90%. :))

Jan 06 '23 12:01 dzinad

Thanks for dropping the examples, I think they somewhat help highlight the point I'm trying to make. When it comes to HTTP responses or image processing (where image is externally provided) we're dealing with a nondeterministic input so we're guarding ourselves against the odds of failure which we know exist (either on the transport layer with corruption, or live environment that keeps changing and can return incorrect values, etc). When it comes to the channel try-catch, the ReceiveChannel#receive clearly states that it can throw and we're handling that correctly, our catch might be too broad but it's not there without a reason. We're writing a defensive code against components that have known or otherwise documented potential of failure.

In the case the PR discusses, there's no potential/known or documented failure that we should guard against beyond a clear and obvious developer error. And if there was a situation that results in a crash it might be better to surface it early and resolve the root cause rather than keeping it obfuscated. If then we found that issue to be a platform-level issue without a timely resolution possibility or client-side workaround, we could consider a patch with a workaround baked into the SDK depending on severity of the issue.

With the discussion above I'm trying to express an opinion that writing defensive code without a reason might not always be the best choice as it can obfuscate other bugs, especially if there's no runtime channel to detect these caught bugs beyond logs. I'm making an example out of this simple PR but my reasoning can be applied to anything else. Imagine a bigger chunk of code being caught like this without a clear an obvious reason (and with a broad catch statement) - if we made any mistake in that guarded chunk of code and introduced a bug, it could go through our testing pipeline undetected because it would never throw.

Jan 11 '23 13:01 LukasPaczos

With the discussion above I'm trying to express an opinion that writing defensive code without a reason might not always be the best choice as it can obfuscate other bugs.

I absolutely agree here. But it's also important that the app does not crash while driving. If it does, it can create dangerous situations on the road. However, as mentioned above, in this case, if it crashes, it will be before driving, not during. For this reason, and considering the fact that you are strictly against defensive approach here, let's close the PR.

Jan 16 '23 14:01 dzinad

mapbox-navigation-android mapbox-navigation-android copied to clipboard

improve service start

mapbox-navigation-android
mapbox-navigation-android copied to clipboard