mqttclpro icon indicating copy to clipboard operation
mqttclpro copied to clipboard

frequent disconnects

Open brianmay opened this issue 7 years ago • 63 comments

Hello,

As much as a like this, it seems to disconnect frequently for no apparent reason. Still trying to work out why. I have tried changing the keep alive from 1200 seconds to 60, but it doesn't appear to have helped.

So far if I try to reproduce the problem on demand, it works perfectly :-(

When it does disconnect, it doesn't appear to retry to connect unless the screen is on. Although that could also be coincidence too.

This is on Android 7.1.2 connecting to mosquitto 1.3.4.

Regards

brianmay avatar Aug 26 '17 01:08 brianmay

"Waiting for network connection." seems to be common when the connection goes down (without reason). The retry time is set to 100 seconds, and sometimes there doesn't appear to be any attempt to reconnect after this time elapses.

brianmay avatar Aug 26 '17 02:08 brianmay

I encountered the same issue, and am still trying to pinpoint the problem as well. My context:

  • Using Android 7.0, Tasker v5.0u4, Mqttclpro v2.2.2 + 2 Samsung Galaxy S6 phones clientside;
  • Broker is Mosquitto v1.4.14 on Raspberry Pi, all settings default;
  • Mqttclpro keepalive was lowered to 300, Retry interval is 100, with user/pass only (no SSL). No clean session;
  • Android power management is disabled for Tasker and Mqttclpro, including background services;
  • Android set to Wifi: Always on, with no data limitation (one phone still disconnects wifi sometimes when locked/sleeping, other phone doesn't);
  • I frequently change network (from Wifi to VPN), but it happens on the same local network as well. VPN is IPv4 only, all other connections IPv6;
  • Subscriptions and publishing are impacted, the latter shows "Unable to publish as we are not connected";
  • A different Tasker plugin MQTT Publish Plugin does publishing only, but does not seem to have this issue (perhaps their code/approach helps as an example?);
  • Waking up Android shows "Connecting..." first and then "Connected", this often restores the connection correctly (more testing needed);

Planning to dive into the serverside Mosquitto settings next, but I have a feeling the source of the issue lies within Mqttclpro.

acefed avatar Aug 26 '17 10:08 acefed

Might be related to #14. Similar to what is discussed in this bug, it often doesn't seem to retry reconnecting when it says it will - or at least the display keeps saying it will retry at time X after time X has elapsed. When it reconnects correctly, I might be able to forgive it for disconnecting in the first place :-).

Possibly sometime disconnects are due to poor wifi reception when I am moving around, however, I have also had disconnects when the phone is sitting pretty much right next to the (very lowly loaded) wifi router, and there is no evidence that the wifi connection is being dropped (probably should double check this).

I am inclined to feel that I need run tcpdump on the connection, and see if that gives away any secrets. Will do so when time permits.

brianmay avatar Aug 28 '17 02:08 brianmay

I sometimes see "Connection lost - no network connection". Looking at the source this happens if isOnline() returns False and the ConnectionLost function is called.

https://github.com/dc297/mqttclpro/blob/01965085ad328d2f7983186e1f3a3adda2e01cb4/app/src/main/java/in/dc297/mqttclpro/MQTTService.java#L1089

So either the network never went down and this function sometimes returns a bad value, or the network did go down for a moment (which I doubt), resulting in the connection being lost, and it hasn't retried to connect.

brianmay avatar Aug 28 '17 06:08 brianmay

The "Connection lost - no network connection" message is in the ConnectionLost function. Trying to work out what calls this? Maybe Android? https://github.com/dc297/mqttclpro/blob/01965085ad328d2f7983186e1f3a3adda2e01cb4/app/src/main/java/in/dc297/mqttclpro/MQTTService.java#L101

brianmay avatar Aug 28 '17 06:08 brianmay

Ok, so it seems that Android really is shutting the network down after the screen is off for about 4 to 6 minutes. This is despite having "Keep Wi-Fi on during sleep." set to always. So the disconnects aren't a problem in this project, the problem here is that the reconnects don't always work reliably.

brianmay avatar Aug 29 '17 00:08 brianmay

Looks like pinging an Android device isn't a good way of determining if the wifi is awake or not. This was confusing me into thinking that nothing I did was working.

I have found that if I set this application to be "Not optimized" in the battery optimization settings, it won't disconnect. Connection appears to be OK, even though the device stops responding to pings. https://support.google.com/pixelphone/answer/7015477?hl=en

There is probably another application that I should set to "Not optimized" to get pings to work.

brianmay avatar Aug 29 '17 03:08 brianmay

Android disabling wifi might be a cause for network interruption, but the interrupted MQTT connection also happens to me when I stay connected via mobile data (OpenVPN), with no wifi activity at all. Today I saw "Unable to publish as we are not connected" while surfing the interwebs. In other words: With a working data connection. MQTT Client reconnects again when I open the app.

If you're digging into wifi connection details, logging the stats (writing them to a file) with Tasker for a period of time might help.

acefed avatar Aug 29 '17 19:08 acefed

Yes, there is still a problem here. Even if Android is disconnecting the network, it should still be possible to reconnect again after Android wakes up. As demonstrated by other programs that do reconnect (and hide the fact they were disconnected...). I suspect there is something wrong in the reconnect code.

brianmay avatar Aug 29 '17 21:08 brianmay

I notice that there are a few logging calls in the source, the best way forward would probably be to connect the phone to something like "adb logcat" to display log messages and try to reproduce the problem. tcpdump information could be useful too. There are also android apps to display logging but looks like these require Android be rooted - something I have not got around to doing. Reproducing the problem might be difficult, I have noticed that sometimes it reconnects perfectly.

brianmay avatar Aug 30 '17 05:08 brianmay

Hey Brian

I have also started monitoring the issue closely. According to my research/knowledge, the issue is with Android's AlarmManager not notifying the app at the right time or not notifying the app at all.

Android generally skips sending the alarm to the app at the right time while the app is in background in order to save battery. Once an alarm is skipped, the app won't reconnect automatically. Maybe we need to find an alternative to Android's AlarmManager

deepessh avatar Aug 30 '17 16:08 deepessh

Ok, that makes sense... I was beginning to suspect something to do with the AlarmManager.

It should reconnect also when the connection comes up. Although it is possible that when it does try connecting there isn't a good network signal (I don't have any evidence here) causing it to fail, and it falls back to the faulty AlarmManager.

brianmay avatar Aug 31 '17 08:08 brianmay

A faulty alarm manager sounds like a plausible cause for subscription/connection problems, but less so for the message "Unable to publish as we are not connected", when there is a working data connection active, handling browser traffic correctly at the same time. But I'm not an Android developer, so I could be completely wrong here.

I noticed other apps frequently using a visible notification icon, to keep processes alive (running in the foreground) on Android, as a workaround for the OS killing off background services when resources are needed. For example, see Tasker > Preferences > Monitor tab > Run in Foreground / Show Notofication Icon > Help

acefed avatar Sep 01 '17 23:09 acefed

Hey I have pushed an update to the App. Now the app doesn't use AlarmManager any more, I have changed it to Handler. I have been testing it for the last 2 days and been working fine for me.

Do check it out and let me know how it works for you.

Cheers! Deepesh

deepessh avatar Sep 02 '17 11:09 deepessh

Thanks! I have updated, testing in progress...

acefed avatar Sep 02 '17 23:09 acefed

Unfortunately, when I have an active wifi connection but my screen is locked for some time, a MQTT subscription does not trigger an action in Tasker yet, when a message is sent. If I wake up the phone and try again, the action triggers correctly (without opening Tasker or MQTT Client).

So reconnection on a wake up works, but I never had a problem with that to begin with. In my case, reconnection on wake up always worked correctly.

Will test some more, to get some insight in the "Unable to publish as we are not connected" issue.

acefed avatar Sep 03 '17 20:09 acefed

Hey have you disabled battery optimization for the app?

deepessh avatar Sep 03 '17 20:09 deepessh

Yes, see above:

  • Android power management is disabled for Tasker and Mqttclpro, including background services;
  • Android set to Wifi: Always on, with no data limitation (one phone still disconnects wifi sometimes when locked/sleeping, other phone doesn't);

I tested on the phone which never disconnects wifi.

acefed avatar Sep 03 '17 20:09 acefed

Could it be that Samsung has implemented some aggressive battery saving features which are denying Internet access to the app? Can you check logcat if the app is getting disconnected or not? Or if the app is receiving tasker actions or not.

deepessh avatar Sep 04 '17 11:09 deepessh

That's certainly a possibility I can't rule out (yet). I use a local Mosquitto broker on my LAN, so when on wifi at home, no internet access is required. Other connections all go through OpenVPN, which could also be a source of errors. I haven't used logcat yet, but can look into that in a few days.

I'm also curious if @brianmay noticed an improvement in v2.3.0+. If so, chances are that my problem lies elsewhere, outside your app.

acefed avatar Sep 04 '17 14:09 acefed

I have noticed several times events fail to get through first thing in the morning.... But this also coincides with an update of the app during the night. It wouldn't surprise me if Android has a limitation where you can't automatically restart a service after an update. Perhaps not worth even mentioning here, however I decided to say so just in case other people have encountered the same issue.

Apart from that I have only had one disconnect issue since I turned off battery optimisation. So it might be a matter of me turning on battery optimisation again to test the reconnect works :-).

brianmay avatar Sep 04 '17 21:09 brianmay

Should also mention that I use OpenVPN too, also for when not at home. No problems so far. Note I did turn off battery optimisation for OpenVPN too.

brianmay avatar Sep 05 '17 07:09 brianmay

Good point, I did that too (some time ago, when I switched from OpenVPN Connect to OpenVPN for Android). Same goes for OpenVpn Tasker Plugin, just to be sure.

acefed avatar Sep 06 '17 00:09 acefed

Haven't had time to look into this more thoroughly... still seeing "Unable to publish", but less so when I expect them. Sometimes as a surprise when I don't expect them. Very weird.

Without a real clue, my uneducated guess blames Android or Samsung software, fiddling with processes which don't use an active icon in the task bar.

acefed avatar Oct 06 '17 23:10 acefed

Yeah, I think having an icon in the notification bar is the way forward. I think I'll add a persistent notification.

deepessh avatar Oct 07 '17 07:10 deepessh

Hey guys..do we have any new observations on this after the new update?

deepessh avatar Nov 16 '17 09:11 deepessh

It's still happening due I believe to my Samsung S7's aggressive power saving. I think the permanent task bar notification is the way to go as demonstrated by other apps e.g. Tasker itself

Bra1nK avatar Nov 16 '17 10:11 Bra1nK

I monitor mine by using the Last Will feature to post the message "Lost" to a "Connection" topic

Bra1nK avatar Nov 16 '17 10:11 Bra1nK

I notice that Android 8.0.0 lists tasks running in the background. On my phone Tasker, CSipSimple and LastPass are listed. But mqttclpro is omitted from this list. I suspect mqttclpro needs to behave like the other background tasks, but I have no idea how this is done. Maybe this is the permanent task bar notification, not sure.

brianmay avatar Nov 16 '17 10:11 brianmay

I didn't want to introduce an annoying notification. But I think we don't have a choice. Let me see what's feasible.

deepessh avatar Nov 16 '17 12:11 deepessh