micropython-lib icon indicating copy to clipboard operation
micropython-lib copied to clipboard

umqtt.robust: Resubscribe to topics after doing reconnect.

Open dpgeorge opened this issue 7 years ago • 26 comments

To address #102: if the broker is restarted then when the client reconnects it needs to resubscribe to previously-subscribed topics in order for them to be delivered.

Other MQTT clients seem to have similar behaviour, including mosquitto_sub.

Tested with mosquitto, by subscribing from a couple of clients, killing and restarting the broker, then publishing to the original topic.

dpgeorge avatar Jun 02 '17 05:06 dpgeorge

Before applying such patch, it would make sense to think why the existing implementation doesn't include it, while was signed off as working. The obvious explanation is that "there's no memory" device can't keep a list of subscriptions (there's no memory), while "give me few more gigs" server can, should, and was tested to have do that (with suitable config).

@dpgeorge , please find the original ticket which lists required test setup for Mosquitto. If the explicit requirements for MQTT server aren't listed in umqtt documentation, that's a different matter, though it's implied that "there should be compliant MQTT server". If users use MQTT servers without full implementation of MQTT, that's again a different matter, outside the scope of umqtt.robust, whose purpose if to provide small, efficient MQTT client, utilizing the MQTT functionality offered by the standard.

pfalcon avatar Jun 02 '17 09:06 pfalcon

The point is that already there are multiple users who encounter this issue, and who expect robust to be "robust". I don't see any point in making another version, eg umqtt.robust_resub, and the whole aim (?) of umqtt.robust was so that it could have these kind of features without putting a burden on umqtt.simple.

dpgeorge avatar Jun 02 '17 10:06 dpgeorge

there should be compliant MQTT server

Does "compliant" mean that the broker should restore all subscription state if it is killed and restarted?

dpgeorge avatar Jun 02 '17 10:06 dpgeorge

MQTT standard says there's a concept of "session", and it's recallable between client connections. I'm a bit busy these days (weeks), so please check whether standard MQTT also includes a clause like "the sessions described above is a joke, you can't really rely that they work". I don't remember such a clause, then yes, it's fair to rely that they work. If the standard says that they may either work or not (which I don't remember either), then umqtt.robust clearly selects the positive reading. Again, if the standard is really like that, that fact can be explicitly documented.

And actually, it is: https://github.com/micropython/micropython-lib/blob/master/umqtt.robust/example_sub_robust.py#L13 . Which raises sad thoughts that our userbase isn't ready to read the source code, and even ourselves maintainers reduce to such approach. We really need to do something about that ;-).

pfalcon avatar Jun 02 '17 10:06 pfalcon

Regarding this patch: we definitely can add umqtt.dementia subclassing umqtt.robust, if there's interest to maintain that. The original idea is that users with specific needs will subclass and modify behavior of basic client provided. Again, looks like our userbase isn't ready for that, and we should do something about that.

pfalcon avatar Jun 02 '17 10:06 pfalcon

and who expect robust to be "robust"

Yeah, except everyone sees "robustness" differently.

and the whole aim (?) of umqtt.robust was

The whole aim of umqtt series was the same as of https://docs.python.org/3/library/http.server.html - provide an extensible framework for developing MQTT suitable for different usecase. In particular, umqtt.robust was an example of how to achieve one axis of "robustness", with its README discussing that it's impossible to have fixed-function "robust" functionality, definition of "robustness" depends on an application, so it's up to it to implement it. Putting everything possible into umqtt.robust isn't going to work, as in the end there will be a client which can do everything and leaving no room for real application. Unfortunately, while I almost can remember content of these paragraphs of README, somehow they aren't there.

pfalcon avatar Jun 02 '17 10:06 pfalcon

MQTT standard says there's a concept of "session", and it's recallable between client connections.

I'm confused by this remark... this patch is not about restoring a previous client session if the client goes down and comes back up, but rather if the broker/server goes down and loses all the sessions. If that happens then the clients all need to reconnect and resubscribe, because the server lost all state.

dpgeorge avatar Jun 02 '17 13:06 dpgeorge

I'm confused by this remark... if the broker/server goes down and loses all the sessions.

I'm confused either. It's broker's/server's job to keep sessions persistent, that's default mode of MQTT operation (client has to explicitly ask server to discard any saved state). If server underwent catastrophic failure and lost all its data, ... well, handling recovery from that state "transparently" was never the aim of umqtt.robust.

Another thing is that there're a lot of 3rd-party services which offers "MQTT which isn't real MQTT", and I understand people's desire not to get deep into all that MQTT stuff really, but still work somehow they like. I even understand the desire to provide help to such people (though the idea always has been that community will take care of fringe cases). But doing that at the expense of clarity and efficiency of real MQTT doesn't seem right. So, let's consider making that a separate module (if you consider that "umqtt.robust" name should be relinquished, that can be discussed too, though again, we should think whether we provide baseline MQTT client, or not-so-MQTT client).

pfalcon avatar Jun 02 '17 16:06 pfalcon

I'd like to chime in on this, since I am just a lowly user (not developer) of micropython-lib's umqtt:

well, handling recovery from that state "transparently" was never the aim of umqtt.robust.

Perhaps it should be. I, and others, have to manually make umqtt.robust actually be robust by having it resubscribe if the mqtt broker bounces for some reason. This actually happens fairly frequently (couple of times per week) for me since I reboot my broker after installing updates, etc.

umqtt.robust is NOT robust in this common scenario, and when these devices are used for home automation purposes then it requires us to program in a way to reset the device when the broker goes down, or manually implement what @dpgeorge has created here. It would be nice to have a somewhat 'official' way to handle this (i.e. this patch in this PR). I don't care what it's called, whether it's umqtt.robust() or umqtt.please_dont_die_for_this_silly_reason() :smile:

craftyguy avatar Jun 03 '17 18:06 craftyguy

From reading the standard these are the relevant parts that I could find relating to session storage (section 4.1):

  • The Client and Server MUST store Session state for the entire duration of the Session [MQTT-4.1.0-1]
  • A Session MUST last at least as long it has an active Network Connection [MQTT-4.1.0-2]

So there doesn't seem to be any requirement for the server (or client, which does need to store some state as part of the session) to store session beyond the network connection.

Furthermore, there is a specific bit in the CONNACK packet (SessionPresent, LSB of third byte of the packet) which is used by the server to tell the client if it restored any session state. If the client connects with CleanSession=1 then SessionPresent will be 0. If the client connects with CleanSession=0 then SessionPresent may be 0 or 1, and it's 1 only if the server held and restored the state.

At the very least, umqtt.robust should fail to reconnect if SessionPresent comes back as 0, otherwise the client is not in a consistent state because it's assuming that the server has restored all its subscriptions.

But a more useful and robust action is for umqtt.robust to resubscribe to all topics if SessionPresent comes back as 0.

dpgeorge avatar Jun 06 '17 03:06 dpgeorge

I updated this PR to only resubscribe if the SessionPresent comes back as false upon reconnect.

One option to make this addition use less RAM is to have a new method subscribe_all(topics) which takes a list/tuple of pairs of topic/QoS and stores that internally in the MQTTClient object. Then this list/tuple is reused each time to do the resubscriptions. If it's a tuple then there's a chance it's fully stored in ROM, and is anyway the single instance of the list of subscriptions, held by both the app and the MQTTClient object; there is no duplication anywhere of the topics.

dpgeorge avatar Jun 07 '17 01:06 dpgeorge

So there doesn't seem to be any requirement for the server (or client, which does need to store some state as part of the session) to store session beyond the network connection.

That's the point - it doesn't say that sessions are useless and don't work across connections. And current umqtt.robust relies on that. That can be documented.

pfalcon avatar Jun 07 '17 07:06 pfalcon

I updated this PR to only resubscribe if the SessionPresent comes back as false upon reconnect.

What if there's completely broken broker which doesn't support connection with clean_session=False?

One option to make this addition use less RAM is to have a new method subscribe_all(topics) which takes a list/tuple of pairs of topic/QoS

MQTT also support supports multiple topics in one subscribe message. Shouldn't it be implemented then?

stores that internally in the MQTTClient object.

Why there's need to store anything at all? Why not provide e.g. after_connect() method which should be subclasses and overriden by user?

There're so many choice - why a particular set of them is selected? It's possible to make umqtt.robust different (more bloated), but would it be significantly "better" or "more robust" is questionable. The current umqtt.robust is written with the criteria of minimal code size, relying on MQTT server features to back it. Why do we need to dilute that approach instead of implementing an alternative?

pfalcon avatar Jun 07 '17 07:06 pfalcon

This actually happens fairly frequently (couple of times per week) for me since I reboot my broker after installing updates, etc.

So, please go to creators of your broker and ask them to stop breaking your system on reboots and upgrades. They of course will do it at once, problem solved.

pfalcon avatar Jun 07 '17 07:06 pfalcon

It's not 'breaking the system, it's rebooting the system for things like kernel upgrades. But you are trying to bring up a red herring, apparently I am not alone in asking for this ability and my example is just one example of why the current mqtt.robust is not robust. I am currently fixing mqtt.robust in a manner similar to this patch, and it works quite well, it's very robust.

There's no need to get all defensive about this... it would be nice to provide this 'more robustness' for all users of this library without making them figure it out on their own. That's all I'm saying.

On June 7, 2017 12:19:40 AM PDT, Paul Sokolovsky [email protected] wrote:

This actually happens fairly frequently (couple of times per week) for me since I reboot my broker after installing updates, etc.

So, please go to creators of your broker and ask them to stop breaking your system on reboots and upgrades. They of course will do it at once, problem solved.

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/micropython/micropython-lib/pull/186#issuecomment-306709079

craftyguy avatar Jun 07 '17 14:06 craftyguy

That's all I'm saying.

Yes, you're saying that, but you don't listen to what others saying. It's impossible to make something "robust" for everyone. For example, disk on on your server mail fail and you may come saying it's umqtt.robust's fault that it doesn't work after that. In that regard, saying "my server doesn't store my data, and it's umqtt.robust's fault" isn't much better. The requirements for umqtt.robust are described here: https://github.com/micropython/micropython/issues/2055 . In due time (the more such "discussions", the later), it will be moved to umqtt.robust (or whatever it will be called by then) documentation.

for all users of this library without making them figure it out on their own.

Then start putting your actions where your words are - release your "super-robust" module for all users and support it. Not keen to do that? Well, then you'll need to "figure it out" (it's information technologies after all) and wait until someone else will resolve it for you. The latter is coming, but may take long waits (people who resolve such things usually have hundreds if not thousands things to resolve).

pfalcon avatar Jun 07 '17 15:06 pfalcon

(edited to soften my tone a bit, my apologies)

The patch that @dpgeorge has here helps a lot with robustness, but I understand if it's not the "solve all the problems" solution you might be looking for. In any case, I'm fine, I have a solution that works for me. I'm thinking about all the future users of micropython who will have to struggle through the trial/error of figuring out how to make mqtt.robust more robust in the case of brokers that can't be relied on 100%. In case that the broker reboots once, even once a year, with the current mqtt.robust implementation you would have to get physical access to the micropython device to reset it. That sucks (no pun intended) when the device is buried in my vacuum cleaner for controlling it remotely (true story). I don't want to take that thing apart every time my broker goes down for whatever reason. 😄

Anyways, I'm done providing my $0.02 here. I really do appreciate the hard work you and others have done to make micropython, and micropython-libs, a great platform for tinkering on the esp8266 and other devices. In the end, it's ultimately your project, you are in control of its destiny, user feedback be damned 😛

craftyguy avatar Jun 07 '17 15:06 craftyguy

Uh........there's already a patch (right here) to make it much more robust.

You're again trying to miss 2 important points communicated above:

  1. There's nothing to patch - if you use it as intended ("as intended" part can be more easily to be discovered, as admitted). And if you don't use it as intended, dare I say there's no need for patches still, because patches along the line of using microscope as a hammer are definitely creative, but better to keep patches for a microscope and for a hammer separate.
  2. There's no single definition of "robustness", e.g. what's more robust for you is less robust for somebody else.

Just FYI, that's not a great way to attract development help and funding for your projects, but I digress.

I'm sorry that you feel this way, but you're trying to argue with the module specification and your MQTT server config, and those are hard things, not very keen on compromises. Otherwise, don't hesitate to come to your neighbour, ask them (politely!) to paint their house pink, and when they say "no", tell them that they're bad neighbours and don't deserve good treatment. That works, yeah.

pfalcon avatar Jun 07 '17 16:06 pfalcon

Ok, so 808e0bba0c7cb932c9feaeab45f223ae7955bd79 adds "missing documentation" for umqtt.robust. Love it or not, that's what it is, and it's as perfect as MQTT (real MQTT, not sandbox variety of MQTT) allows that. That doesn't mean we can't have something else too - we can, as commented above, even the names are negotiable.

pfalcon avatar Jun 07 '17 17:06 pfalcon

That commit definitely helps clarify what 'robustness' this implementation provides, and would have been helpful for me a few weeks ago :)

On June 7, 2017 10:33:12 AM PDT, Paul Sokolovsky [email protected] wrote:

Ok, so 808e0bba0c7cb932c9feaeab45f223ae7955bd79 adds "missing documentation" for umqtt.robust. Love it or not, that's what it is, and it's as perfect as MQTT (real MQTT, not sandbox variety of MQTT) allows that. That doesn't mean we can't have something else too - we can, as commented above, even the names are negotiable.

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/micropython/micropython-lib/pull/186#issuecomment-306868309

craftyguy avatar Jun 07 '17 17:06 craftyguy

Unfortunately, there's always a backlog of documentation work, starting with the main MicroPython docs (which are still well ahead of docs for the competing projects (MCU scripting languages), YMMV).

pfalcon avatar Jun 07 '17 17:06 pfalcon

@pfalcon why didn't you make a PR for that documentation change, like this one? It's a bit unfair that you go and push things that you think should happen, after the fact (ie after this PR was propossed). As I said, the MQTT specs don't require the broker to have persistence across network connections, so to be properly robust when reconnecting the client should resubscribe (and that's how other clients work).

There are also non-normative comments in the specs such as To ensure consistent state in the event of a failure, the Client should repeat its attempts to connect with CleanSession set to 1, until it connects successfully., which suggests that one should resubscribe.

In any event, the current umqtt.robust implementation is broken in that it should check if the return value of the reconnect: if the server did not retain the session then umqtt.robust should exit with a failure. Then users can at least see what the problem is, and look for a more robust solution.

dpgeorge avatar Jun 13 '17 07:06 dpgeorge

@pfalcon why didn't you make a PR for that documentation change

Because the documentation patch describes the requirements which were set for this modules, how it has been always implemented, and captures internal project documentation available for a year (#2055), which, as was explained was intended to be converted to user documentation (README) long ago.

As I said, the MQTT specs don't require the broker to have persistence across network connections

And as I said, there're no requirements for persistence NOT to work, the module from the beginning was developed with persistence in mind, was validated to work like that, and requirements were captured in project documentation, which now has been converted to user documentation.

and that's how other clients work

And square pegs are the best things for round holes.

In any event, the current umqtt.robust implementation is broken in that it should check if the return value of the reconnect: if the server did not retain the session then umqtt.robust should exit with a failure.

It's that strange case when all things need to be repeated 2-3 times. Did you see the standard example for umqtt.robust: https://github.com/micropython/micropython-lib/blob/master/umqtt.robust/example_sub_robust.py#L34 ? It's not this module's task to "exit with a failure". This module communicates status of the server session back to application, and it decides what to do: a) re-subscribe; b) send email that the server is hosed; c) turn on a siren to convey the fact in more audible manner; d) control leg and arm joints of a robot inside which the application runs, to replace faulty hdd on the server; e) anything else.

It's a bit unfair

We definitely discuss anything else except a productive solution. Let's sum up and repeat parts of it again:

Question # 0 to decide: Whether to add another MQTT client of robust variety. Because the current already works per its specification, so if there's another specification needed, in our constraints, it means a different client. I'd personally rather give time to a client implementation for uasyncio, or wrapping one of C clients (lwIP 2.0 has its own implementation, whoa!), but if you feel like maintaining yet another one with limited functionality, why not?

Then, choices:

  1. Add new modules under new name.
  2. Rename existing umqtt.robust, add new one under that name.
  3. (Re)name both to retain, but disambiguate "robust" part, i.e. using a 3-component package name.

pfalcon avatar Jun 13 '17 11:06 pfalcon

In any event, the current umqtt.robust implementation is broken in that it should check if the return value of the reconnect:

Ah, right, you mean reconnect(). Yep, we can "fix" that. Which only shows there're many "corner cases" on the way to "robustness". How your solution would handle thousands of subscribed topics for example? Mine does perfectly, because subscriptions don't even have to be part of an embedded application, as comments in example_sub_robust.py explain.

pfalcon avatar Jun 13 '17 12:06 pfalcon

umqtt.robust works fine as it should. Tested with mosquitto mqtt broker (1.3.4) on Debian/Jessie (default configuration on Debian is persistence true).

  1. start example_sub_robust.py on ESP8266 or unix port (you may want to change the server address)
  2. see how umqtt connects to the broker
  3. publish something to foo_topic, e.g.: mosquitto_pub -h localhost -t 'foo_topic' -m 'bar'
  4. see how umqtt receives the message
  5. restart/reboot the mqtt broker
  6. see how umqtt.robust reconnects
  7. publish again something to foo_topic
  8. see how umqtt receives the message
  9. be happy!

-1: Please do not keep the subscribed topics in a list. Memory matters on EPS8266. In some applications I use format() or slicing when subscribing. This PR may lead to out-of-memory problems.

But I see, on some broker/configuration a resubscribe is necessary.

@pfalcon wrote:

Why there's need to store anything at all? Why not provide e.g. after_connect() method which should be subclasses and overriden by user?

+1 for a after_connect() function! But is subclassing necessary? Can't we just implement it like the callback function.

puuu avatar Jun 29 '17 11:06 puuu

+1 for after_connect() or as Paho calls it: on_connect()

In Paho this is the recommended way to subscribe, on the first connect as well as on every reconnect.

solarjoe avatar Nov 13 '20 09:11 solarjoe

@pfalcon wrote:

Why not provide e.g. after_connect() method which should be subclasses and overriden by user?

-1 for after_connect() or on_connect() - I also prefer to keep the underlying code size at a minimum. Anyone who can override an on_connect() method is already subclassing umqtt.robust, so, provided #195 is adopted, they can re-implement connect() to include / call resubscribe() or whatever else they need to.

ian-llewellyn avatar Apr 26 '23 19:04 ian-llewellyn