aiomqtt icon indicating copy to clipboard operation
aiomqtt copied to clipboard

Does not support disconnect/connect, which means it doesn't support QOS > 0

Open 0x0a11c0de opened this issue 4 years ago • 4 comments

This library does not work with the following (pseudocode):

client = Client(..., client_id='1234', clean_session=False)
client.connect()
... do stuff ...
client.disconnect() // or we got disconnected for any reason
client.connect()

Any iteration of subscribed topics will fail after the disconnect because the disconnected future in the client remains set. Because of that, the client cannot be reused after a disconnect. The client is where QOS > 0 is implemented (it has the queue of published messages) so that means this library cannot support publishing with QOS > 0.

0x0a11c0de avatar Feb 27 '21 15:02 0x0a11c0de

Hi and thanks for raising this issue. Let me have a look. :)

Because of that, the client cannot be reused after a disconnect.

This is currently by design. We recommend that you use the async context manager interface instead of raw connect and disconnect calls. This only allows a single use of each client. I argue that it's not worth the code complexity to support arbitrary connect and disconnect calls. For all use cases that I can think of, a single connect followed by a single disconnect is all that you need. Just create a new client when you're done with the old one (like in the reconnect logic of the advanced example). Yes, there is some overhead because we create a new client instead of reusing the old one. I argue that this overhead is negligible.

If you want to use clean_session=False, you simply set that argument on the new client as well. The server will happily send you any data from the old session even though it's a new client object. All that matters is that the client ID is the same. :+1:

The client is where QOS > 0 is implemented (it has the queue of published messages) so that means this library cannot support publishing with QOS > 0.

I get your argument and it would indeed apply to raw paho-mqtt. :+1: Luckily, this isn't a problem in asyncio-mqtt. This is because every publish call with qos>0 awaits the corresponding PUBACK message. In practice, this means that the client message queue is always empty after control returns from await client.publish(...). Thus you'll never "lose" a message this way. Does it make sense?


Note that asyncio-mqtt inherits the known issues of paho-mqtt. Maybe this quote is of interest:

When clean_session is False, the session is only stored in memory not persisted. This means that when client is restarted (not just reconnected, the object is recreated usually because the program was restarted) the session is lost. This result in possible message lost.

Later, there is this interesting point (emphasis added):

  • QoS 1 and QoS 2 messages which have been sent to the Server, but have not been completely acknowledged.

    This means that message passed to publish() may be lost. This could be mitigated by taking care that all message passed to publish() has a corresponding on_publish() call.

This is exactly what asyncio-mqtt does to mitigate potential message loss. :+1:

frederikaalund avatar Feb 28 '21 11:02 frederikaalund

My example was terrible, sorry about that. The QOS problem is on the client side, not the server side. The problem doesn't exist in paho-mqtt. In async-mqtt, if there is a communication error, and I attempt to re-establish a connection by calling disconnect, then connect, this does not work. As you noted, this is by design. Unfortunately, this design choice means QOS > 0 cannot work. If I have to create a whole new client during comm problems, all of the messages that are queued to go out at QOS > 0 are lost.

0x0a11c0de avatar Mar 01 '21 22:03 0x0a11c0de

My example was terrible, sorry about that.

It's completely fine. I just had to work it a bit in my head first. :))

The QOS problem is on the client side, not the server side.

Ah, that's your concern. :+1: Nice to have it specified.

In async-mqtt, if there is a communication error, and I attempt to re-establish a connection by calling disconnect, then connect, this does not work.

I'm kind of second-guessing my decision to expose connect and disconnect at all. Maybe we should make these methods private. This way, we force the user to use async context managers.

Unfortunately, this design choice means QOS > 0 cannot work. If I have to create a whole new client during comm problems, all of the messages that are queued to go out at QOS > 0 are lost.

My point is that there won't be any queued messages (after publish returns). As soon as all pending publish calls finish, the queue size is zero.

It does put the semantics of publish into question. I think that's a good issue to raise on it's own (as you already did). :+1: Let's continue that discussion in #48.

frederikaalund avatar Mar 02 '21 11:03 frederikaalund

@nbraun-wolf Thanks for chipping in, but this issue (#46) is about connect/disconnect in combination with the QoS level. Your issue is about reusing the Client instance (i.e., reconnecting). Create a separate issue for that. :)

frederikaalund avatar Oct 24 '21 11:10 frederikaalund

This is an interesting discussion 👍 I believe that this issue was solved by #216, which made the client context manager reusable (but not reentrant). The chapter on reconnection in the documentation shows how we can reuse a single client instance multiple times.

How to publish without confirmation is still open, but you already addressed that in a separate issue (#48) 👍 If there's anything left unclear or unsolved, please reopen this issue!

empicano avatar Jan 27 '24 02:01 empicano