paho.mqtt.c

QoS 1 and QoS 2 ack control and persistence

Open rthrippleton opened this issue 6 years ago • 3 comments

While trying to figure out something else through code inspection, I came across this in MQTTProtocol_handlePublishes:

if (publish->header.bits.qos == 1) {
    rc = MQTTPacket_send_puback(publish->msgId, &client->net, client->clientID);
    Protocol_processPublication(publish, client);
}

I think this means if there is some kind of application failure/hardware failure/power failure, there's a chance that the incoming message gets acknowledged before the application that's using Paho returns from processing the incoming message.

The application/hardware is then restarted, and the broker will not resend the message that got missed, because it already saw a (premature) acknowledgement for it. It's a very small window for message loss, but it could happen.

rthrippleton avatar Aug 20 '18 14:08 rthrippleton

Follow-up - I've done a bit more reading of the spec, other implementations, and talked to a colleague who's more clued up about MQTT than I am. I think this might just be a misunderstanding on my part - MQTT QoS is about coping with network unreliability, rather than end-to-end reliable messaging, right? My knowledge is more around 'enterprise' messaging, hence the mistake.

If this is the case, then I suppose we should reshape this issue as a less important feature request for application-controlled acks :-)

rthrippleton avatar Aug 21 '18 10:08 rthrippleton

For some reason, you didn't include the helpful comments in the code snippet:

else if (publish->header.bits.qos == 1)
{
  /* send puback before processing the publications because a lot of return publications could fill up the socket buffer */
  rc = MQTTPacket_send_puback(publish->msgId, &client->net, client->clientID);
  /* if we get a socket error from sending the puback, should we ignore the publication? */
  Protocol_processPublication(publish, client);
}

These were from the days when this code was in a broker, RSMB. And because of that, I think they don't apply any more, and the order of the calls could/should be reversed.
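
A minimal sketch of the reversed ordering, in plain C. Every name here (`publication`, `handle_qos1_publish`, the `demo_*` helpers) is a hypothetical stand-in, not one of Paho's types or functions. The only point it illustrates is that the PUBACK goes out after the application has accepted the message, so a crash during delivery leaves the message unacknowledged and the broker free to resend it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real Paho types or functions. */
typedef struct {
    int msg_id;
    int qos;
} publication;

typedef bool (*deliver_fn)(const publication *pub, void *context);
typedef int (*puback_fn)(int msg_id, void *context);

/* Handle an incoming QoS 1 publish: deliver first, acknowledge second.
 * Returns -1 if delivery failed (no PUBACK is sent in that case),
 * otherwise whatever send_puback returns. */
int handle_qos1_publish(const publication *pub, deliver_fn deliver,
                        puback_fn send_puback, void *context)
{
    assert(pub != NULL && pub->qos == 1);
    if (!deliver(pub, context))
        return -1;
    return send_puback(pub->msg_id, context);
}

/* Demo doubles that record call order: 'D' = delivered, 'A' = acked. */
typedef struct {
    char log[8];
    size_t n;
    bool deliver_ok;
} trace;

bool demo_deliver(const publication *pub, void *context)
{
    trace *t = context;
    (void)pub;
    t->log[t->n++] = 'D';
    return t->deliver_ok;
}

int demo_puback(int msg_id, void *context)
{
    trace *t = context;
    (void)msg_id;
    t->log[t->n++] = 'A';
    return 0;
}
```

Calling `handle_qos1_publish` with `demo_deliver` and `demo_puback` records "D" then "A" when delivery succeeds, and only "D" when it fails.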

It's true that MQTT QoS is primarily about network connection breaks, but it doesn't mean that end-to-end reliability isn't relevant. I drew up a table a good while back: http://modelbasedtesting.co.uk/2013/11/24/mqtt-qos-and-persistence/

As it happens, on the second point, I was asked about this in relation to QoS 2. For full integration of, say, a database or transaction processor, we should have two callbacks - one for receipt of the publish and another for the pubrel, so that a full end-to-end two-phase commit can be enabled. As I hadn't raised that issue yet, I'll change the title of this one to encompass these thoughts.
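
To make the two-callback idea concrete, here is a rough sketch in C. It is an assumption about how such an interface might look, not an existing Paho API: `qos2_callbacks`, `qos2_on_publish`, `qos2_on_pubrel` and the `toy_*` store are all invented for illustration. The first callback plays the "prepare" role when the PUBLISH arrives, and the second plays "commit" when the PUBREL arrives; the return value says which reply packet the library would then send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback pair; not part of any real Paho interface. */
typedef struct {
    bool (*on_publish)(int msg_id, const char *payload, void *ctx); /* phase 1: prepare */
    bool (*on_pubrel)(int msg_id, void *ctx);                       /* phase 2: commit  */
    void *ctx;
} qos2_callbacks;

typedef enum { SEND_NOTHING, SEND_PUBREC, SEND_PUBCOMP } qos2_reply;

/* PUBLISH received: reply with PUBREC only if the application prepared it. */
qos2_reply qos2_on_publish(const qos2_callbacks *cb, int msg_id, const char *payload)
{
    assert(cb != NULL && cb->on_publish != NULL);
    return cb->on_publish(msg_id, payload, cb->ctx) ? SEND_PUBREC : SEND_NOTHING;
}

/* PUBREL received: reply with PUBCOMP only if the application committed it. */
qos2_reply qos2_on_pubrel(const qos2_callbacks *cb, int msg_id)
{
    assert(cb != NULL && cb->on_pubrel != NULL);
    return cb->on_pubrel(msg_id, cb->ctx) ? SEND_PUBCOMP : SEND_NOTHING;
}

/* Toy "transactional store" holding one message; id 0 means "none". */
typedef struct {
    int staged_id;
    int committed_id;
} toy_store;

bool toy_prepare(int msg_id, const char *payload, void *ctx)
{
    toy_store *s = ctx;
    (void)payload;
    s->staged_id = msg_id;
    return true;
}

bool toy_commit(int msg_id, void *ctx)
{
    toy_store *s = ctx;
    if (s->committed_id == msg_id)
        return true;            /* duplicate PUBREL: commit is idempotent */
    if (s->staged_id != msg_id)
        return false;           /* nothing prepared under this id */
    s->committed_id = msg_id;
    s->staged_id = 0;
    return true;
}
```

The commit step is deliberately idempotent, because a PUBREL can be resent if the PUBCOMP is lost.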

icraggs avatar Aug 21 '18 11:08 icraggs

Hi Ian, Sorry for the excessively compact code; I was trying to be tidy and brief, but I suppose that's less helpful in this case.

To check we're on the same page re: application-controlled acks, for QoS 1 is the idea that an application would deliberately say "ack message with ", rather than assuming that a callback return is an acknowledgement? The former will be better for throughput when the application's 'commit' operation is quite high-latency (e.g. forwarding a derived message on to another system), because it allows a large rolling window of unacknowledged messages.
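
To illustrate what I mean by a rolling window, here is a small sketch in C. It is purely hypothetical (`ack_window`, `window_receive`, `window_ack` and `WINDOW_SIZE` are names I made up, not anything in Paho): the library would record each incoming QoS 1 message id as pending, and only send the PUBACK when the application explicitly acknowledges that id, possibly out of order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WINDOW_SIZE 4   /* assumed cap on unacknowledged messages */

/* Hypothetical tracker for received-but-not-yet-acked message ids. */
typedef struct {
    int pending[WINDOW_SIZE];
    size_t count;
} ack_window;

/* Record an incoming message id.
 * Returns false if the window is full, i.e. the caller should stop reading. */
bool window_receive(ack_window *w, int msg_id)
{
    assert(w != NULL);
    if (w->count == WINDOW_SIZE)
        return false;
    w->pending[w->count++] = msg_id;
    return true;
}

/* Application-controlled ack: returns true if msg_id was pending,
 * which is the point where the PUBACK would be sent. */
bool window_ack(ack_window *w, int msg_id)
{
    assert(w != NULL);
    for (size_t i = 0; i < w->count; i++) {
        if (w->pending[i] == msg_id) {
            /* Order does not matter, so fill the gap with the last entry. */
            w->pending[i] = w->pending[--w->count];
            return true;
        }
    }
    return false;   /* unknown or already acknowledged */
}
```

With this shape, the callback return no longer implies an acknowledgement; the application calls `window_ack` whenever its own commit for that message completes.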

Thanks.

rthrippleton avatar Aug 21 '18 12:08 rthrippleton