aedes icon indicating copy to clipboard operation
aedes copied to clipboard

[bug] `pubrel` delivery re-try causes `no such packet` error

Open tomekt-bolt opened this issue 3 years ago • 8 comments

System Information

  • Aedes: 0.42.5
  • NodeJS: v14.15.4
  • OS: MacOS 10.15.7
  • Arch: x86_64

Describe the bug When receiving pubrel packet for a message that's been already fully processed from the broker's point of view, the broker logs no such packet error and disconnects. The client. If the client didn't receive the initial pubcomp and wants to resend the pubrel upon reconnection, this will lead to a re-connection loop, where the client would connect, send pubrel, get disconnected, re-connect, send pubrel, get disconnected and so on.

To Reproduce Run the publish QoS 2, pubrel delivery retry test I've added here.

Expected behavior The broker should neither emit any error nor disconnect the client when receiving pubrel packet for already pubcomped message. It should send the pubcomp once more instead.

Additional context According to the MQTT spec the client

MUST re-send any unacknowledged PUBLISH Packets (where QoS > 0) and PUBREL Packets using their original Packet Identifiers

so conforming to this should not cause any errors.

tomekt-bolt avatar May 06 '21 10:05 tomekt-bolt

@tomekt-bolt I dunno when i can look at this, if you have time to submit a PR I can review and merge changes once done

robertsLando avatar May 06 '21 11:05 robertsLando

Thanks @robertsLando. I don't have an ETA from my side either, unfortunately. I'll let you know if I start working on this to avoid duplicate effort.

Can you confirm though if my understanding of the issue is correct and that the test I've added should pass? Or is there some misconception on my side?

tomekt-bolt avatar May 06 '21 11:05 tomekt-bolt

I'm not sure and I cannot find a detailed response to this in specs, what you said makes sense because in case client disconnects before getting the PUBCOMP packet on reconnect it will resend a PUBREL and the broker should then send the PUBCOMP again, but your test doesn't reproduce this as in the test the PUBCOMP is successfully delivered in publish method.

In fact if you check the code the pubrel packet is deleted immediatly after the write completes correctly. The question here is: if we don't delete it there, where and when should we delete it? When receiving a publish packet with the same packet identifier? But this could bring to a lot of useless memory used

@mcollina thoughts on this?

robertsLando avatar May 06 '21 12:05 robertsLando

but your test doesn't reproduce this as in the test the PUBCOMP is successfully delivered in publish method.

I don't think it's an issue: I think that the broker cannot tell whether the pubcomp has been successfully delivered or not and that's why it should be always prepared to receive pubrel.

if we don't delete it there, where and when should we delete it?

I think it's OK to delete any state related to given packet identifier after receiving pubrel. This should not prevent replying with pubcomp to any subsequent pubrel though. So what I'm saying is that whenever the broker receives a pubrel message for packet it doesn't know, it should treat it as a pubrel re-delivery and just politely reply with pubcomp 😉.

tomekt-bolt avatar May 06 '21 13:05 tomekt-bolt

Oh ok so you mean that in general he should always reply to a pubrel with a pubcomp even if there is no packet matching in store?

robertsLando avatar May 06 '21 13:05 robertsLando

If so the error could be quite easy to fix, we could completely ignore all errors or just ignore packets not found errors here: https://github.com/moscajs/aedes/blob/main/lib/handlers/pubrel.js#L32

robertsLando avatar May 06 '21 13:05 robertsLando

Oh ok so you mean that in general he should always reply to a pubrel with a pubcomp even if there is no packet matching in store?

I think replying with pubcomp to pubrel for message we don't have in the persistence is the only viable option, unless I'm missing something. Not sure though if it should be the same if we have the message in the persistence, but we haven't pubreced it yet - in such situation the pubrel would be unexpected. But maybe there's no need to bother with this.

the error could be quite easy to fix, we could completely ignore all errors or just ignore packets not found errors

I wouldn't like to ignore all the errors. I guess there could be some other errors like not having the active connection to the underlying redis (or whatever persistence) and in such cases the error should be properly propagated, not suppressed. Ignoring the no such packet error specifically could work here, but I'm a bit afraid on relying on this hardcoded string. Are all the persistence implementations guaranteed to use the very same string?

tomekt-bolt avatar May 06 '21 13:05 tomekt-bolt

Redis: https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L524 Mongodb: https://github.com/moscajs/aedes-persistence-mongodb/blob/master/persistence.js#L613

We should also patch persistence to return something that allow to distinguish errors correctly.

I also noticed that not just the error message changes but also mongo returns null and client as params where redis returns only the error. I think the steps here are to firstly add a test to abstract test of aedes persistence that checks the error returned when there is no packet found is the same across all persistences then patch all them

robertsLando avatar May 06 '21 14:05 robertsLando