aedes
aedes copied to clipboard
[bug] `pubrel` delivery re-try causes `no such packet` error
System Information
- Aedes: 0.42.5
- NodeJS: v14.15.4
- OS: MacOS 10.15.7
- Arch: x86_64
Describe the bug
When receiving pubrel
packet for a message that's been already fully processed from the broker's point of view, the broker logs no such packet
error and disconnects. The client. If the client didn't receive the initial pubcomp
and wants to resend the pubrel
upon reconnection, this will lead to a re-connection loop, where the client would connect, send pubrel
, get disconnected, re-connect, send pubrel
, get disconnected and so on.
To Reproduce
Run the publish QoS 2, pubrel delivery retry
test I've added here.
Expected behavior
The broker should neither emit any error nor disconnect the client when receiving pubrel
packet for already pubcomp
ed message. It should send the pubcomp
once more instead.
Additional context According to the MQTT spec the client
MUST re-send any unacknowledged PUBLISH Packets (where QoS > 0) and PUBREL Packets using their original Packet Identifiers
so conforming to this should not cause any errors.
@tomekt-bolt I dunno when i can look at this, if you have time to submit a PR I can review and merge changes once done
Thanks @robertsLando. I don't have an ETA from my side either, unfortunately. I'll let you know if I start working on this to avoid duplicate effort.
Can you confirm though if my understanding of the issue is correct and that the test I've added should pass? Or is there some misconception on my side?
I'm not sure and I cannot find a detailed response to this in specs, what you said makes sense because in case client disconnects before getting the PUBCOMP packet on reconnect it will resend a PUBREL and the broker should then send the PUBCOMP again, but your test doesn't reproduce this as in the test the PUBCOMP is successfully delivered in publish
method.
In fact if you check the code the pubrel packet is deleted immediatly after the write completes correctly. The question here is: if we don't delete it there, where and when should we delete it? When receiving a publish packet with the same packet identifier? But this could bring to a lot of useless memory used
@mcollina thoughts on this?
but your test doesn't reproduce this as in the test the PUBCOMP is successfully delivered in publish method.
I don't think it's an issue: I think that the broker cannot tell whether the pubcomp
has been successfully delivered or not and that's why it should be always prepared to receive pubrel
.
if we don't delete it there, where and when should we delete it?
I think it's OK to delete any state related to given packet identifier after receiving pubrel
. This should not prevent replying with pubcomp
to any subsequent pubrel
though. So what I'm saying is that whenever the broker receives a pubrel
message for packet it doesn't know, it should treat it as a pubrel
re-delivery and just politely reply with pubcomp
😉.
Oh ok so you mean that in general he should always reply to a pubrel with a pubcomp even if there is no packet matching in store?
If so the error could be quite easy to fix, we could completely ignore all errors or just ignore packets not found errors here: https://github.com/moscajs/aedes/blob/main/lib/handlers/pubrel.js#L32
Oh ok so you mean that in general he should always reply to a pubrel with a pubcomp even if there is no packet matching in store?
I think replying with pubcomp
to pubrel
for message we don't have in the persistence is the only viable option, unless I'm missing something. Not sure though if it should be the same if we have the message in the persistence, but we haven't pubrec
ed it yet - in such situation the pubrel
would be unexpected. But maybe there's no need to bother with this.
the error could be quite easy to fix, we could completely ignore all errors or just ignore packets not found errors
I wouldn't like to ignore all the errors. I guess there could be some other errors like not having the active connection to the underlying redis (or whatever persistence) and in such cases the error should be properly propagated, not suppressed. Ignoring the no such packet
error specifically could work here, but I'm a bit afraid on relying on this hardcoded string. Are all the persistence implementations guaranteed to use the very same string?
Redis: https://github.com/moscajs/aedes-persistence-redis/blob/master/persistence.js#L524 Mongodb: https://github.com/moscajs/aedes-persistence-mongodb/blob/master/persistence.js#L613
We should also patch persistence to return something that allow to distinguish errors correctly.
I also noticed that not just the error message changes but also mongo returns null and client as params where redis returns only the error. I think the steps here are to firstly add a test to abstract test of aedes persistence that checks the error returned when there is no packet found is the same across all persistences then patch all them