aedes
aedes copied to clipboard
fix: send LWT on broker restart
This will require companion fixes on persistences, see aedes-persistence-redis #104 for example.
The core issue is that clients state is not stored in any way, so these wills get orphaned in the db whenever the server restarts.
Pull Request Test Coverage Report for Build 2682867428
- -5 of 7 (28.57%) changed or added relevant lines in 1 file are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage decreased (-0.7%) to 99.128%
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
---|---|---|---|
aedes.js | 2 | 7 | 28.57% |
<!-- | Total: | 2 | 7 |
Totals | |
---|---|
Change from base Build 2337078913: | -0.7% |
Covered Lines: | 810 |
Relevant Lines: | 815 |
💛 - Coveralls
Pull Request Test Coverage Report for Build 2724226001
- 6 of 7 (85.71%) changed or added relevant lines in 1 file are covered.
- No unchanged relevant lines lost coverage.
- Overall coverage decreased (-0.3%) to 99.564%
Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
---|---|---|---|
aedes.js | 6 | 7 | 85.71% |
<!-- | Total: | 6 | 7 |
Totals | |
---|---|
Change from base Build 2337078913: | -0.3% |
Covered Lines: | 814 |
Relevant Lines: | 815 |
💛 - Coveralls
I'm not sure if this will meet the standard, perhaps it is somewhat of a kludge, but see the new "delivers old will in case of a restart on the same brokerId" test, I don't know right off how to get this to pass without using the sendWillsOnStartup flag. Some other way to address this issue would be ok, but I'm at my limit of time on this.
Other notes, am I missing something, or would the present method of checkAndPublish not bypass packet authorization?
@mcollina What do you think about this? I cannot find anything in specs about how to handle broker restarts properly, but IMO it makes sense to send will on startup, otherwise we would have lot of orphans in persistence (the alternative would be to just delete them)
Other notes, am I missing something, or would the present method of checkAndPublish not bypass packet authorization?
@friesendrywall correct, checkAndPublish should also call authorize publish, that's another error, could you add that please? Also that one should be covered with a test
About the checkAndPublish authorize, what is the correct way? There is no complete client object to pass, it would only be able to pass the clientId from the packet. Should this rather be authorized elsewhere? Maybe this should be a separate pull request.
You right, without client id it wouldn't work.. IMO the best solution would be to autohrize will on connect and if auth fails just remove it and don't store it at all
@robertsLando, @friesendrywall
- from MQTT 3.11 spec Section 3.1.2.5
Situations in which the Will Message is published include, but are not limited to: An I/O error or network failure detected by the Server. The Client fails to communicate within the Keep Alive time. The Client closes the Network Connection without first sending a DISCONNECT Packet. The Server closes the Network Connection because of a protocol error.
The Server SHOULD publish Will Messages promptly. In the case of a Server shutdown or failure the server MAY defer publication of Will Messages until a subsequent restart. If this happens there might be a delay between the time the server experienced failure and a Will Message being published.
- From HiveMQ doc,
The Last Will and Testament feature provides a way for clients to respond to ungraceful disconnects in an appropriate way.
- From a local vernemq dev env, restarting a server will not trigger LWT. Instead a client keeps sending CONNECT to resume its connection in best effort within its keep alive period.
- If the persistence is a memory storage layer, server can't send LWT if
sendWillsOnStartup
is set to TRUE. It will make confusion in different outcomes in a persistence point of view.
IMO, sending LWT is on a client state perspective while sendWillsOnStartup
is in server perspective that violates what ungraceful disconnection stated in above. What I suggested is to restore all client previous state after restart, and let aedes sends LWT itself if a client does not respond within a keep alive period.
What I suggested is to restore all client previous state after restart, and let aedes sends LWT itself if a client does not respond within a keep alive period.
What do you mean with this in practise? By reading docs they just say that in case of server shutdown or failure the LWT should be sent on restart so IMO the sendWillsOnStartup
should be considered true by default, the behaviour introduced by it it's spec compliance
@robertsLando will the LWT be suppressed in cluster or HA mode when sendWillsOnStartup
is TRUE?
will the LWT be suppressed in cluster or HA mode when sendWillsOnStartup is TRUE
Yeah you right, in case of a cluster this would not work, so what you mean with 'restore all client previous state after restart,' ? How should we do that? Recreate all clients for each LWT we found associated with our broker id?
I'm a little non hopeful anyone will like this commit, its close to the point of over and out and perhaps just patching this by publishing to a wildcard to override this behaviour.
I don't really have a sufficient in depth understanding of the persistence layer to modify it at this point, but it really needs it to correctly make this work because it needs to have most of the original client data to manage this smoothly. For example, it seems like it really should track the keepAlive interval and honor that. The simplest option that would require no persistence layer change would be to add extra data to the packet, but that probably has unintended consequences.
Also, I created a new bug report for the non auth, it really is a different issue, although it too will probably require a persistence layer addition.
Ideally, the persistence layer would track the last client keepAlive, but that would probably affect the overall performance.
Aedes checks and sends LWT periodically. If you restart the broker with same broker id, the logic of LWT delivery resumes in its first heartbeat for non-memory persistence. https://github.com/moscajs/aedes/blob/main/aedes.js#L99-L111 @robertsLando @friesendrywall
@gnought Lines 119 to 121 filter any active brokers so previous wills don’t ever go out because they don’t “needsPublishing”. This all works ok with auto generated id’s.
Couldn't we fix this by simply using a flag that will force send existing wills on startup like @friesendrywall did in first PR? If we wait the broker heartbeat interval it could be the client has been already reconnected to broker
IMO a server should not send a client LWT if a server restarts within a client keepalive period.
@gnought Yeah but what if the keep alive period is already elapsed and, in the meanwhile, client connects? That Will would never be published. I think it would be more acceptable to publish it before keep alive interval instead of not publish it at all, also a server failure should be considered as an unexpected disconnection so it should be always notified
@mcollina If you have other suggestions them would be much appreciated :)
What I understand the MAY defer is
the server MAY defer publication of Will Messages until a subsequent restart
- if a server restarts within a client keep alive period, no LWT should be sent. This could be regarded as a network hiccups in client perspective.
- If the restart is over client keep alive period, LWT should be sent.
@robertsLando yes, if a broker id is same in each subsequent restarts:
- In a cluster mode, the restarted broker
brokers
array will be re-populated during the heartbeat. Aedes will send old LWT. - In a non-cluster mode with non-memory persistence, Aedes will also send old LWT after restart.
@gnought Ok I agree with you, makes sense.
Just to try I did a test with mosquitto and this is what happped (version 2.0.14):
- If mosquitto is closed gracefully it immediatly sends LWT messages before closing, this could be correct
- If mosquitto is force closed (
sudo kill -9 $(pidof mosquitto)
) even if the broker remains offline for a period greather then the keepalive period it never sends LWT (that's definetly wrong)
Do you have a external persistence session storage layer in mosquito server? @robertsLando
Do you have a external persistence session storage layer in mosquito server? @robertsLando
Yes, I have enabled persistence in mosquitto.conf. Also the db file exists so it means he is using it
Hmm, this is same as the current Aedes behaviour. The obsolete LWT is left in the persistent storage of mosquito.
Hmm, this is same as the current Aedes behaviour
Aedes is not sending LWT on close BTW
I'm closing this pull request because I don't have the resources to complete, also I'm not sure how to bring this together.