rtl_433 icon indicating copy to clipboard operation
rtl_433 copied to clipboard

Add support for "Last will and testament" in MQTT connections.

Open vidarino opened this issue 4 years ago • 27 comments

vidarino avatar Nov 02 '20 10:11 vidarino

Blah, got some old crap in the commit history. Cleaning up and recreating PR.

vidarino avatar Nov 02 '20 10:11 vidarino

You can just update this PR. Keep your local branch, set it to your new changes, then force push.

zuckschwerdt avatar Nov 02 '20 10:11 zuckschwerdt

Btw, the PR looks clean. Just "squash" and force push if you want to get rid of the history -- we'll do that on merge anyway.

This idea is discussed in #1394 -- LWT isn't just one value but kind of a tri-state. And we likely can't support all use-cases.

zuckschwerdt avatar Nov 02 '20 10:11 zuckschwerdt

Thanks for the tips! Sorry for the noobiness. I really need to sit down and learn all the intricacies of GIT on of these days. 👍

Interesting discussion there, but I'm not sure what the conclusion was. ;) I know "barebones" LWT support is enough for my use - I just want to detect a stopped service, regardless of the reason. But I guess the real question is if it's generally useful enough to be merged? :)

vidarino avatar Nov 02 '20 11:11 vidarino

I guess we can serve an "lwt" topic, you can expect to see "Online" there if rtl_433 is running and connected. Any other value doesn't likely map to a known state (shutdown? crash? network problem? never started?) That means we don't want to offer a configurable message -- it's "Online" or something else. The topic should be given as lwt=foo and will default to base_topic/lwt.

zuckschwerdt avatar Nov 02 '20 11:11 zuckschwerdt

Jeeze, sorry about the commit spam. I'm having a brain fart streak today.

Two concerns about the current version:

  • The "online" and "offline" LWT messages are hardcoded at the top of output_mqtt.c
  • The free()-ing of ctx->opts.will_topic needs to be casted to a non-const, which feels a bit ugly.

vidarino avatar Nov 02 '20 12:11 vidarino

Git is build for that, add commits as you like. Squash if you want to present a cleanup -- but the "Files changed" tab already shows that summary.

  • the LWT messages can be given inline we won't treat those as variable and new ones ("crash"?) might be introduced.
  • const mostly documents "non-ownership" which doesn't apply here. Just copy the pattern from the other 3 topics.

zuckschwerdt avatar Nov 02 '20 12:11 zuckschwerdt

This PR needs forward porting after tls support, Otherwise I've bin using this for a while with the expected results so LGTM.

glance- avatar Feb 15 '21 07:02 glance-

is this checked in already?

fgonza2 avatar Apr 25 '22 15:04 fgonza2

No, there is no clear idea what "connection status" would mean. I.e. how to convey shutdown, crash, network problem, never started, ...

zuckschwerdt avatar Apr 25 '22 16:04 zuckschwerdt

lwt is a mechanism, and we're talking about defining semantics.

It seems obvious that it is useful to have an availability topic that is set to an "ON" value, and on clean exit or network disconnect is set to "OFF". That takes a publish on clean exit, and a lwt for disconnect. Yes, sometimes people want to know about crashes, but for home assistant and similar what you want to know is "Is the thing that reports to HA connected".

I don't think the topic should have "LWT" in it. That's blurring topic names and what a lwt is commanded to publish.

I realize for defaults we have to pick something, either matching what other publishers do or what receivers expect. I favor aligning to HA availability topics.

There are separate issues about the sdr being ok and individual sensors being ok. That's not unusual, and that leads to sensors also having a timeout so that if there isn't a write for a while they show unavailable as well. I don't think that issue should be blurred with the lwt. However, this points out that each rtl_433 instance should have a distinguished topic prefix so that the relationship between that instance's online topic and the individual device topics is clear.

As a reviewer, I'd like to see this rebased into clean commits and the conflicts resolved. I realize that isn't the rtl_433 norm, so I should be clear that this is a personal opinion, not a project one.

gdt avatar Jul 12 '22 19:07 gdt

Thanks. Agreed.

My experience so far is Tasmota: https://tasmota.github.io/docs/MQTT/#lwt-topic-last-will-and-testament -- it's what I currently use for that case (is the sender still connected "Online"). We should use some accepted known standard. Choosing HA defaults seems fine.

"By default, Home Assistant sends online and offline to homeassistant/status." https://www.home-assistant.io/docs/mqtt/birth_will/

Personally I find "Online" more fitting than "On", as there is no indication if something stopped/crashed/broke, just the info that the connection is there or not.

zuckschwerdt avatar Jul 12 '22 19:07 zuckschwerdt

"By default, Home Assistant sends online and offline to homeassistant/status." https://www.home-assistant.io/docs/mqtt/birth_will/

Personally I find "Online" more fitting than "On", as there is no indication if something stopped/crashed/broke, just the info that the connection is there or not.

I use "/sensor/foo/bar//online" as the topic to represent if the foo/bar sensor is connected, and ON and OFF to represent true and false. This is fundamentally a true/false variable with messy semantics, so I think about not what words mean but how to interoperate.

This is diffcult because homeassistant has multiple representations for true and false depending on context. I think it's buggy for HA to send online/offilne, because to me the point is to read that status in some other HA installation. My config has a comment "Align to binary sensor" and publishes ON and OFF both with retain=true. I have my nodemcu devices send ON/OFF to the /online topic, and use this as both a binary sensor in its own right (with an alert) and as an availability topic.

Overall, I'd say it's a design bug in HA to have ON/OFF for sensors and online/offline for availability topics. I guess that reality argues for using online/offline, and people who want binary sensors can override. Or fix the code so for both availabilty and binary_sensor ON, online, True, true, #t, t are true, and OFF, offline, False, false, #f, nil are false. Kind of kidding about Scheme and LISP.

gdt avatar Jul 12 '22 23:07 gdt

A good point to see the topic as semantic and the payload as syntax (boolean) only. And somewhat natural to want the availability state as the same binary sensor (boolean) syntax.

For reference: https://www.home-assistant.io/integrations/binary_sensor.mqtt/#availability

I can only guess that maybe the availability topic was intended to have more possible values? But that could be represented on another topic "exit_reason" -- if it did exit at all, availability for me mostly is Wifi hiccups.

zuckschwerdt avatar Jul 13 '22 07:07 zuckschwerdt

My guess is that the people who wrote the availability code in binary_sensor thought that availabilty should have different values because ON/OFF is data and online/offline is metadata, or something like that. I have never heard of availbility topics having multiple values. I can certainly see "disconnect reason" being useful, but I also see MQTT in this context as always-connected, with the end node reconnecting if it gets disconnected, so this is really "am I getting data" or "am I not getting data". Sometimes the program crashes, sometimes the computer crashes, sometimes it is power, sometimes it is network, and occasionally the admin stops it. To me knowing it's not working is just a poke to go diagnose/fix.

gdt avatar Jul 13 '22 10:07 gdt

I'm looking for a way to monitor the health of my rtl433 process -- occasionally it has issues connecting to MQTT, or even starting up if the usb device gets messed up. I was hoping to use the online/LWT messages, as per above, they are not yet implemented. For this kind of application the "status" in LWT is irrelevant, it would useful even if it always just said "Offline" without any attempt to describe the reason.

If we're really stuck on the semantics of the status, what do you guys think of adding support for some kind of a heartbeat message?

mag1024 avatar Aug 06 '22 23:08 mag1024

I don't think we're at an impasse over semantics. I feel like we've converged on adding support for a lwt to publish a value to a topic, and perhaps for "offline" or "OFF", probably "offline" as the value to use if the user doesn't set one.

However, this is not health status; it's knowing that the process is sending mqtt keepalives and has not closed the connection.

Health status needs something more, and I think that should be a health_topic, and maybe a health_interval, maybe just default to 600s, probably have health_interval, and a procedure should get called to make a json dict with status about a lot of things, time, maybe rtl_433 uptime, status about the IQ sample stream (# samples since start, ok/not, ?). There is something kind of like this sent with -F json, and maybe that should just be promoted to work always.

Then, in HA, one can have a binary sensor "rtl_433 ok" which times out to false if this message does not arrive, and is also false if some top-level ok dict member is falsy instead of truthy. I do something slightly like this with eSP8266 nodemcu temp sensors: I have a "connectivity" binary sensor that watches the lwt/availablity topic and a "reporting" sensor that goes false if temperature reports stop arriving. Somtimes the sensor stays connected but I get no new data, due to a broken or buggy BME280 (not sure which).

The whole health thing should be a separate issue/PR, I think.

gdt avatar Aug 07 '22 00:08 gdt

I agree that true health checking would need something more comprehensive, but simple connection health checking would also address 99% of my use case, hence my interest in the LWT :) If we're not at an impasse, what are the next steps? Seems like this issue has been open for close to two years now...

mag1024 avatar Aug 08 '22 01:08 mag1024

The pull request has conflicts. It needs to be rebased to the current code, and then it can be seriously reviewed, including seeing if it is aligned with the discussion that's happened.

I personally do not use the built-in MQTT support; I use the relay python script. And, I am not using lwt, but instead have configured a binary sensor in home assistant to report that each device is not reporting. If they are, I know everything is ok. So that's why I personally am not digging in to do this, but am just making architectural comments.

gdt avatar Aug 08 '22 13:08 gdt

maybe a simple “not available” would be a good start. keeping it simple via LWT only

On Apr 25, 2022, at 9:01 AM, Christian W. Zuckschwerdt @.***> wrote:

No, there is no clear idea what "connection status" would mean. I.e. how to convey shutdown, crash, network problem, never started, ...

— Reply to this email directly, view it on GitHub https://github.com/merbanan/rtl_433/pull/1547#issuecomment-1108758872, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFAOYTRYDYKQDQGK362CG4TVG26U3ANCNFSM4THJPXCA. You are receiving this because you commented.

fgonza2 avatar Oct 11 '22 08:10 fgonza2

A simple last will message works pretty nicely, for a simple solution. What is the hold up on this one?

XtremeOwnageDotCom avatar Feb 08 '23 18:02 XtremeOwnageDotCom

As an external reference, zigbee2mqtt uses a state topic with a payload of offline. It works pretty well. You can kill the zigbee2mqtt process, and home assistant will instantly pick up all of the unavailable entities.

https://github.com/Koenkk/zigbee2mqtt/blob/125ca61812a3fd47834943265777bfdabff600d1/lib/mqtt.ts#L26-L33

I think if rtl_433 supports a configurable will topic, then we can set the topic to one that works for HA in the https://github.com/merbanan/rtl_433/blob/master/examples/rtl_433_mqtt_hass.py.

I've resolved conflicts at https://github.com/deviantintegral/rtl_433/tree/lastwill. It builds and runs but I haven't tested it yet beyond that.

deviantintegral avatar May 11 '23 11:05 deviantintegral

I guess we are still figuring out the exact wording. Thanks for finding and showing some how other projects solve this. We already use the word "state" in a different context though, there is the "states" topic. At least in terms of HA the concept is that of "availability topic" with "online" / "offline" values.

I'd rather not a have such a basic functionality configurable. There should be a sensible and generally accepted/supported topic name? It's not "lwt" and not "state", is it maybe "availability" or "available"?

zuckschwerdt avatar May 11 '23 11:05 zuckschwerdt

MQTT defines a protocol for writing values to topics, and a way to have a lwt which will publish a specfied value to a specified topic. The rest of the actual protocol is then an agreement between the MQTT publisher and the MQTT subscriber. The availability topic name and the choices of values have to be agreed upon between the publisher, rtl_433, and whatever is reading them. So if they are fixed on one side, they have to be configurable on the other. Overall, I would say that if the data topic names are configurable in rtl_433, the availability topic and payload values should be too.

A key point I don't think we've discussed yet is the scope of the availability topic and if it is about the mqtt connection or if there is something broader. Consider a world with 2 sensors and one rtl_433 process (one frequency). That opens an MQTT connection, posts "true" to rtl_433/foo/online (picking a convention arbitrarily for this example), and then posts json to rtl_433/foo/sensorA and rtl_433/foo/sensorB when frames are decoded. If sensorA stops transmitting, then it just won't post anything, but the online topic won't change. But if the net connection breaks, or the rtl_433 process or machine crashes, then the online topic will get "false" after timeout. So the consumer has to configure that topic and the two values for each of the sensors. They also have to recognize timeouts. (HA can do all of this.)

A wording nuance is that "available" is a property of the entity backed by a specific topic (e.g. sensorA) and "online" is a metadata about the thing that is sending information about A. One configures in HA an "availability topic" with true/false values to read that metadata and make it be a property of the entity.

As I said earlier, I find HA's defaults messy, because I think one should use the same ON/OFF values used in reading binary sensors to read the availability_topic values, because one can reasonably use the "is the thing that sends sensor foo connected" as a valid binary_sensor that one might wish to display and alert on, in addition to being something that will cause foo to show as unavailable.

So in summary, I think it's ok to not have availability topic name and values configurable only if we think everything that will consume them can be configured.

And if we don't make them configurable, then I think Home Assistant is the majority of users, and we should align to the defaults for availbility_topic on/off keywords.

As for topic name, I prefer /online as parallel to the prefix, so one would have

  • foo/bar/online
  • foo/bar/sensorA/sensorA_type/otherA

and so on. The point is that the online topic is about the rtl_433 instance, not about any particular sensor. It is definitely important that each instance be configured to havea unique prefix so that data consumers can tell them apart and in particular not mis-attribute online topics.

I think it's a mistake to send online topics per sensor and keep track of that. And there is only one lwt slot per connection -- because is is meant to be about the connection.

I will try to have a look at the code but it may take me a few weeks as I am very busy right now. Feel free to bug me in 2 weeks.

gdt avatar May 11 '23 13:05 gdt

It looks like this is (at least) the "Shelly"-style of LWT, i.e. rtl_433/HOSTNAME/online of true or false. Importantly we are not inventing the N+1 standard for LWT then. I like payload values that are self-describing like [Oo]nline/[Oo]ffline but adopting an existing scheme is better.

We might not have a choice and need to allow the topic to be configurable as we don't have a MQTT "prefix" currently, only the individual events/states/devices topics.

zuckschwerdt avatar May 11 '23 13:05 zuckschwerdt

I think we should align to Home Assistant rather than some (albeit reasonable and common) vendor thing. The documentation is at https://www.home-assistant.io/integrations/binary_sensor.mqtt/ and amazingly enough, the default payload values are "online" and "offline", just as you said you preferred because they are obvious. (I have no objection to any of this being configurable.)

I suppose we could look at OpenHAB, which is the only other Free Software large-scale MQTT consumer (that would care about availability topics) that I know of. But I do not perceive that there are anywhere near as many users -- we don't have any chatter about "how do I connect rtl_433 to OpenHAB" in any project channels.

gdt avatar May 11 '23 16:05 gdt

Very good. I've updated this PR to match that proposal.

With OpenHAB I guess most people would use https://github.com/sheilbronn/rtl2mqtt and thus won't be affected by our MQTT decisions.

zuckschwerdt avatar May 11 '23 17:05 zuckschwerdt