weewx-mqtt icon indicating copy to clipboard operation
weewx-mqtt copied to clipboard

Aggregate values

Open roe-dl opened this issue 2 years ago • 17 comments

** 2nd trial after resolving conflicts with other PRs **

Several times users asked for the possibility to output aggregated values such as maximum or minimum values by MQTT. This PR adds this feature by introducing a section [[[calculations]]]. Within this section aggregated values can be defined similar to the aggregation tags for the Cheetah generator.

Example:

[StdRESTful] ... [[MQTT]] ... [[[calculations]]] maxTemp = day.outTemp.max minTemp = day.outTemp.min This defines the names maxTemp, which is the daily temperature maximum, and minTemp, which is the minimum. Those values are then included in the MQTT output.

Unit groups are assigned automatically.

There is also a special case: dayRain. As this observation type is originally defined to be send to WU, it is calculated to be inclusive at both ends of the interval (see remark in restx.py). That is not appropriate for the MQTT output and results in wrong values for example within the Belchertown skin. Using dayRain = day.rain.sum dayRain is re-defined and re-calculated to be exclusive at the left end, and that re-calculation is valid for MQTT only. That's why this is the default, if no section [[[calculations]]] is present.

See also https://github.com/poblabs/weewx-belchertown/issues/685

roe-dl avatar Sep 09 '22 12:09 roe-dl

See #31 for older comments.

roe-dl avatar Sep 09 '22 13:09 roe-dl

@ThomDietrich

Does your addition change anything for users without [[[calculations]]] in their config?

No. Section [[[calculations]]] is not necessary. For people who do not need it, nothing changes.

roe-dl avatar Sep 09 '22 13:09 roe-dl

@ThomDietrich

Aggregation should (arguably) happen after MQTT.

There are certainly people who would argue that. If you have some Software at the other end, that can aggregate, then be happy. But that does not apply to all people. Another thing is that MQTT is not reliable in that sense, that you can be sure that the recipient receives all messages sent. If - for example - the aggregation is "sum" then, what do you get? So I guess, aggregation should happen before MQTT to get reliable results.

See also: https://github.com/matthewwall/weewx-mqtt/pull/31#issuecomment-1189721222 https://github.com/matthewwall/weewx-mqtt/pull/31#issuecomment-1238939874

roe-dl avatar Sep 09 '22 13:09 roe-dl

Another thing is that MQTT is not reliable in that sense, that you can be sure that the recipient receives all messages sent. If - for example - the aggregation is "sum" then, what do you get? So I guess, aggregation should happen before MQTT to get reliable results.

Maybe I got this wrong. You want rain to be aggregated, which is why there is the published topic dayRain (aside the problem that you state it is buggy)?

ThomDietrich avatar Sep 09 '22 15:09 ThomDietrich

There are several observation types in WeeWX that contain aggregated values. They are calculated in restx.py That is in the central part of WeeWX. Regarding rain there are hourRain, dayRain and rain24. And they are widely used in skins.

If you want to display live data on the web you use the MQTT extension to transfer the actual readings including those aggregations to the client and process them by Javascript. And I guess, you want to do as least calculations as possible at the client's side.

roe-dl avatar Sep 09 '22 15:09 roe-dl

i find the use of 'calculations' confusing. there is a 'Calculations' section in the weewx configuration file, and is completely different from what you have implemented for 'calculations'. perhaps your implementation would be more appropriately called 'aggregations'?

my first reaction to this implementation is that it does not belong here. the mqtt extension should simply upload whatever observations are in the loop and/or archive dictionaries. so if someone wants an aggregation, s/he should do that aggregation either somewhere in weewx before mqtt uploading happens, or downstream in an mqtt consumer (i can imagine use cases for both of these).

the reason you see selected aggregation in some uploaders (e.g., some implemented in restx.py) is that the destination requires those aggregations (e.g., a very specific 'dayRain' for WU, or 'dayRain' and 'hourRain' for WOW). mqtt is generic, so there is no specification to meet.

having said that, i'm not sure what an implementation for 'do the aggregation before mqtt uploading' looks like. i would prefer to see at least one or two attempts at that before bogging down the mqtt uploader with additional complexity. note that for some hardware, you do not have to do anything - some of those aggregations already appear in the archive records!

also, you cannot assume that everyone will have 'rain' as an observation. will your implementation try to insert dayRain even if no 'rain' is recorded? what about loop vs archive? remember that no observation is not the same as an observation with value 'NULL', and that is different from an observation with value 0.

i am not a python expert, but from what i have seen, it is poor form to have a catch-all for a bunch of different exception types. better to catch the exceptions where they might occur. that also forces you to test the code and understand exactly what each part of the code might or might not do. this is particularly true for ValueError and TypeError - i can see why you might end up with the other exceptions in a single catch, but having ValueError and TypeError in there indicate that you do not really know how the code might behave. it is laudable to try to not have weewx die, but not by swallowing exceptions.

matthewwall avatar Sep 10 '22 02:09 matthewwall

@matthewwall Thank you for commenting on this.

i find the use of 'calculations' confusing.

Ok. Done.

my first reaction to this implementation is that it does not belong here. ...

I understand the point. So what were my thoughts?

  • I see no way to insert some extension between restx.py in the WeeWX kernel and the MQTT extension.
  • And I looked for a more general solution for the Belchertown rain bug than introducing a second daily rain observation type with a different calculation.

the reason you see selected aggregation in some uploaders (e.g., some implemented in restx.py) is that the destination requires those aggregations (e.g., a very specific 'dayRain' for WU, or 'dayRain' and 'hourRain' for WOW). mqtt is generic, so there is no specification to meet.

I agree with you. I see it that way: People can use the MQTT extension to feed whatever application they use. And for me, that requires the possibility to configure the MQTT extension to meet that application's needs.

will your implementation try to insert dayRain even if no 'rain' is recorded?

No.

what about loop vs archive?

The aggregation is only added if a reading of the observation type to be aggregated is present. Otherwise nothing is changed. No None value is emitted.

better to catch the exceptions where they might occur.

I will change that as two people spoke about that

Please note: Calculating an aggregation can call other people's extensions. And I cannot know how they are implemented and whether they are free of bugs or unhandled exceptions. There is no way to find out all the exceptions the call of weewx.xtypes.get_aggregate() can produce. Everyone can extend that function by their code.

roe-dl avatar Sep 10 '22 07:09 roe-dl

@matthewwall

I am using weewx-mqtt with this PR. What would I have to do without this PR to get the following values as MQTT output in weewx-mqtt Topic?

Example, day totals: rain, sunshineDur (sunshine per interval in seconds), windrun ...

Thanks Henry

Edit:

"the mqtt extension should simply upload whatever observations are in the loop and/or archive dictionaries."

The extension can still do this, right? [aggregations] is optional

"so if someone wants an aggregation, s/he should do that aggregation either somewhere in weewx before mqtt uploading happens, or downstream in an mqtt consumer"

Why simple when complicated also works? This PR expands the possibilities of the extension and makes it user-friendly. I always thought we Germans are complicated :-)

"the reason you see selected aggregation in some uploaders (e.g., some implemented in restx.py) is that the destination requires those aggregations"

My destination requires this aggregation too.

hoetzgit avatar Sep 10 '22 10:09 hoetzgit

@roe-dl Have you given any thoughts to writing a general service to ‘aggregate’ observations and augment either loop packets or archive records? I am also of the mindset that this doesn’t belong with loop data. My plan is to write a service to augment archive records. My user interface/skin will pull ‘real time’ data from the MQTT loop data and the ‘aggregate’ from MQTT archive data. I’m just wondering if I am missing something with this approach. Thanks!

bellrichm avatar Sep 10 '22 13:09 bellrichm

another reason for doing aggregation in a separate service is to avoid duplication. for example, if i upload to two different mqtt servers, i should not have to create duplicate aggregation stanzas. similarly, if i want day/week aggregations in both mqtt and influx uploads, you don't want those aggregation queries to happen in each uploader, since that results in double the database queries.

on the other hand, one could argue that aggregation is something that should happen as far down the pipeline as possible (e.g., in the weewx reports).

i don't have an answer yet, but it is healthy to explore the options before committing to an implementation.

matthewwall avatar Sep 10 '22 14:09 matthewwall

@bellrichm Yes, I thought about that

  • A general service cannot change the dayRain value calculated in restx.py and so cannot solve the Belchertown rain bug. And it is also not possible to solve that bug within the Belchertown skin.
  • A general service would influence all the skins and restful services. It cannot be restricted to MQTT output. That can be desired or can be a problem, depending on the user's needs.
  • In general there is no need to have such aggregations for the Cheetah generator. If the aggregation would be calculated within a general service it would influence the Cheetah generator, too. So that wouldn't be the right place to do the aggregation, too.
  • There is no problem with LOOP data, because only those aggregated values are updated that have the "un-aggregated" values present in the data packet, too.

roe-dl avatar Sep 10 '22 14:09 roe-dl

@matthewwall

for example, if i upload to two different mqtt servers

I was not aware that this is even possible.

on the other hand, one could argue that aggregation is something that should happen as far down the pipeline as possible (e.g., in the weewx reports).

I would see it that way.

I had some different approaches tested with my installation before sending this PR. I found that this solution was the simplest and most general of them.

roe-dl avatar Sep 10 '22 14:09 roe-dl

if it turns out that 'aggregations' belongs in the mqtt extension, then that implementation (and the config syntax) probably should be pushed into the weewx core so that it could be use by any other uploader. in that respect, this could be analogous to the 'sensor_map' construct for drivers, or the name/units mappings used by many uploaders with user-configurable mappings. in those instances, we have used a standard syntax for the config definitions, so that eventually we can push the implementation code into the weewx core (if appropriate).

matthewwall avatar Sep 10 '22 14:09 matthewwall

I am not sure if Tom would like that idea. But I would agree with you.

roe-dl avatar Sep 10 '22 14:09 roe-dl

@roe-dl Thank you. For my use, I don’t think I have those concerns. But, it something for me to think about. Thanks again.

bellrichm avatar Sep 10 '22 21:09 bellrichm

i just realized that 'aggregation' is already used - it indicates whether to send one observation per mqtt message, or send all observations in a single message.

so the keyword 'aggregations' would still work, but it is rather similar, and might cause confusion.

i also just realized that i already built a mechanism in the mqtt uploader (and other uploaders) for adding derivative observations: augment_record. this would be a more appropriate keyword that 'aggregations', since it can contain any derivative value, not just aggregate values. when enabled, it uses the get_record method. this is the appropriate way to augment, and @roe-dl has done exactly that. however, it could be done generically by any restful uploader by defining the list of observations in an augment_record stanza in the configuration file. that way it could be used by any uploader, and could be selectively modified per uploader, with no need to change code. xtypes syntax makes sense here.

my initial take is that it is a design mistake that a base class implementation should contain implementations specific to a derived class, i.e., the definition of dayRain should be in a leaf implementation of get_record, not in the trunk. i will have to take this up with tom, and figuring out a backward-compatible-yet-still-correct implementation could take a bit of work and testing.

matthewwall avatar Sep 11 '22 11:09 matthewwall

@matthewwall Is there any news in that case?

May be, WeeWX version 5 is the possibility to introduce a little breaking change as you thought about in your last comment.

roe-dl avatar Aug 21 '23 19:08 roe-dl