ansible_influx_downsampling
ansible_influx_downsampling copied to clipboard
Idea: Dynamically generate CQ/BF SELECT queries based on source database
[First, thanks writing this code/project! I had a similar use-case and problem: using Telegraf+InfluxDB stack to do monitoring and wanted to downsample data for different retention lengths. This project took care of most of the downsampling details for me, because you had already worked-out all the details. Cheers!]
For my Telegraf monitoring setup, I had a few more Telegraf plugins enabled than what you had pre-configured in the defaults/main.yml >> ansible_influx_queries list. I found it a bit cumbersome to have to manually build the mapping table of measurements -> SELECT query syntax.
After reading through the code, I had the idea that it might be neat to auto-generate the SELECT queries by dynamically building the list of measurements from the source Influx database. For example, there's already code in tasks/influxdb_measurement.yml to "Get fields from SOURCE". Based on the data-type of each measurement, we can dynamically generate the SELECT query. Example: for numeric-type fields, default to the mean aggregation-function; for string-type or boolean-type fields default to the mode aggregration-function. And then defaults/main.yml could have a sparsely-populated table of "override" agg-functions for specific measurements -- e.g. some of the measurements are counters and we want to use the max agg-function.
I worked up the above idea and got it working for my own personal use-case. Does this idea seem interesting to you (or anyone else monitoring this repository)? If so then I can work-up a pull-request and send that your way.
Cheers!
Hey @tonyduckles , I'm really glad this project helped you! I've put in a lot of effort and was hoping it could be of use for others. So thanks for giving me that feedback!
I like the idea very much, also because that would potentially also allow to not miss some metrics in aggregation.
I'd love to get a PR and test/review it! I think the only requirement would be that manually added queries take precedence and/or the feature can be optionally enabled.
I'm not sure what a good default would be for this feature.
- feature on: users may get unexpected results as a default aggregation may not fit the use case -> this may come up late - and then a lot of data is lost
- feature off: user may miss data in the aggregation databases -> this should surface rather quickly after setup and force the user to define queries or enable the features. Without deeper thought, I'd say this would be the way to go.
Thank you for your input!
[Sorry, this fell over my radar. Swinging back to this now!]
FYI, I pushed my assorted work-in-progress changes to a "develop" branch in a fork of your repo: tonyduckles/develop.
There are some assorted (unrelated) changes lurking in that "develop" branch, but the relevant changes for this enhancement-idea-thread are all in this commit: Dynamically generate CQ/BF SELECT queries based on source database.
Some notes/thoughts:
- For my original purposes, I made various non-backwards-compatible code changes. I fully agree that for a final PR we would want to maintain backwards-compatibility. But I wanted to let you take a look at the initial "rough draft" before I spent any more time trying to create a more-official polished PR.
- Summary of code changes:
- I basically deprecate the
ansible_influx_queriesdict entirely in favor of of the new dynamically-generated queries. - There's a new
ansible_influx_mms_fields_aggfuncdict where users can define a (sparsely-populated!) list of override aggregation-functions for specific measurement-fields. I believe I constructed this based on the pre-existingansible_influx_querieslist, based on all the raw-queries which used an aggfunc other thanmean. So, this should have maintained backwards-compatibility as far as the baseline defaultansible_influx_queriesdict is concerned. - Dynamically generated queries: for numeric-type fields, default to the
meanaggregation-function; for string-type or boolean-type fields default to themodeaggregation-function. These defaults have worked well for me for the past ~4 months that I've been using this.
- I basically deprecate the
- For a "real" PR, I agree we would want to maintain backwards-compatibility for any user-defined
ansible_influx_queriesentries. Basically, I'm picturing thatansible_influx_querieswould shift to be a sparsely-populated override table: if any measurement had an entry in theansible_influx_queriesthen we'll use that rawSELECTquery; else we can dynamically build the query based on the current measurement based on smart-defaults (see above) and theansible_influx_mms_fields_aggfuncoverride table.
Take a look through the commit (at your leisure). Let me know if you have any thoughts/comments/feedback. If this all seems reasonable, then I can spend some more time polishing up the code and submit a PR.
looks good to me. You're doing some crazy magic in mm_select_fields 😉
I'd love to have that covered with a test if we manage to do that. That is: a) generated aggregation works b) overrides work (backwards compatibility).
Poke me: slack at drsick.net, if you want to join and chat there.