ansible_influx_downsampling Idea: Dynamically generate CQ/BF SELECT queries based on source database

[First, thanks writing this code/project! I had a similar use-case and problem: using Telegraf+InfluxDB stack to do monitoring and wanted to downsample data for different retention lengths. This project took care of most of the downsampling details for me, because you had already worked-out all the details. Cheers!]

For my Telegraf monitoring setup, I had a few more Telegraf plugins enabled than what you had pre-configured in the defaults/main.yml >> ansible_influx_queries list. I found it a bit cumbersome to have to manually build the mapping table of measurements -> SELECT query syntax.

After reading through the code, I had the idea that it might be neat to auto-generate the SELECT queries by dynamically building the list of measurements from the source Influx database. For example, there's already code in tasks/influxdb_measurement.yml to "Get fields from SOURCE". Based on the data-type of each measurement, we can dynamically generate the SELECT query. Example: for numeric-type fields, default to the mean aggregation-function; for string-type or boolean-type fields default to the mode aggregration-function. And then defaults/main.yml could have a sparsely-populated table of "override" agg-functions for specific measurements -- e.g. some of the measurements are counters and we want to use the max agg-function.

I worked up the above idea and got it working for my own personal use-case. Does this idea seem interesting to you (or anyone else monitoring this repository)? If so then I can work-up a pull-request and send that your way.

Cheers!

Sep 08 '21 23:09 tonyduckles

Hey @tonyduckles , I'm really glad this project helped you! I've put in a lot of effort and was hoping it could be of use for others. So thanks for giving me that feedback!

I like the idea very much, also because that would potentially also allow to not miss some metrics in aggregation.

I'd love to get a PR and test/review it! I think the only requirement would be that manually added queries take precedence and/or the feature can be optionally enabled.

I'm not sure what a good default would be for this feature.

feature on: users may get unexpected results as a default aggregation may not fit the use case -> this may come up late - and then a lot of data is lost
feature off: user may miss data in the aggregation databases -> this should surface rather quickly after setup and force the user to define queries or enable the features. Without deeper thought, I'd say this would be the way to go.

Thank you for your input!

Sep 10 '21 20:09 DrPsychick

[Sorry, this fell over my radar. Swinging back to this now!]

FYI, I pushed my assorted work-in-progress changes to a "develop" branch in a fork of your repo: tonyduckles/develop.

There are some assorted (unrelated) changes lurking in that "develop" branch, but the relevant changes for this enhancement-idea-thread are all in this commit: Dynamically generate CQ/BF SELECT queries based on source database.

Some notes/thoughts:

For my original purposes, I made various non-backwards-compatible code changes. I fully agree that for a final PR we would want to maintain backwards-compatibility. But I wanted to let you take a look at the initial "rough draft" before I spent any more time trying to create a more-official polished PR.
Summary of code changes:
- I basically deprecate the ansible_influx_queries dict entirely in favor of of the new dynamically-generated queries.
- There's a new ansible_influx_mms_fields_aggfunc dict where users can define a (sparsely-populated!) list of override aggregation-functions for specific measurement-fields. I believe I constructed this based on the pre-existing ansible_influx_queries list, based on all the raw-queries which used an aggfunc other than mean. So, this should have maintained backwards-compatibility as far as the baseline default ansible_influx_queries dict is concerned.
- Dynamically generated queries: for numeric-type fields, default to the mean aggregation-function; for string-type or boolean-type fields default to the mode aggregation-function. These defaults have worked well for me for the past ~4 months that I've been using this.
For a "real" PR, I agree we would want to maintain backwards-compatibility for any user-defined ansible_influx_queries entries. Basically, I'm picturing that ansible_influx_queries would shift to be a sparsely-populated override table: if any measurement had an entry in the ansible_influx_queries then we'll use that raw SELECT query; else we can dynamically build the query based on the current measurement based on smart-defaults (see above) and the ansible_influx_mms_fields_aggfunc override table.

Take a look through the commit (at your leisure). Let me know if you have any thoughts/comments/feedback. If this all seems reasonable, then I can spend some more time polishing up the code and submit a PR.

Jan 15 '22 21:01 tonyduckles

looks good to me. You're doing some crazy magic in mm_select_fields 😉

I'd love to have that covered with a test if we manage to do that. That is: a) generated aggregation works b) overrides work (backwards compatibility).

Poke me: slack at drsick.net, if you want to join and chat there.

Jan 16 '22 01:01 DrPsychick

ansible_influx_downsampling ansible_influx_downsampling copied to clipboard

Idea: Dynamically generate CQ/BF SELECT queries based on source database

ansible_influx_downsampling
ansible_influx_downsampling copied to clipboard