loudml
Applying single model to InfluxDB measurement with multiple series
Hi,
Many times there is a need to detect anomalies across multiple series of the same measurement, for example the bandwidth utilisation of all ports of a network switch, where even the switch name may be a tag. From what I understood from the docs, you can only define tag key/value pairs as filters to select a specific series. If LoudML could train a model for each series of the measurement based on a configured set of tags (e.g. all values of tags x and y, or an array of tag values), and output the results accordingly, I think it would be a great addition.
Thanks.
Hi Rodrigo,
thanks a lot for this proposal.
If we make the following additions, will it match your requirements?
loudml train --tags tag1,tag2,....
- add a --tags option (or -T) to the loudml train command in the CLI
- then use all possible key/value pairs for these tags to train a single model
- apply this single model to live data, with extra options on the CLI predict command to select given key/value pairs
- and finally, tag the output data points (prediction_*) with the same tag names and values as the original series
In Chronograf, this training behaviour would be triggered automatically by using the 1-click ML feature and clicking one or more of the blue "Group By" buttons.
Let us know your comments.
Hi Sebastien,
Looks like a sound approach, and I like the Group By, but in a dynamic environment I don’t want to be concerned with setting explicit tag value filters. That is of course very useful in many cases, but it depends on the objective. For example, nodes can come and go, so the values of the node tag will change over time. At the same time, I would be interested in a separate prediction/forecast for each node on a single field key. That would be a GROUP BY without any WHERE clause on the tags. From your outline, I’m not sure whether that would count as one model or multiple models (from a licensing perspective).
Thanks!
I think I have a similar use case to the one @voiprodrigo describes. I'm storing values with different tags under the same measurement, for instance job_id, locale and error_count (the value). In this case, training a model for the entire measurement is not ideal, because each job_id behaves independently of the others and could introduce deviations.
Using the tags filters would mean manually specifying each job_id-locale combination and training a separate model for each (one model per combination).
What I would like is the option to group by each job_id-locale combination without manually specifying the values in the tag section.
I guess that extending the previous proposal to allow specifying several tag keys at the same time, generating the model and tagging the output data points with the tag combination, would solve this case. Something like --tags=job_id,locale?
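Something like --tags=job_id,locale would first need to split the measurement into one series per tag combination, discovered from the data instead of listed in the tag section. A minimal Python sketch, assuming records arrive as (tags, timestamp, value) tuples; group_series is a hypothetical helper, not part of LoudML.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical record layout: (tags, timestamp in seconds, value)
Record = Tuple[Dict[str, str], int, float]


def group_series(records: Iterable[Record],
                 tag_keys: Sequence[str]) -> Dict[Tuple[str, ...], List[Tuple[int, float]]]:
    """Split records into one time series per combination of the given tag values.

    Combinations are discovered from the data, like a GROUP BY on the tag keys
    with no WHERE filter, so new job_id/locale pairs appear without configuration.
    """
    series: Dict[Tuple[str, ...], List[Tuple[int, float]]] = defaultdict(list)
    for tags, ts, value in records:
        # Records missing a tag are grouped under an empty-string value for it
        key = tuple(tags.get(k, "") for k in tag_keys)
        series[key].append((ts, value))
    for points in series.values():
        points.sort()  # order each series by timestamp
    return dict(series)
```

Calling group_series(rows, ["job_id", "locale"]) returns a dict keyed by (job_id, locale), and each entry is an independent series that could be fed to its own model or scored separately.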
Sounds like we need to have:
- A wildcard * capability
- and also allow overriding the default prediction_{{model_name}} measurement with another name, defined by the user
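One way a wildcard could work is shell-style glob matching on tag values, so "*" selects every series (including nodes that appear later) and "CH*" selects a prefix. This Python sketch uses fnmatch purely as an assumed pattern syntax; LoudML does not necessarily define wildcards this way, and matches/select_series are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def matches(tags: Dict[str, str], patterns: Dict[str, str]) -> bool:
    """True if the series has every patterned tag and each value matches its glob."""
    return all(k in tags and fnmatchcase(tags[k], p) for k, p in patterns.items())


def select_series(series: List[Dict[str, str]],
                  patterns: Dict[str, str]) -> List[Dict[str, str]]:
    # Call this on each run against the current series list so new tag values are included
    return [tags for tags in series if matches(tags, patterns)]
```

fnmatchcase is used instead of fnmatch so matching stays case-sensitive on every platform, like InfluxDB tag values.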
The training part is more complex. Training a single model for distinct series (with distinct tags) assumes they all have more or less the same pattern. Do you already tag series according to an "expected pattern type", or should this be discovered dynamically?
Allowing the measurement name to be overridden would be great, even more so if the tag values could be interpolated into the name (or appended at the end), e.g. prediction_avg_error_count_{{job_id}}_{{locale}} in my case.
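Interpolating tag values into the output measurement name is simple string templating. A sketch assuming the {{tag}} placeholder syntax from the example above; render_measurement is hypothetical.

```python
import re
from typing import Dict

# Matches {{ name }} placeholders, allowing optional spaces inside the braces
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_measurement(template: str, tags: Dict[str, str]) -> str:
    """Replace each {{tag}} placeholder with the series' tag value.

    A placeholder naming a tag the series does not have raises KeyError,
    rather than silently producing a measurement name with a gap in it.
    """
    return _PLACEHOLDER.sub(lambda m: tags[m.group(1)], template)
```

One design consequence: a measurement per combination multiplies the number of measurements, while keeping one measurement and carrying the tags on each point keeps them queryable with a single GROUP BY.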
Considering that a single model assumes a similar pattern/behaviour, perhaps allowing N dynamic models to be generated (one per combination of tags) would be a better fit? So if you specify something like --tags=job_id,locale, one model is generated for each combination of these tags. Ideally this would be reported as a single model, even though underneath there are several models evaluated individually.
But to be honest, even if each model were generated individually it would be OK, because each model could have the match_all section already configured and would know how to filter the measurement.
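The "N models, one per combination" idea can be sketched as expanding a base model definition into one copy per observed tag combination, each with its own match_all filters. The dict layout below (name, features, match_all entries with tag and value) is an assumption loosely modeled on LoudML model files; check the actual schema before relying on it. expand_models is hypothetical.

```python
import copy
from typing import Dict, List


def expand_models(base: dict, combinations: List[Dict[str, str]]) -> List[dict]:
    """Create one model definition per observed tag combination.

    Only combinations actually seen in the data are expanded (not a cartesian
    product of all tag values), so no model is created for pairs that never occur.
    """
    models = []
    for combo in combinations:
        model = copy.deepcopy(base)  # never mutate the shared base definition
        keys = sorted(combo)
        model["name"] = "_".join([base["name"], *(combo[k] for k in keys)])
        filters = [{"tag": k, "value": combo[k]} for k in keys]
        for feature in model["features"]:
            # Append to any filters the base feature already had
            feature["match_all"] = feature.get("match_all", []) + filters
        models.append(model)
    return models
```

The combinations could come from the distinct tag values currently in the measurement, which also covers nodes that come and go: re-running the expansion creates models for new combinations.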
It's a complex one. Trying to list what will be needed:
- [ ] Having a custom measurement name, rather than the default prediction_* name
- [ ] Using template {} values in this measurement name, e.g. {model}, and so on.
- [x] Tagging the output measurement with tags. We have, at least, a partial solution implemented in 1.4.0
- [ ] Model templating, with wildcard capabilities
- [ ] Training, and therefore inference, for specific tag values
@regel Hi, my measurements contain different values identified by a specific tag. For example, I have a measurement kwh with a tag _id that identifies the device that sent those values. So, to predict the consumption of a specific device, I have to train the model with a query restricted to the values with that specific tag, _id=xxxx. Is this feature already available, or is it still a work in progress?
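Restricting training to one device amounts to adding a tag condition to the query. A hedged Python sketch of building that InfluxQL query with quoted identifiers and tag values; device_query, the field name value, the device id dev42 used in the example and the 30d lookback are all hypothetical.

```python
from typing import Dict


def quote_ident(name: str) -> str:
    # InfluxQL identifiers (measurement, field, tag keys) are double-quoted; embedded " becomes \"
    return '"' + name.replace('"', '\\"') + '"'


def quote_str(value: str) -> str:
    # InfluxQL string literals (tag values) are single-quoted; embedded ' becomes \'
    return "'" + value.replace("'", "\\'") + "'"


def device_query(measurement: str, field: str, tags: Dict[str, str],
                 lookback: str = "30d") -> str:
    """Build a SELECT restricted to one series via equality conditions on its tags."""
    conditions = [f"{quote_ident(k)} = {quote_str(v)}" for k, v in sorted(tags.items())]
    conditions.append(f"time > now() - {lookback}")
    return (f"SELECT {quote_ident(field)} FROM {quote_ident(measurement)} WHERE "
            + " AND ".join(conditions))
```

device_query("kwh", "value", {"_id": "dev42"}) produces SELECT "value" FROM "kwh" WHERE "_id" = 'dev42' AND time > now() - 30d.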
I really wonder why LoudML chose the bucket approach for the data, when a database query would have been the KISS solution. Time-series databases are great for combining data, and now I cannot take advantage of that.
For example, I have a customer, CH, with three network interfaces in an active-active-standby configuration. I am not interested in the traffic on a single interface, but in the total traffic for customer CH.
In Grafana/InfluxDB I'd do it with the query select sum(ifInOctets) as "total ifInOctets" from interfaceTraffic where ifName=~/^CH.*/ and time > now()-4h GROUP BY time($__interval) fill(null)
Unfortunately I cannot enter that query in LoudML; LoudML seems to limit data mining unnecessarily. From LoudML's perspective, the query returns a single feature, one value per time bucket.
So, in my mind, ditch the buckets and the bucket configuration. Allow direct queries: just as in the LoudML Chronograf data explorer, you can craft your query to return exactly the data you want, and push that to the LoudML engine.
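To make the request concrete, this is roughly what the query above computes, sketched client-side in Python: the field summed across all matching interfaces into a single feature per time bucket. It illustrates the aggregation only and is not a LoudML feature; the tuple layout and total_per_bucket are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical point layout: (ifName, timestamp in seconds, ifInOctets)
Point = Tuple[str, int, float]


def total_per_bucket(points: Iterable[Point], name_regex: str,
                     bucket_s: int) -> List[Tuple[int, float]]:
    """Sum the field across all interfaces matching name_regex, one value per time bucket.

    Empty buckets are simply absent here, whereas fill(null) would emit them as null.
    """
    pattern = re.compile(name_regex)
    buckets: Dict[int, float] = defaultdict(float)
    for if_name, ts, octets in points:
        if pattern.search(if_name):
            # Align the timestamp to the start of its bucket
            buckets[ts - ts % bucket_s] += octets
    return sorted(buckets.items())
```

With r"^CH.*" and 60-second buckets, the three CH interfaces collapse into one series, which is the single feature the comment above would like to hand to LoudML directly.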