signoz [EPIC] Support for Anomaly Detection

Open vanakema opened this issue 2 years ago • 9 comments

Is your feature request related to a problem?

When you have a small team, you want to know when your app is misbehaving, with as little intervention as possible.

Describe the solution you'd like

SigNoz integrates an open-source anomaly detection library to alert users when anything goes outside its "normal" range.

Some use cases:

Abnormal latency (latency spiking) on certain DB queries
Abnormal latency (latency spiking) on certain Flask endpoints
Abnormal error rate on certain endpoints
Abnormal requests/s

Describe alternatives you've considered

Really the only alternative would be manually creating alerts in Prometheus or feeding SigNoz metrics into an anomaly detection library ourselves.

Additional context

The DataDog WatchDog feature is great because it automatically detects anomalous behavior. That is really helpful when you have a small team, or a team without a dedicated SRE person, since you no longer necessarily have to know what to look for.

Thank you for your feature request – we love each and every one!

Sep 16 '21 18:09 vanakema

Figured this might be a helpful repo for reference: https://github.com/rob-med/awesome-TS-anomaly-detection

Sep 16 '21 18:09 vanakema

Thanks @vanakema for detailing out the use cases. Anomaly detection IS in our roadmap - but a few months down the line.

Curious, what sort of algos worked best for you for detecting "abnormal" values? Does a simple threshold on a rolling average work well enough, or are more advanced algos like seasonal pattern detection needed?
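
For illustration only, here is a minimal sketch of what a "simple threshold on a rolling average" could look like. The function name, the `window` and `ratio` parameters, and the sample latencies are all hypothetical and are not part of SigNoz.

```python
from collections import deque
from typing import Iterable, List


def rolling_mean_anomalies(values: Iterable[float], window: int = 5,
                           ratio: float = 2.0) -> List[int]:
    """Return indices of points larger than `ratio` times the mean
    of the previous `window` points (hypothetical helper)."""
    history = deque(maxlen=window)  # holds only the most recent `window` values
    flagged = []
    for i, v in enumerate(values):
        # Only compare once a full window of history is available.
        if len(history) == window:
            baseline = sum(history) / window
            if v > ratio * baseline:
                flagged.append(i)
        history.append(v)
    return flagged


# Made-up latency samples in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 350, 101, 100]
spikes = rolling_mean_anomalies(latencies_ms, window=5, ratio=2.0)
print(spikes)
```

A check like this is cheap, but a spike inflates the baseline for the next few points, and it knows nothing about daily or weekly patterns, which is where seasonal methods come in.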

Sep 16 '21 18:09 pranay01

GitLab has written about basic anomaly detection with Prometheus rules, based on z-score and seasonality: https://about.gitlab.com/blog/2019/07/23/anomaly-detection-using-prometheus/

This sort of thing would also be possible with SigNoz, as we plan for SigNoz to be compatible with Prometheus rules and Alertmanager.
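
As a rough sketch of the z-score idea, written in Python rather than as Prometheus rules: each point is compared with the mean and standard deviation of the preceding window, and flagged when it lies more than `threshold` standard deviations away. The function name, parameters, and sample data are assumptions for illustration, and seasonality is not handled here.

```python
import statistics
from typing import List, Sequence


def zscore_anomalies(values: Sequence[float], window: int = 6,
                     threshold: float = 3.0) -> List[int]:
    """Return indices whose z-score against the previous `window`
    points exceeds `threshold` in absolute value (hypothetical helper)."""
    flagged = []
    for i in range(window, len(values)):
        past = values[i - window:i]
        mean = statistics.fmean(past)
        stdev = statistics.pstdev(past)  # population standard deviation
        if stdev == 0:
            continue  # flat history: z-score is undefined, so skip the point
        z = (values[i] - mean) / stdev
        if abs(z) > threshold:
            flagged.append(i)
    return flagged


# Made-up requests/s samples with a single burst.
requests_per_s = [50, 52, 49, 51, 50, 48, 90, 51, 50]
anomalies = zscore_anomalies(requests_per_s)
print(anomalies)
```

Unlike a fixed ratio, the z-score adapts to how noisy the metric normally is, so the same threshold can be reused across metrics with different scales.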

Sep 17 '21 04:09 ankitnayan

We can also leverage Third Eye

This is built for Apache Pinot, which is an OLAP database similar to ClickHouse.

Jul 26 '22 11:07 pranay01

Might be worthwhile asking the Netdata team about lessons learnt applying ML to time series.

Jul 19 '23 21:07 nwmcsween

Thanks for the note, @nwmcsween. Do you think Netdata does a good job applying ML to time series data? Any blogs/issues where they share more about it?

Jul 20 '23 11:07 pranay01

@pranay01 Namaste. ML and alarms in particular are Netdata's specialty. It's worth having a look at it. I speak from 30 years of experience with Nagios, Zabbix, Elastic, Opensearch, Influx, and many more, including Netdata. Netdata is weighted more heavily toward *nix than Windows and lacks OTel integration. That's why I'm looking at you guys right now. 😃

Jul 29 '23 19:07 StefanSa

Thanks @StefanSa - do you have relevant docs in NetData I should look at?

Jul 30 '23 11:07 pranay01

@pranay01 Certainly, not a problem. There is a lot of reading material here. As I said, alerting is also well done there.

ML: https://learn.netdata.cloud/docs/ml-and-troubleshooting/machine-learning-ml-powered-anomaly-detection

https://learn.netdata.cloud/docs/ml-and-troubleshooting/anomaly-advisor

https://learn.netdata.cloud/docs/visualizations/netdata-charts#anomaly-rate-ribbon

https://learn.netdata.cloud/docs/ml-and-troubleshooting/metric-correlations

https://www.youtube.com/watch?v=2gJ36YuW6Ko

Alerting: https://learn.netdata.cloud/docs/alerting/

Live Demo: Live-Demo

Jul 30 '23 17:07 StefanSa
