Support percentiles for aggregated metrics

Open andyvig opened this issue 5 years ago • 10 comments

<<I suspect this qualifies as an enhancement request>> Per this StackOverflow answer, it’s not possible to do percentiles on aggregated metrics sent through AppInsights. https://stackoverflow.com/questions/58124268/how-to-do-percentiles-on-custom-metrics-in-azure-appinsights

The request is to support this in some form, since it seems like a significant miss relative to other platforms like Prometheus. Is there any workaround other than sending telemetry for every metric measurement (since that won’t scale at all)?

I would love not to have to set up Prometheus/Grafana infrastructure to support this. Thanks!

andyvig avatar Sep 30 '19 18:09 andyvig

@vgorbenko Is this on the Metrics roadmap?

cijothomas avatar Sep 30 '19 19:09 cijothomas

Any indication of how close this might be?
For high-volume scenarios it makes AppInsights unusable for metrics (since simple averages won't cut it for production monitoring). If there's a solution AppInsights provides here that I'm missing please let me know (our plan is to track aggregate metrics for billions of events/day).

andyvig avatar Oct 14 '19 16:10 andyvig

@andyvig This is not planned for 2019. I will check and report back on the plan for the next semester (2020). I also know that it's possible for you to write a custom aggregator and plug it into the rest of the metrics pipeline if you want to do percentiles. It's not documented, but if you want to take a look, here's where to start: https://github.com/microsoft/ApplicationInsights-dotnet/blob/develop/src/Microsoft.ApplicationInsights/Metrics/Extensibility/MetricSeriesAggregatorBase.cs

cijothomas avatar Oct 14 '19 18:10 cijothomas

Thanks @cijothomas, how would we then query that on the Log Analytics side? Does the percentile function support aggregate data?
I'm looking for something similar to this operation in Prometheus, "To calculate the 90th percentile of request durations over the last 10m":
histogram_quantile(0.9, rate(http_request_duration_seconds_bucket[10m]))
From https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
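For readers unfamiliar with how a quantile can be recovered from aggregated data, here is a minimal Python sketch of the general idea behind bucket-based estimation: find the bucket containing the target rank, then interpolate linearly inside it. It is an illustration, not Prometheus's implementation. The bucket layout, the sample counts, and the function names `histogram_quantile` and `merge_buckets` are assumptions made for this example, and it works on plain counts for a single window rather than on `rate()` output.

```python
import math
from bisect import bisect_left


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with a (math.inf, total_count) bucket.
    Assumes observed values are non-negative, so the first bucket starts at 0.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    counts = [count for _, count in buckets]
    i = bisect_left(counts, rank)  # first bucket whose cumulative count >= rank
    upper, count = buckets[i]
    lower, prev = buckets[i - 1] if i > 0 else (0.0, 0)
    if math.isinf(upper):
        # The rank falls in the overflow bucket: report the highest finite bound.
        return lower if i > 0 else math.nan
    if count == prev:
        return lower
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - prev) / (count - prev)


def merge_buckets(a, b):
    """Sum two histograms that share the same bucket bounds."""
    return [(ub, ca + cb) for (ub, ca), (_, cb) in zip(a, b)]


# Hypothetical request durations in seconds, as cumulative counts per bucket.
instance_a = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
instance_b = [(0.1, 10), (0.5, 60), (1.0, 95), (math.inf, 100)]

p95_a = histogram_quantile(0.95, instance_a)
p90_combined = histogram_quantile(0.9, merge_buckets(instance_a, instance_b))
```

The point of the bucket representation is that counts from several instances or time windows can be summed before the quantile is estimated, which is not possible with pre-computed averages or pre-computed percentiles.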

andyvig avatar Oct 14 '19 20:10 andyvig

I don't think there is any native support, as the schema doesn't have anything for storing percentiles: https://github.com/microsoft/ApplicationInsights-dotnet/blob/develop/src/Microsoft.ApplicationInsights/Extensibility/Implementation/External/DataPoint_types.cs

You'd need to store the quantiles as customProps and write custom queries to get them, as Analytics won't understand customProps.

@SergeyKanzhelev even if one authors their own aggregator, is there any way to store quantiles (0.1, 0.5, 0.9, etc.) in the schema?
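As a rough illustration of the "store quantiles as custom properties" idea, the Python sketch below computes a few percentiles over the raw values seen in one aggregation window and flattens them into string key/value pairs. It is only a sketch under stated assumptions: the function names, the `p10`/`p50`/`p90` key names, and the sample data are hypothetical, it assumes custom properties are string-to-string pairs, and it does not call any Application Insights API.

```python
def nearest_rank(ordered, percent):
    """Nearest-rank percentile of an ascending list; `percent` is an int in 1..100."""
    n = len(ordered)
    # Integer ceiling of percent * n / 100 avoids floating-point rounding surprises.
    k = max(1, (percent * n + 99) // 100)
    return ordered[k - 1]


def quantile_properties(samples, percents=(10, 50, 90)):
    """Return e.g. {"p10": "11", ...} for one aggregation window (hypothetical key names)."""
    ordered = sorted(samples)
    if not ordered:
        return {}
    return {f"p{p}": str(nearest_rank(ordered, p)) for p in percents}


# Hypothetical latencies (ms) collected during one aggregation window.
window = [12, 15, 11, 90, 14, 13, 18, 16, 17, 19]
props = quantile_properties(window)
```

One limitation to keep in mind: percentiles computed per window and per instance cannot be combined accurately afterwards, so a query over these properties can only report them per window rather than over an arbitrary time range.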

cijothomas avatar Oct 15 '19 03:10 cijothomas

Any news / roadmap item / documentation / customer guidance on

  • publishing metrics as histograms to AppInsights
  • with the goal of using percentiles in Queries/Views/Alerts

to make AppInsights a good fit for SLOs?

RicardoNiepel avatar Nov 26 '20 13:11 RicardoNiepel

No work is planned to add support for this in ApplicationInsights SDK.

Metrics support in OpenTelemetry is expected by the end of 2021 (Nov 2021): https://github.com/open-telemetry/opentelemetry-dotnet/issues/1501. Once the OpenTelemetry part has shipped, there will be a supported way to export metrics to ApplicationInsights, but there are no solid dates for this. There is also no solid date for supporting percentiles/histograms in ApplicationInsights.

cijothomas avatar Nov 26 '20 19:11 cijothomas

This issue is stale because it has been open 300 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Jan 04 '22 00:01 github-actions[bot]

@cijothomas Checking in here...
Is this still a "maybe sometime in the future but all dates unknown" situation, or is there any more definition around if/when this might be supported? Thanks.

andyvig avatar Jan 04 '22 02:01 andyvig

No firm dates that I can share. (The feature requires not just SDK support, but also backend, UI, etc.) On the SDK side, this will likely come via the OpenTelemetry route, and not from this repo.

cijothomas avatar Jan 04 '22 17:01 cijothomas

This issue is stale because it has been open 300 days with no activity. Remove stale label or this will be closed in 7 days. Commenting will instruct the bot to automatically remove the label.

github-actions[bot] avatar Nov 01 '22 00:11 github-actions[bot]

We still don't have a clear statement on whether and how this will come.

If this feature is not coming to the Azure Monitor / AppInsights backend and the various SDKs, there should be some published guidance on how these technologies can be used by someone who wants to follow SRE best practices:

RicardoNiepel avatar Nov 02 '22 11:11 RicardoNiepel

Hi Ricardo, Azure Managed Prometheus (Preview) was announced last month and is available with Azure Managed Grafana integration. This is compatible with Prom Client.

https://learn.microsoft.com/azure/azure-monitor/essentials/prometheus-metrics-overview

Additionally we are working on supporting percentiles via the OpenTelemetry histogram API. Unfortunately this work requires some major changes in how our backend works and thus any release is likely 6+ months out.

CC: @vishiy

mattmccleary avatar Nov 02 '22 17:11 mattmccleary

Thanks a lot for clarification and details around workarounds/other possibilities.

RicardoNiepel avatar Nov 03 '22 13:11 RicardoNiepel

This issue is stale because it has been open 300 days with no activity. Remove stale label or this will be closed in 7 days. Commenting will instruct the bot to automatically remove the label.

github-actions[bot] avatar Aug 31 '23 00:08 github-actions[bot]

@mattmccleary can you provide an update on this? Thx

RicardoNiepel avatar Sep 21 '23 21:09 RicardoNiepel

@mattmccleary any update on percentile tracking support?

I'm trying to use 'Azure.Monitor.OpenTelemetry.Exporter' to collect and report on our application latency. It looks like it currently tracks things like the max value, but that's not very useful for practical purposes, since the max value can be influenced by a lot of external factors and doesn't necessarily provide an accurate view of how the app is doing. Ideally we want to track the 99th percentile of this latency value, but I can't figure out how to do that or whether it's supported at all.

dennis-yemelyanov avatar Dec 03 '23 21:12 dennis-yemelyanov