xp icon indicating copy to clipboard operation
xp copied to clipboard

Use micrometer for collecting metrics

Open gbbirkisson opened this issue 4 years ago • 13 comments

To be more flexible when it comes to metrics I would like to suggest that we use micrometer.io to collect metrics.

Micrometer provides an abstraction that makes it easy to expose metrics to multiple monitoring services, i.e. datadog, prometheus, elastic and many more.

We currently use Dropwizard to collect metrics, and micrometer can easly wrap that. So the change we need to make to XP now is not so big.

If we implement a "metrics service" with micrometer it would enable developers to add their own custom metrics. In addition we could create apps that ship metrics to all these different monitoring services.

This would also help with pod autoscaling in the cloud since prometheus is the standard for kubernetes cluster and we need to setup internal cluster metrics to enable that feature.

gbbirkisson avatar Mar 12 '20 09:03 gbbirkisson

Sounds like a plan, but why not start by implemeting it as an app, then it can be added to core when it is mature?

sigdestad avatar Mar 12 '20 11:03 sigdestad

This is not an app. App has no access to internals which micrometer needs

rymsha avatar Mar 12 '20 12:03 rymsha

How do we currently control that?

sigdestad avatar Mar 12 '20 12:03 sigdestad

It is on different jetty port and each subsystem reports what it wants to report.

rymsha avatar Mar 12 '20 12:03 rymsha

I think I understand the purpose of the task, but I am not sure if I understand "how" we should do this? What will be the result of such a change? replacing the current API? Something else?

sigdestad avatar Mar 12 '20 12:03 sigdestad

That is a good question. Sadly com.codahale.metrics is part of our core java API. So removing it would be a breaking change. (we wait with this til 8)

Micrometer is not a drop-in replacement either. I have done it before (as many others who migrated from spring boot 1 to spring boot 2) - it was pain.

Good news we use com.codahale.metrics only for collecting stats about Events and (in upcoming release, if we don't remove it) stats from Jetty. Some other simple endpoints as well (threaddump for instance), but they are easy to replicate.

Micrometer is usually not an endpoint (although one can expose collected metrics via http). Micrometer usually "ships" metrics to metrics registry (datadog, Elasticsearch, etc) so it is configuration of XP how often to push metrics and where. While dropwizard collects metrics "silently" and waits for monitoring systems to pull them.

In my mind this is an epic which consists:

  • a replacement of existing metrics API
  • a subsystem which pushes metrics to a selected registry (via configuration)
  • optionaly a new /status endpoint

rymsha avatar Mar 12 '20 19:03 rymsha

So removing it would be a breaking change

Are you sure? Because micrometer provides a wrapper for that, we might not have to break anything.

Micrometer is usually not an endpoint

That depends on the monitoring system you use. I think we might be able to do the exact thing we are doing today with micrometer.

I would say the epic is:

  • Create a Composite Registry Component that all other subsystems and apps throw their metrics at (Replace the metrics API you could say). Make it have a dynamic reference to other Micrometer registries (children) so they can be added and removed.
  • Change all metric collection code to send metrics to the new Composite registry
  • Add a Dropwizard registry as a child to the Composite Registry and expose the metrics with that on our old metrics endpoint. (Should look the same as it is now).
  • Pop the champagne

To sum up: All systems send metrics to the Composite registry. Composite Registry has children registries:

  • Dropwizard (Default, can be turned off): Exposes metrics on :2609
  • Prometheus (App on market): Exposes metrics on :2609/prometheus
  • Datadog (App on market): Pushes metrics to data dog
  • ....

This way both the subsystems creating metrics and the Component registry do not care about how the metrics are shipped, and people can choose how they ship all of their metrics. The can even choose multiple methods.

gbbirkisson avatar Mar 13 '20 08:03 gbbirkisson

I like this approach, especially cool if we could have apps for the various "target" metric systems.

sigdestad avatar Mar 13 '20 08:03 sigdestad

Are you sure?

Yes. Micrometer has DropwizardMeterRegistry wrapper. So one can use dropwizard api, like we do modules/core/core-api/src/main/java/com/enonic/xp/util/Metrics.java
But keeping two metrics APIs is confusing (maybe OK for 7.x, but not for 8.x) Keeping only old one is limiting (tagging conventions fro micrometer are different). Removing old one is breaking.

Change all metric collection code to send metrics to the new Composite registry

This is where it starts to be a problem. Dropwizard is "hardcoded" in our API.

Add a Dropwizard registry as a child to the Composite Registry and expose the metrics with that on our old metrics endpoint.

I found only Events using that metrics internally. What customers expose via API, I'm not sure. Maybe better to have a separate endpoint /metrics and leave /status alone

Prometheus (App on market): Exposes metrics on :2609/prometheus

I don't think it is possible (need to check). 8080 is ok.

rymsha avatar Mar 13 '20 10:03 rymsha

I don't think it is possible (need to check). 8080 is ok.

It is possible, I did it when I first started in Enonic.

@Component
public class PrometheusReporter
    implements StatusReporter
{

    private CollectorRegistry registry;

    @Override
    public String getName()
    {
        return "micrometer.prometheus";
    }

    @Override
    public MediaType getMediaType()
    {
        return MediaType.PLAIN_TEXT_UTF_8;
    }

    @Override
    public void report( OutputStream stream )
        throws IOException
    {
        Writer w = new OutputStreamWriter( stream );
        TextFormat.write004( w, registry.metricFamilySamples() );
        w.close();
    }

    @Reference
    public void setRegistry( final PrometheusRegistry registry )
    {
        this.registry = registry.getPrometheusRegistry();
    }

}

gbbirkisson avatar Mar 13 '20 12:03 gbbirkisson

It is possible, I did it when I first started in Enonic.

Ah, you are right. I thought :2609 always starts with /status. (I just recently implemented Hazelcast "status", that's probably why)

This way both the subsystems creating metrics and the Component registry do not care about how the metrics are shipped, and people can choose how they ship all of their metrics

Sounds great on paper, doesn't always work in practice: https://github.com/micrometer-metrics/micrometer/issues/587 It is fixed for now (and in 7.x we are probably safe), but will appear in future again. So DropwizardRegistry should be replaced with SimpleMeterRegistry in 8.x to prevent version locking.

rymsha avatar Mar 13 '20 13:03 rymsha

Let's do it!

sigdestad avatar Apr 16 '20 11:04 sigdestad

#8086 is currently a blocker to make this as an app

rymsha avatar May 26 '20 16:05 rymsha