consul-template icon indicating copy to clipboard operation
consul-template copied to clipboard

Replace OpenTelemetry with armon/go-metrics

Open findkim opened this issue 4 years ago • 15 comments

Summary

After a few months of weighing the current state of OpenTel vs the current need for metrics wrt to the Consul ecosystem, it has become more evident to me that switching to armon/go-metrics would provide the most cohesive and comprehensive feature set at this time.

Background

PR https://github.com/hashicorp/consul-template/pull/1378 recently introduced metrics to Consul Template using OpenTelemetry. There's a short blurb providing context as to why OpenTel was initially chosen over armon/go-metrics and that it was an experiment towards a new library for telemetry.

Context

I tried to create a uniform UX for configuring and emitting metrics over on Consul ESM #70, another Consul ecosystem project, and quickly ran into some short comings when dealing with more complex metric types than counters and histograms. For example, timer values were reduced to histogram integers, and there currently is not a systematic way to denote measurement (seconds, milliseconds, etc).

The community request for statsd is not quite satisfied yet due to the difference in tag syntax between statsd variants and would require omitting the use of tags as a whole to continue support for statsd.

UX

The metrics reported should not change much from the initial work in https://github.com/hashicorp/consul-template/pull/1378 (not yet released). However, the telemetry configuration will be breaking changes compared to the PR and align closer to how it is supported by Consul (docs).

Switching libraries will add feature support for:

  • statsd, circonus
  • more metric types, like timers and gauges
  • tags and global tags
  • metrics prefix
  • prefix filtering

findkim avatar Jul 21 '20 19:07 findkim

Would it be better to revert #1378 and start at this fresh or convert the opentel code that is already there?

eikenb avatar Jul 22 '20 00:07 eikenb

@eikenb good question, I think reverting would be most clear for tracing this pivot when looking back at the history. The original PR could still be used as a reference when making the new changes.

findkim avatar Jul 23 '20 15:07 findkim

Ok, sounds good. I'll get it reverted. Be aware that, because we'll be reverting a merge, if you continue the work off the reverted code you'll need to revert the revert before merging the new code. See this doc for the details... https://github.com/git/git/blob/master/Documentation/howto/revert-a-faulty-merge.txt

eikenb avatar Jul 23 '20 19:07 eikenb

Opentelemetry merge reverted: https://github.com/hashicorp/consul-template/commit/6bc149b39e69a3686af2786cbf92c57daca1b84e

eikenb avatar Jul 23 '20 20:07 eikenb

Any updates?

Oloremo avatar Nov 22 '20 19:11 Oloremo

Hi @Oloremo, no specifics yet. We got sidetracked launching a new tool which ate all our time for a while. If you'd like this to get priority, please :+1: the original issue. We use those to help gauge community priority and how they figure into our prioritized backlog. Thanks.

eikenb avatar Nov 23 '20 20:11 eikenb

Yeah currently consul-template in a weird state, hashicorp propose it as a critical part of the Nomad infra for example: https://learn.hashicorp.com/tutorials/nomad/vault-pki-nomad yet it's not very clear how to monitor it.

Oloremo avatar Nov 23 '20 20:11 Oloremo

Opps. Didn't click through on that vault-pki-nomad link and mistook it for a different case. So ignore the text below (leaving for context). But please be assured we are working on these things. And thanks for adding the :+1:.

--

Yeah. It is kind of, but we are working on fixing it with the refactoring of the library case out of consul-template. Once done (I'm actively working on this) it should be a big help as it cleans things up immensely for those use cases (the 'using consul-template as a library' cases). You can follow along at https://github.com/hashicorp/hcat if your interested.

eikenb avatar Nov 23 '20 22:11 eikenb

Interesting!

Does that mean that the consul-template will be sunsetted soon?

Oloremo avatar Nov 23 '20 23:11 Oloremo

As a library, yes. As a stand alone application, no. We'll be updating it to use the new library, but it won't be going anywhere as an application at this point. We're considering merging it and envconsul which, once merged, might end up under a new name, but we're not 100% on that yet.

eikenb avatar Nov 23 '20 23:11 eikenb

Thanks for the clarification!

Oloremo avatar Nov 23 '20 23:11 Oloremo

So it missed the 0.26.0...

Will be in 0.27?..

Oloremo avatar Jun 11 '21 11:06 Oloremo

Hey @Oloremo,

Sorry that this didn't make it into 0.26.0, I'm working on getting caught up and only wanted to focus on existing PRs and fixing bugs at this point. After this release I need to make a pass of the other projects I maintain (envconsul, consul-esm) and then I'll be putting together some roadmaps for the projects and will determine the priority of this at that time.

If you (and readers) want this, please consider adding a :+1: the top/original issue as I'll look at those when working on the priorities for the roadmap. The roadmaps will also be public and feedback will be welcome there as well.

Thanks.

eikenb avatar Jun 11 '21 18:06 eikenb

Hello @eikenb !

I am currently looking into implementing metrics in Consul-Template using armon/go-metrics, as requested in the issue, however I am a bit stuck regarding the implementation of histogram metrics: The armon/go-metrics library doesn't implement yet histograms and the PR implementing them is open since 2019. Seeing @findkim did implement histogram metrics in the past, I was planning to implement them too but I am unsure it is possible to easily implement metrics similar to histograms with the available metric types (gauge, counter, summary).

Do you have any opinion regarding the next steps for this issue? Whether it being to ping the maintainer of the aforementioned PR (which I'll do) or to switch to another metric library ?

Eclion avatar Aug 05 '22 14:08 Eclion

Imo, if the feature is missing it makes sense to implement everything but those metrics and update the metrics after the needed feature will be provided.

Something is better than nothing.

Oloremo avatar Aug 05 '22 15:08 Oloremo