loki icon indicating copy to clipboard operation
loki copied to clipboard

Make sure all ring/kv store/memberlist metrics are prefixed with loki_

Open cstyan opened this issue 3 years ago • 6 comments

This PR contains a subset of the changes from https://github.com/grafana/loki/pull/6439.

Here I am ensuring that metrics related to the ring (which includes those created by the lifecycler), KV store clients, and general memberlist metrics, are prefixed with loki_ instead of cortex_ (or nothing).

None of these metrics are used in the production/ directory for dashboards or alerts/recording rules

EDIT: Note that this is essentially the minimum changeset to get all memberlist migration related metrics to be prefixed properly with loki since we create an instance of the metrics struct separately per component/service (in the ingester code, distributor code, compactor code, etc.)

Signed-off-by: Callum Styan [email protected]

cstyan avatar Jun 22 '22 02:06 cstyan

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
- querier/queryrange	-0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
-               loki	-0.6%

grafanabot avatar Jun 22 '22 03:06 grafanabot

I know this will be a bit tedious but can we generate a list of all the metric names that will be changed?

We should include this list in the upgrade guide so people know exactly what metrics need to change.

Changing metrics is always a bit rough as it does break dashboards, however, these are pretty obscure metrics and as long as we are clear in the upgrade guide I think I'm ok with it.

slim-bean avatar Jun 22 '22 18:06 slim-bean

These are the metrics that would have name changes:

cortex_consul_request_duration_seconds_bucket -> loki_consul_request_duration_seconds_bucket
cortex_consul_request_duration_seconds_count -> loki_consul_request_duration_seconds_count
cortex_consul_request_duration_seconds_sum -> loki_consul_request_duration_seconds_sum
cortex_dns_failures_total -> loki_dns_failures_total
cortex_dns_lookups_total -> loki_dns_lookups_total
cortex_kv_request_duration_seconds_bucket -> loki_kv_request_duration_seconds_bucket
cortex_kv_request_duration_seconds_count -> loki_kv_request_duration_seconds_count
cortex_kv_request_duration_seconds_sum -> loki_kv_request_duration_seconds_sum
cortex_member_consul_heartbeats_total -> loki_member_consul_heartbeats_total
cortex_ring_members -> loki_ring_members
cortex_ring_oldest_member_timestamp -> loki_ring_oldest_member_timestamp
cortex_ring_tokens_total -> loki_ring_tokens_total
ring_member_heartbeats_total -> loki_ring_member_heartbeats_total
ring_member_tokens_owned -> loki_ring_member_tokens_owned
ring_member_tokens_to_own -> loki_ring_member_tokens_to_own
cortex_ruler_ring_check_errors_total -> loki_ruler_ring_check_errors_total

I'll update the changelog and upgrade guide now.

EDIT: since we're about to start the 2.6 release this change likely won't make it into that release, so I'll hold off on making additions to an upgrade guide, unless @KMiller-Grafana wants me to start a 2.7 upgrade guide but just not list it on the web page.

cstyan avatar Jun 22 '22 22:06 cstyan

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

grafanabot avatar Jun 22 '22 22:06 grafanabot

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

grafanabot avatar Jul 28 '22 21:07 grafanabot

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

stale[bot] avatar Sep 21 '22 06:09 stale[bot]

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

stale[bot] avatar Nov 02 '22 07:11 stale[bot]