management-api-for-apache-cassandra

Latency metrics are missing in Cassandra 4.1

Open c3-clement opened this issue 1 year ago • 6 comments

Hello,

Our latency Grafana dashboards show no data with Management API 4.1.5-v0.1.79, while they work fine on 4.0.

The following metrics are missing from Prometheus:

  • mcac_client_request_latency_bucket
  • mcac_table_range_latency_bucket
  • mcac_table_read_latency_bucket
  • mcac_table_write_latency_bucket
  • mcac_table_coordinator_read_latency_bucket
  • mcac_table_coordinator_scan_latency_bucket

In the system logs I'm seeing this message, which could be related:

INFO  [insights-8-1] 2024-06-12 14:24:03,177 NoSpamLogger.java:105 - Not able to get buckets for org.apache.cassandra.metrics.dropped_message.internal_dropped_latency.finalize_propose_msg 128 type org.apache.cassandra.metrics.DecayingEstimatedHistogramReservoir$EstimatedHistogramReservoirSnapshot

I have also queried the MCAC metrics endpoint on port 9103 directly. In 4.1.5 there is not a single entry starting with collectd_mcac_micros_bucket, while I do see them in 4.0.x.
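The check described above can be scripted. The sketch below is a minimal example, assuming the MCAC exporter is reachable at `http://localhost:9103/metrics` (for instance through `kubectl port-forward`); the host and the `/metrics` path are assumptions, not taken from the issue. It counts the sample lines whose metric name starts with `collectd_mcac_micros_bucket`.

```python
import urllib.request


def count_prefixed_samples(exposition_text: str, prefix: str) -> int:
    """Count sample lines in Prometheus text format whose metric name starts with prefix."""
    count = 0
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            count += 1
    return count


def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Download the raw exposition text from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `count_prefixed_samples(fetch_metrics("http://localhost:9103/metrics"), "collectd_mcac_micros_bucket")` returns 0 when the histogram buckets are absent, as reported for 4.1.5.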

I'm using this telemetry configuration in the K8ssandraCluster:

      telemetry:
        mcac:
          enabled: true
          metricFilters:
            - allow:org.apache.cassandra.metrics.Table
            - allow:org.apache.cassandra.metrics.table
            - allow:org.apache.cassandra.metrics.client_request
        prometheus:
          enabled: true

Issue is synchronized with this Jira Story by Unito

c3-clement, Jun 12 '24

@adejanovski @burmanm I've seen this closed issue: https://github.com/k8ssandra/management-api-for-apache-cassandra/issues/444

However, it seems that the issue is still happening.

c3-clement, Jun 12 '24

#444 should have fixed the missing metrics, and in our testing it did, assuming you use the newer metrics endpoints. The metric names are slightly different, to align with the naming inside Cassandra. Only the older endpoint returns mcac* metrics; that endpoint is deprecated and will not receive any further changes.

burmanm, Jun 19 '24


Thanks for the feedback @burmanm .

Regarding "assuming you use the newer metrics endpoints. The names of the metrics are a bit different":

Is there any documentation about these new metrics endpoints and the new metric names?

We are using k8ssandra-operator, which creates a Prometheus ServiceMonitor to scrape Cassandra metrics, so I assume it should hit the correct endpoint automatically when Cassandra 4.1 is deployed.

However, if the metric names changed, we will probably have to update our Grafana dashboards.

c3-clement, Jun 19 '24

Port 9103 is the old MCAC port. The new /metrics endpoint listens on port 9000. The k8ssandra-operator will create ServiceMonitors for the new endpoint if MCAC is disabled:

    telemetry:
      mcac:
        enabled: false

But yes, you would need new dashboards to support the new naming. Our example dashboards for the new metrics, with installation instructions, are here: https://docs.k8ssandra.io/tasks/monitor/prometheus-grafana/#install-the-grafana-dashboards
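Before rebuilding dashboards, it can help to list which latency histograms the new endpoint actually exposes. A minimal sketch, assuming the /metrics endpoint on port 9000 is reachable at `http://localhost:9000/metrics` (the host is an assumption) and that the new histogram names still contain the word `latency` (also an assumption; the exact new names are not given in this thread):

```python
import re
import urllib.request

# A sample line looks like: name{label="v",...} value [timestamp]
_METRIC_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def histogram_bucket_names(exposition_text: str, keyword: str = "latency") -> list[str]:
    """Return the sorted, distinct *_bucket metric names that contain keyword."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _METRIC_NAME.match(line)
        if match is None:
            continue
        name = match.group(1)
        if name.endswith("_bucket") and keyword in name:
            names.add(name)
    return sorted(names)


def fetch_metrics(url: str, timeout: float = 10.0) -> str:
    """Download the raw exposition text from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `for name in histogram_bucket_names(fetch_metrics("http://localhost:9000/metrics")): print(name)` prints one line per latency histogram, which can then be used when adapting dashboard queries.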

burmanm, Jun 19 '24

If you don't wish to disable MCAC yet, you can also simply create a new ServiceMonitor for the new endpoint. The endpoints section of its spec would look like this:

    spec:
      endpoints:
      - port: metrics
        interval: 15s
        path: /metrics
        scheme: http
        scrapeTimeout: 15s

The rest can be copied from the old ServiceMonitor.
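Copying the rest from the old ServiceMonitor can be done by hand, or with a small script like the sketch below. It assumes the old object has been exported as JSON (for example with `kubectl get servicemonitor <name> -o json`); the `-new-metrics` name suffix is a hypothetical choice. It keeps the old selector and other spec fields, swaps in the endpoint shown above, and strips server-populated metadata so the result can be created as a new object.

```python
import copy
import json

# The new endpoint, copied from the ServiceMonitor example above.
NEW_ENDPOINT = {
    "port": "metrics",
    "interval": "15s",
    "path": "/metrics",
    "scheme": "http",
    "scrapeTimeout": "15s",
}

# Metadata fields set by the API server; stripped so the copy can be created as a new object.
SERVER_FIELDS = ("uid", "resourceVersion", "creationTimestamp", "generation", "managedFields")


def derive_service_monitor(old: dict, name_suffix: str = "-new-metrics") -> dict:
    """Return a copy of an existing ServiceMonitor that scrapes the new /metrics endpoint."""
    new = copy.deepcopy(old)  # leave the original untouched
    new.pop("status", None)
    metadata = new["metadata"]
    for field in SERVER_FIELDS:
        metadata.pop(field, None)
    # Without owner references, the copy is not garbage-collected along with the original's owner.
    metadata.pop("ownerReferences", None)
    # Hypothetical naming scheme: append a suffix so the copy does not clash with the original.
    metadata["name"] = metadata["name"] + name_suffix
    # Everything else in spec (selector, namespaceSelector, ...) is kept from the old object.
    new["spec"]["endpoints"] = [dict(NEW_ENDPOINT)]
    return new


def to_manifest_json(manifest: dict) -> str:
    """Serialize the manifest; kubectl apply accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

The string returned by `to_manifest_json(derive_service_monitor(old))` can be saved to a file and applied with `kubectl apply -f`.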

burmanm, Jun 19 '24

Thanks a lot, @burmanm! We will try this shortly.

c3-clement, Jun 19 '24

@c3-clement, do you still have an issue here or can we close this ticket?

adejanovski, Mar 21 '25