metric-collector-for-apache-cassandra

Metrics vanished after 10 mins

Open · rjshrjndrn opened this issue 4 years ago · 3 comments

I downloaded the collector and used the default configuration on a 7-node Cassandra cluster. Whenever I restart the cluster, I get metrics for a few minutes, and after that, none. However, if I scrape the endpoint directly with curl localhost:9103/metrics, I do get metrics, but some are missing; for instance, I can't find collectd_memory. Here is a sample output (I've restarted twice):

[screenshots of sample output]
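To compare what each node exposes once Prometheus stops receiving data, the direct scrape can be scripted across all seven VMs. The following is a minimal Python sketch using only the standard library; it assumes each node serves the Prometheus text format on port 9103 (as curl localhost:9103/metrics does here), and the helper names `metric_names`, `scrape`, and `missing_families` are hypothetical.

```python
import urllib.request


def metric_names(exposition):
    """Return the set of metric names in Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        # A sample line looks like: name[{labels}] value [timestamp]
        end = len(line)
        for sep in ("{", " ", "\t"):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        names.add(line[:end])
    return names


def scrape(host, port=9103, timeout=5.0):
    """Fetch /metrics from one node, like `curl host:9103/metrics`."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def missing_families(hosts, prefixes=("collectd_memory",), port=9103):
    """Map each host to the expected metric-name prefixes it does not expose."""
    report = {}
    for host in hosts:
        names = metric_names(scrape(host, port))
        report[host] = [p for p in prefixes
                        if not any(n.startswith(p) for n in names)]
    return report
```

For example, `missing_families(["28.0.2.101", "28.0.2.102"])` (node IPs from the Endpoints object in the configuration) returns, per host, the listed prefixes that are absent from that node's current output; running it before and after the few-minute window shows whether the node itself stops exporting or only Prometheus stops ingesting.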

I downloaded the jar and deployed it on 7 VMs. I'm monitoring them with prometheus-operator running in a Kubernetes cluster, using this configuration:

---
kind: Service
apiVersion: v1
metadata:
  name: cassandra-test
  namespace: loadtest
  labels:
    app: cassandra-test
spec:
  type: ClusterIP
  ports:
  - port: 9103
    targetPort: 9103
    name: metrics
---
kind: Endpoints
apiVersion: v1
metadata:
  name: cassandra-test
  namespace: loadtest
subsets:
- addresses:
  - ip: 28.0.2.101
  - ip: 28.0.2.102
  - ip: 28.0.2.103
  - ip: 28.0.2.104
  - ip: 28.0.2.105
  - ip: 28.0.2.106
  - ip: 28.0.2.107
  ports:
  - port: 9103
    name: metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
    app: prometheus-operator
  name: cassandra-test-monitoring
  namespace: loadtest
spec:
  endpoints:
  - honorLabels: true
    interval: 40s
    scrapeTimeout: 15s
    port: metrics
    relabelings:
    - action: labeldrop
      regex: ^pod$
    metricRelabelings:
    #drop metrics we can calculate from prometheus directly
    - sourceLabels: [__name__]
      regex: .*rate_(mean|1m|5m|15m)
      action: drop
    #save the original name for all metrics
    - sourceLabels: [__name__]
      regex: (collectd_mcac_.+)
      targetLabel: prom_name
      replacement: ${1}
    - sourceLabels: ["prom_name"]
      regex: .+_bucket_(\d+)
      targetLabel: le
      replacement: ${1}
    - sourceLabels: ["prom_name"]
      regex: .+_bucket_inf
      targetLabel: le
      replacement: +Inf
    - sourceLabels: ["prom_name"]
      regex: .*_histogram_p(\d+)
      targetLabel: quantile
      replacement: .${1}
    - sourceLabels: ["prom_name"]
      regex: .*_histogram_min
      targetLabel: quantile
      replacement: "0"
    - sourceLabels: ["prom_name"]
      regex: .*_histogram_max
      targetLabel: quantile
      replacement: "1"
    #Table Metrics *ALL* we can drop
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.table\.(\w+)
      action: drop
    #Table Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
      targetLabel: table
      replacement: ${3}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
      targetLabel: keyspace
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_table_${1}
    #Keyspace Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
      targetLabel: keyspace
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_keyspace_${1}
    #ThreadPool Metrics (one type is repair.task so we just ignore the second part)
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
      targetLabel: pool_type
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
      targetLabel: pool_name
      replacement: ${3}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
      targetLabel: __name__
      replacement: mcac_thread_pools_${1}
    #ClientRequest Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
      targetLabel: request_type
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
      targetLabel: __name__
      replacement: mcac_client_request_${1}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
      targetLabel: cl
      replacement: ${3}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
      targetLabel: request_type
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
      targetLabel: __name__
      replacement: mcac_client_request_${1}_cl
    #Cache Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
      targetLabel: cache_name
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_cache_${1}
    #CQL Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.cql\.(\w+)
      targetLabel: __name__
      replacement: mcac_cql_${1}
    #Dropped Message Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
      targetLabel: message_type
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_dropped_message_${1}
    #Streaming Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
      targetLabel: peer_ip
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
      targetLabel: __name__
      replacement: mcac_streaming_${1}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)$
      targetLabel: __name__
      replacement: mcac_streaming_${1}
    #CommitLog Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.commit_log\.(\w+)
      targetLabel: __name__
      replacement: mcac_commit_log_${1}
    #Compaction Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.compaction\.(\w+)
      targetLabel: __name__
      replacement: mcac_compaction_${1}
    #Storage Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.storage\.(\w+)
      targetLabel: __name__
      replacement: mcac_storage_${1}
    #Batch Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.batch\.(\w+)
      targetLabel: __name__
      replacement: mcac_batch_${1}
    #Client Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.client\.(\w+)
      targetLabel: __name__
      replacement: mcac_client_${1}
    #BufferPool Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.buffer_pool\.(\w+)
      targetLabel: __name__
      replacement: mcac_buffer_pool_${1}
    #Index Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.index\.(\w+)
      targetLabel: __name__
      replacement: mcac_sstable_index_${1}
    #HintService Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
      targetLabel: peer_ip
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
      targetLabel: __name__
      replacement: mcac_hints_${1}
    #HintService Metrics
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
      targetLabel: peer_ip
      replacement: ${1}
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
      targetLabel: __name__
      replacement: mcac_hints_hints_delays
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.hints_service\.([^\-]+)
      targetLabel: __name__
      replacement: mcac_hints_${1}
    # Misc
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.memtable_pool\.(\w+)
      targetLabel: __name__
      replacement: mcac_memtable_pool_${1}
    - sourceLabels: ["mcac"]
      regex: com\.datastax\.bdp\.type\.performance_objects\.name\.cql_slow_log\.metrics\.queries_latency
      targetLabel: __name__
      replacement: mcac_cql_slow_log_query_latency
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
      targetLabel: read_type
      replacement: $1
    - sourceLabels: ["mcac"]
      regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
      targetLabel: __name__
      replacement: mcac_read_coordination_requests
    #GC Metrics
    - sourceLabels: ["mcac"]
      regex: jvm\.gc\.(\w+)\.(\w+)
      targetLabel: collector_type
      replacement: ${1}
    - sourceLabels: ["mcac"]
      regex: jvm\.gc\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_jvm_gc_${2}
    #JVM Metrics
    - sourceLabels: ["mcac"]
      regex: jvm\.memory\.(\w+)\.(\w+)
      targetLabel: memory_type
      replacement: ${1}
    - sourceLabels: ["mcac"]
      regex: jvm\.memory\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_jvm_memory_${2}
    - sourceLabels: ["mcac"]
      regex: jvm\.memory\.pools\.(\w+)\.(\w+)
      targetLabel: pool_name
      replacement: ${2}
    - sourceLabels: ["mcac"]
      regex: jvm\.memory\.pools\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_jvm_memory_pool_${2}
    - sourceLabels: ["mcac"]
      regex: jvm\.fd\.usage
      targetLabel: __name__
      replacement: mcac_jvm_fd_usage
    - sourceLabels: ["mcac"]
      regex: jvm\.buffers\.(\w+)\.(\w+)
      targetLabel: buffer_type
      replacement: ${1}
    - sourceLabels: ["mcac"]
      regex: jvm\.buffers\.(\w+)\.(\w+)
      targetLabel: __name__
      replacement: mcac_jvm_buffer_${2}
    #Append the prom types back to formatted names
    - sourceLabels: [__name__, "prom_name"]
      regex: (mcac_.*);.*(_micros_bucket|_bucket|_micros_count_total|_count_total|_total|_micros_sum|_sum|_stddev).*
      separator: ;
      targetLabel: __name__
      replacement: ${1}${2}
    - regex: prom_name
      action: labeldrop
  namespaceSelector:
    matchNames:
    - loadtest
  selector:
    matchLabels:
      app: cassandra-test
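Because the metric names change so much on the way in, it can help to check what a given MCAC series becomes after these metricRelabelings. The following Python sketch replays a subset of the rules above (regexes copied verbatim), assuming Prometheus's documented relabel semantics: every `regex` is fully anchored, `sourceLabels` values are joined with `separator` (default `;`), `replacement` defaults to `$1`, and a matching `replace` rule writes the expanded `replacement` into `targetLabel`. It is not Prometheus's implementation: it supports only `replace`, `drop`, and `labeldrop`, uses Python's `re` in place of Go's RE2 (equivalent for these ASCII patterns), and the series names used with it are hypothetical.

```python
import re

# A subset of the metricRelabelings rules from the ServiceMonitor, regexes verbatim.
RULES = [
    {"sourceLabels": ["__name__"], "regex": r".*rate_(mean|1m|5m|15m)", "action": "drop"},
    {"sourceLabels": ["__name__"], "regex": r"(collectd_mcac_.+)",
     "targetLabel": "prom_name", "replacement": "${1}"},
    {"sourceLabels": ["prom_name"], "regex": r".*_histogram_p(\d+)",
     "targetLabel": "quantile", "replacement": ".${1}"},
    {"sourceLabels": ["mcac"], "regex": r"org\.apache\.cassandra\.metrics\.table\.(\w+)",
     "action": "drop"},
    {"sourceLabels": ["mcac"],
     "regex": r"org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)",
     "targetLabel": "table", "replacement": "${3}"},
    {"sourceLabels": ["mcac"],
     "regex": r"org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)",
     "targetLabel": "keyspace", "replacement": "${2}"},
    {"sourceLabels": ["mcac"],
     "regex": r"org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)",
     "targetLabel": "__name__", "replacement": "mcac_table_${1}"},
    {"sourceLabels": ["__name__", "prom_name"], "separator": ";",
     "regex": r"(mcac_.*);.*(_micros_bucket|_bucket|_micros_count_total|_count_total"
              r"|_total|_micros_sum|_sum|_stddev).*",
     "targetLabel": "__name__", "replacement": "${1}${2}"},
    {"regex": "prom_name", "action": "labeldrop"},
]


def relabel(labels, rules):
    """Apply a minimal subset of Prometheus relabeling to one series.

    Supports the replace (default), drop and labeldrop actions only.
    Returns a new label dict, or None if the series is dropped.
    """
    labels = dict(labels)  # never mutate the caller's dict
    for rule in rules:
        action = rule.get("action", "replace")
        regex = re.compile(rule.get("regex", "(.*)"))
        # fullmatch mirrors Prometheus's implicit ^(?:...)$ anchoring.
        if action == "labeldrop":
            labels = {k: v for k, v in labels.items() if not regex.fullmatch(k)}
            continue
        # Join the source label values; a missing label counts as "".
        value = rule.get("separator", ";").join(
            labels.get(name, "") for name in rule["sourceLabels"])
        match = regex.fullmatch(value)
        if action == "drop":
            if match:
                return None
        elif match:  # replace
            # Translate Prometheus $1 / ${1} references into Python's \g<1>.
            template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>",
                              rule.get("replacement", "$1"))
            labels[rule["targetLabel"]] = match.expand(template)
    return labels
```

For a hypothetical series collectd_mcac_histogram_p99 with mcac="org.apache.cassandra.metrics.table.read_latency.ks1.users", `relabel` returns __name__="mcac_table_read_latency" with quantile=".99", keyspace="ks1", and table="users", whereas a single-segment name such as org.apache.cassandra.metrics.table.live_disk_space_used is dropped by the anchored "*ALL*" rule.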

rjshrjndrn · Feb 08 '21 17:02

@rjshrjndrn We are also facing a similar issue. Were you able to identify the cause or find a workaround?

maruthimanoj · Jul 08 '21 14:07

This isn't a direct answer to your question, but it might be worth looking at K8ssandra to see how we set up the service monitor we use:

https://github.com/k8ssandra/k8ssandra/blob/main/charts/k8ssandra/templates/prometheus/service_monitor.yaml

jdonenine · Jul 08 '21 15:07

I've run into the same problem too. Is this a bug?

pangzhenzhou · May 30 '23 09:05