metric-collector-for-apache-cassandra
Metrics vanished after 10 mins
I downloaded the collector and used its default configuration on a 7-node Cassandra cluster. Whenever I restart the cluster, I get metrics for a few minutes, and after that I get none. If I scrape the endpoint directly with curl localhost:9103/metrics, I still get metrics, but that output does not include collectd_memory, for instance.
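To compare what each node exposes directly against what shows up in Prometheus, the curl check can be scripted. The following is only a sketch: the helper names are hypothetical, the prefixes are the ones mentioned in this post (collectd_memory) and in the relabeling regex (collectd_mcac_), and the addresses are the ones from the Endpoints manifest.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the set of metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The name ends at the first '{' (labels) or at whitespace (value).
        parts = line.split("{", 1)[0].split()
        if parts:
            names.add(parts[0])
    return names


def missing_prefixes(exposition_text, prefixes):
    """Return the prefixes that no exposed metric name starts with."""
    names = metric_names(exposition_text)
    return [p for p in prefixes if not any(n.startswith(p) for n in names)]


def check_node(host, port=9103,
               prefixes=("collectd_memory", "collectd_mcac_"), timeout=15):
    """Scrape one node directly and report which prefixes are absent."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return missing_prefixes(body, prefixes)


if __name__ == "__main__":
    # Addresses assumed from the Endpoints manifest (28.0.2.101 to 28.0.2.107).
    for last_octet in range(101, 108):
        host = f"28.0.2.{last_octet}"
        try:
            print(host, "missing:", check_node(host))
        except OSError as exc:  # covers connection errors and timeouts
            print(host, "unreachable:", exc)
```

Running this a few minutes after a restart, and again once the metrics have disappeared from Prometheus, should show whether the exporter itself stops serving them or whether they are lost on the scraping side.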
This is a sample output. I've restarted twice.
I downloaded the jar and deployed it on 7 VMs. I'm monitoring them from prometheus-operator in a Kubernetes cluster.
---
kind: Service
apiVersion: v1
metadata:
  name: cassandra-test
  namespace: loadtest
  labels:
    app: cassandra-test
spec:
  type: ClusterIP
  ports:
    - port: 9103
      targetPort: 9103
      name: metrics
---
kind: Endpoints
apiVersion: v1
metadata:
  name: cassandra-test
  namespace: loadtest
subsets:
  - addresses:
      - ip: 28.0.2.101
      - ip: 28.0.2.102
      - ip: 28.0.2.103
      - ip: 28.0.2.104
      - ip: 28.0.2.105
      - ip: 28.0.2.106
      - ip: 28.0.2.107
    ports:
      - port: 9103
        name: metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    release: prometheus-operator
    app: prometheus-operator
  name: cassandra-test-monitoring
  namespace: loadtest
spec:
  endpoints:
    - honorLabels: true
      interval: 40s
      scrapeTimeout: 15s
      port: metrics
      relabelings:
        - action: labeldrop
          regex: ^pod$
      metricRelabelings:
        #drop metrics we can calculate from prometheus directly
        - sourceLabels: [__name__]
          regex: .*rate_(mean|1m|5m|15m)
          action: drop
        #save the original name for all metrics
        - sourceLabels: [__name__]
          regex: (collectd_mcac_.+)
          targetLabel: prom_name
          replacement: ${1}
        - sourceLabels: ["prom_name"]
          regex: .+_bucket_(\d+)
          targetLabel: le
          replacement: ${1}
        - sourceLabels: ["prom_name"]
          regex: .+_bucket_inf
          targetLabel: le
          replacement: +Inf
        - sourceLabels: ["prom_name"]
          regex: .*_histogram_p(\d+)
          targetLabel: quantile
          replacement: .${1}
        - sourceLabels: ["prom_name"]
          regex: .*_histogram_min
          targetLabel: quantile
          replacement: "0"
        - sourceLabels: ["prom_name"]
          regex: .*_histogram_max
          targetLabel: quantile
          replacement: "1"
        #Table Metrics *ALL* we can drop
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.table\.(\w+)
          action: drop
        #Table Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
          targetLabel: table
          replacement: ${3}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
          targetLabel: keyspace
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_table_${1}
        #Keyspace Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
          targetLabel: keyspace
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.keyspace\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_keyspace_${1}
        #ThreadPool Metrics (one type is repair.task so we just ignore the second part)
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
          targetLabel: pool_type
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
          targetLabel: pool_name
          replacement: ${3}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.thread_pools\.(\w+)\.(\w+)\.(\w+).*
          targetLabel: __name__
          replacement: mcac_thread_pools_${1}
        #ClientRequest Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
          targetLabel: request_type
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)$
          targetLabel: __name__
          replacement: mcac_client_request_${1}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
          targetLabel: cl
          replacement: ${3}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
          targetLabel: request_type
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.client_request\.(\w+)\.(\w+)\.(\w+)$
          targetLabel: __name__
          replacement: mcac_client_request_${1}_cl
        #Cache Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
          targetLabel: cache_name
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.cache\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_cache_${1}
        #CQL Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.cql\.(\w+)
          targetLabel: __name__
          replacement: mcac_cql_${1}
        #Dropped Message Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
          targetLabel: message_type
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.dropped_message\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_dropped_message_${1}
        #Streaming Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
          targetLabel: peer_ip
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)\.(.+)$
          targetLabel: __name__
          replacement: mcac_streaming_${1}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.streaming\.(\w+)$
          targetLabel: __name__
          replacement: mcac_streaming_${1}
        #CommitLog Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.commit_log\.(\w+)
          targetLabel: __name__
          replacement: mcac_commit_log_${1}
        #Compaction Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.compaction\.(\w+)
          targetLabel: __name__
          replacement: mcac_compaction_${1}
        #Storage Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.storage\.(\w+)
          targetLabel: __name__
          replacement: mcac_storage_${1}
        #Batch Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.batch\.(\w+)
          targetLabel: __name__
          replacement: mcac_batch_${1}
        #Client Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.client\.(\w+)
          targetLabel: __name__
          replacement: mcac_client_${1}
        #BufferPool Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.buffer_pool\.(\w+)
          targetLabel: __name__
          replacement: mcac_buffer_pool_${1}
        #Index Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.index\.(\w+)
          targetLabel: __name__
          replacement: mcac_sstable_index_${1}
        #HintService Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
          targetLabel: peer_ip
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.hinted_hand_off_manager\.([^\-]+)-(\w+)
          targetLabel: __name__
          replacement: mcac_hints_${1}
        #HintService Metrics
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
          targetLabel: peer_ip
          replacement: ${1}
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.hints_service\.hints_delays\-(\w+)
          targetLabel: __name__
          replacement: mcac_hints_hints_delays
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.hints_service\.([^\-]+)
          targetLabel: __name__
          replacement: mcac_hints_${1}
        # Misc
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.memtable_pool\.(\w+)
          targetLabel: __name__
          replacement: mcac_memtable_pool_${1}
        - sourceLabels: ["mcac"]
          regex: com\.datastax\.bdp\.type\.performance_objects\.name\.cql_slow_log\.metrics\.queries_latency
          targetLabel: __name__
          replacement: mcac_cql_slow_log_query_latency
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
          targetLabel: read_type
          replacement: $1
        - sourceLabels: ["mcac"]
          regex: org\.apache\.cassandra\.metrics\.read_coordination\.(.*)
          targetLabel: __name__
          replacement: mcac_read_coordination_requests
        #GC Metrics
        - sourceLabels: ["mcac"]
          regex: jvm\.gc\.(\w+)\.(\w+)
          targetLabel: collector_type
          replacement: ${1}
        - sourceLabels: ["mcac"]
          regex: jvm\.gc\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_jvm_gc_${2}
        #JVM Metrics
        - sourceLabels: ["mcac"]
          regex: jvm\.memory\.(\w+)\.(\w+)
          targetLabel: memory_type
          replacement: ${1}
        - sourceLabels: ["mcac"]
          regex: jvm\.memory\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_jvm_memory_${2}
        - sourceLabels: ["mcac"]
          regex: jvm\.memory\.pools\.(\w+)\.(\w+)
          targetLabel: pool_name
          replacement: ${2}
        - sourceLabels: ["mcac"]
          regex: jvm\.memory\.pools\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_jvm_memory_pool_${2}
        - sourceLabels: ["mcac"]
          regex: jvm\.fd\.usage
          targetLabel: __name__
          replacement: mcac_jvm_fd_usage
        - sourceLabels: ["mcac"]
          regex: jvm\.buffers\.(\w+)\.(\w+)
          targetLabel: buffer_type
          replacement: ${1}
        - sourceLabels: ["mcac"]
          regex: jvm\.buffers\.(\w+)\.(\w+)
          targetLabel: __name__
          replacement: mcac_jvm_buffer_${2}
        #Append the prom types back to formatted names
        - sourceLabels: [__name__, "prom_name"]
          regex: (mcac_.*);.*(_micros_bucket|_bucket|_micros_count_total|_count_total|_total|_micros_sum|_sum|_stddev).*
          separator: ;
          targetLabel: __name__
          replacement: ${1}${2}
        - regex: prom_name
          action: labeldrop
  namespaceSelector:
    matchNames:
      - loadtest
  selector:
    matchLabels:
      app: cassandra-test
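When debugging a rule list like this, it helps to remember that Prometheus anchors relabeling regexes at both ends, so a pattern has to match the whole source value. The sketch below is a simplified stand-in for that behaviour, not Prometheus's own implementation: it handles only the replace and drop actions, the function names are hypothetical, and the two mcac label values are made-up examples in the shape the table rules expect.

```python
import re


def _expand(replacement, match):
    """Substitute Prometheus-style ${N} or $N references with matched groups."""
    return re.sub(r"\$\{?(\d+)\}?",
                  lambda m: match.group(int(m.group(1))) or "",
                  replacement)


def relabel(labels, rule):
    """Apply one 'replace' or 'drop' rule; return None if the series is dropped."""
    value = ";".join(labels.get(name, "") for name in rule["sourceLabels"])
    # fullmatch mimics the implicit ^(?:...)$ anchoring; re.ASCII keeps \w
    # limited to [0-9A-Za-z_].
    match = re.fullmatch(rule["regex"], value, flags=re.ASCII)
    if match is None:
        return labels
    if rule.get("action", "replace") == "drop":
        return None
    updated = dict(labels)
    updated[rule["targetLabel"]] = _expand(rule.get("replacement", "$1"), match)
    return updated


def run(labels, rules):
    """Apply rules in order, stopping as soon as the series is dropped."""
    for rule in rules:
        labels = relabel(labels, rule)
        if labels is None:
            return None
    return labels


PER_TABLE = r"org\.apache\.cassandra\.metrics\.table\.(\w+)\.(\w+)\.(\w+)"
RULES = [
    {"sourceLabels": ["mcac"], "action": "drop",
     "regex": r"org\.apache\.cassandra\.metrics\.table\.(\w+)"},
    {"sourceLabels": ["mcac"], "regex": PER_TABLE,
     "targetLabel": "table", "replacement": "${3}"},
    {"sourceLabels": ["mcac"], "regex": PER_TABLE,
     "targetLabel": "keyspace", "replacement": "${2}"},
    {"sourceLabels": ["mcac"], "regex": PER_TABLE,
     "targetLabel": "__name__", "replacement": "mcac_table_${1}"},
]

if __name__ == "__main__":
    # Hypothetical label values, for illustration only.
    aggregate = {"__name__": "collectd_mcac_counter_total",
                 "mcac": "org.apache.cassandra.metrics.table.live_ss_table_count"}
    per_table = dict(aggregate, mcac=aggregate["mcac"] + ".ks1.users")
    print(run(aggregate, RULES))
    print(run(per_table, RULES))
```

Under these assumptions, the "Table Metrics *ALL*" drop rule removes only series whose mcac value has a single segment after table., because \w+ cannot span a dot. A value with the extra keyspace and table segments passes through and is rewritten by the three rules that follow.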
@rjshrjndrn We are also facing a similar issue. Were you able to identify the cause or find a workaround?
This isn't a direct answer to your question, but it might be worth taking a look at K8ssandra and seeing how we set up the service monitor that we're using:
https://github.com/k8ssandra/k8ssandra/blob/main/charts/k8ssandra/templates/prometheus/service_monitor.yaml
I've run into the same problem, so is this a bug or what?