etcd-backup-restore
[Enhancement] Add capability for operators to monitor etcd data
Enhancement (What you would like to be added): We need insight into the data that etcd stores in its DB (bbolt), in particular which resource types account for the most keys and the most space. @istvanballok recently ran the following commands to extract that data from etcd:
apk add jq util-linux
etcdctl --insecure-skip-tls-verify \
  --cert /var/etcd/ssl/client/server/tls.crt \
  --key /var/etcd/ssl/client/server/tls.key \
  --cacert /var/etcd/ssl/client/ca/bundle.crt \
  get --prefix / -w json |
  jq '.kvs[] | {key: .key | @base64d, valueLength: .value | length} | "\(.key | sub("/[^/]+/((?<type>[^/.]+)/.*|[^/]+/(?<customtype>[^/]+)/.*)";"\(.type // .customtype)")) \(.valueLength)"' -r |
  awk '{sum[$1]+=$2; count[$1]++} END{for (key in sum) {printf "%s %s %s\n", sum[key], count[key], key}}' |
  sort -rn |
  column -t
Example output (columns: summed value length, number of keys, resource type):
34156612 291 shootstates
17002464 7271 meteringreports
9932816 2592 secrets
5786756 476 shoots
3438780 38 cloudprofiles
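For anyone who finds the one-liner hard to read, here is a minimal Python sketch of what its jq and awk stages appear to do: derive a resource type from each key, then sum value lengths and count keys per type. The function names `resource_type` and `summarize` are made up for illustration, and the sketch takes already-decoded `(key, value_length)` pairs rather than talking to etcd. One caveat, as far as I can tell: `-w json` prints values base64-encoded and jq's `length` is applied to that encoded string, so the sizes reported by the pipeline are roughly 4/3 of the raw value bytes.

```python
import re
from collections import defaultdict

# Python translation of the jq pattern. Built-in types look like
# /<prefix>/<type>/..., while API-group types look like
# /<prefix>/<group.with.dots>/<type>/...
_KEY_PATTERN = re.compile(
    r"/[^/]+/(?:(?P<type>[^/.]+)/.*|[^/]+/(?P<customtype>[^/]+)/.*)"
)


def resource_type(key: str) -> str:
    """Return the resource type encoded in an etcd key (hypothetical helper)."""
    match = _KEY_PATTERN.search(key)
    if match is None:
        # Like jq's sub(), leave keys that do not match unchanged.
        return key
    name = match.group("type") or match.group("customtype")
    return key[:match.start()] + name + key[match.end():]


def summarize(entries):
    """Aggregate (key, value_length) pairs into (total_length, count, type) rows.

    Rows are sorted by total length, largest first, like `sort -rn`.
    """
    totals = defaultdict(lambda: [0, 0])
    for key, value_length in entries:
        bucket = totals[resource_type(key)]
        bucket[0] += value_length  # awk: sum[$1] += $2
        bucket[1] += 1             # awk: count[$1]++
    rows = [(size, count, rtype) for rtype, (size, count) in totals.items()]
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    # Made-up keys and lengths, only to exercise the functions.
    sample = [
        ("/registry/secrets/default/a", 100),
        ("/registry/secrets/default/b", 50),
        ("/registry/core.gardener.cloud/shoots/garden-x/foo", 400),
    ]
    for size, count, rtype in summarize(sample):
        print(size, count, rtype)
```

The dot in `[^/.]+` is what separates the two cases: a path segment containing a dot is treated as an API group, and the type is then taken from the following segment.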
It would be beneficial for operators and developers to have easy access to this data, either on demand or as custom metrics exposed to Prometheus.
NOTE: The above is just one set of information. Over time, we should identify additional information and custom metrics that etcd does not provide out of the box.
Motivation (Why is this needed?): Use cases:
- Operators can inspect the etcd data to understand why the etcd DB is close to the 8 GB mark and take corrective action.
- Developers can inspect this data over a period of time and fine-tune the resources that are stored in etcd.
Approach/Hint to implement the solution (optional):