etcd-backup-restore

[Enhancement] Add capability for operators to monitor etcd data

Open unmarshall opened this issue 1 year ago • 2 comments

Enhancement (What you would like to be added): There is a need to get insights into the data that etcd stores in its DB (bbolt), in particular which resource types account for the most keys and the largest total size. @istvanballok recently ran the following commands to extract that data from etcd:

apk add jq util-linux
etcdctl --insecure-skip-tls-verify \
  --cert /var/etcd/ssl/client/server/tls.crt \
  --key /var/etcd/ssl/client/server/tls.key \
  --cacert /var/etcd/ssl/client/ca/bundle.crt \
  get --prefix / -w json |
  jq '.kvs[] | {key: .key | @base64d, valueLength: .value | length} | "\(.key | sub("/[^/]+/((?<type>[^/.]+)/.*|[^/]+/(?<customtype>[^/]+)/.*)";"\(.type // .customtype)")) \(.valueLength)"' -r |
  awk '{sum[$1]+=$2; count[$1]++} END{for (key in sum) {printf "%s %s %s\n", sum[key], count[key], key}}' |
  sort -rn |
  column -t

Example output (columns: summed value length, number of keys, resource type):

34156612  291   shootstates
17002464  7271  meteringreports
9932816   2592  secrets
5786756   476   shoots
3438780   38    cloudprofiles
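
To try out the grouping without a running etcd, the bucketing rule of the pipeline above can be approximated in plain awk. This is only a sketch: it assumes keys look like /<prefix>/<type>/... or /<prefix>/<api-group>/<type>/... (where an API group contains a dot), it reads "key length" pairs instead of etcdctl JSON, and the function name group_by_type and the sample keys are made up for illustration.

```shell
# Hypothetical helper: reads "<key> <valueLength>" lines on stdin and prints
# "<summed length> <key count> <resource type>" per type, largest first.
group_by_type() {
  awk '{
    split($1, seg, "/")   # seg[1] is empty because keys start with "/"
    # A dot in the segment after the prefix marks an API group,
    # so the resource type is the segment that follows it.
    rtype = (seg[3] ~ /\./) ? seg[4] : seg[3]
    sum[rtype] += $2
    count[rtype]++
  }
  END { for (t in sum) printf "%s %s %s\n", sum[t], count[t], t }' | sort -rn
}

# Made-up sample input, for illustration only.
printf '%s\n' \
  '/registry/secrets/garden/foo 120' \
  '/registry/secrets/garden/bar 80' \
  '/registry/core.gardener.cloud/shoots/garden-dev/a 500' |
  group_by_type
```

Keys that do not fit the assumed layout may be bucketed differently than by the jq expression, so treat the result as an approximation.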

It would be beneficial for operators and developers to have easy access to this data, either on demand or as custom metrics exposed to Prometheus.
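
As one possible shape for such custom metrics, the per-type rows could be rendered in the Prometheus text exposition format. The sketch below is an illustration, not a proposal for the final design: the function name to_prom_metrics, the metric names etcd_resource_value_length and etcd_resource_keys, and the resource label are all placeholders and do not exist in etcd or etcd-backup-restore today.

```shell
# Hypothetical converter: reads "<summed length> <key count> <resource type>"
# lines on stdin and prints them as gauges in the Prometheus text format.
# Metric and label names are placeholders, not existing etcd metrics.
to_prom_metrics() {
  awk '
    { size[NR] = $1; keys[NR] = $2; rtype[NR] = $3 }
    END {
      # Samples of one metric are kept together as a single group.
      print "# TYPE etcd_resource_value_length gauge"
      for (i = 1; i <= NR; i++)
        printf "etcd_resource_value_length{resource=\"%s\"} %s\n", rtype[i], size[i]
      print "# TYPE etcd_resource_keys gauge"
      for (i = 1; i <= NR; i++)
        printf "etcd_resource_keys{resource=\"%s\"} %s\n", rtype[i], keys[i]
    }'
}

# Two rows taken from the example output above.
printf '%s\n' '9932816 2592 secrets' '5786756 476 shoots' | to_prom_metrics
```

Output in this form could, for example, be written to a file and picked up by the node_exporter textfile collector, whereas an implementation inside the backup-restore sidecar would more likely register the gauges directly.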

NOTE: The above is just one set of information. Over time, we should identify additional information and custom metrics that are not available out of the box from etcd.

Motivation (Why is this needed?): Use cases:

  • Operators can inspect the etcd data to understand why the etcd DB is close to the 8GB mark and take corrective action where needed.
  • Developers can inspect this data over a period of time and fine-tune the resources that get stored in etcd.

Approach/Hint to implement the solution (optional):

unmarshall, Mar 02 '23 06:03