pharos-cluster Defrag etcd automatically

What would you like to be added:

Defrag etcd automatically: https://coreos.com/etcd/docs/latest/op-guide/maintenance.html#defragmentation

Why is this needed:

Kubernetes already does compaction automatically, but not defrag, which may block etcd for a longer period.
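For background, compaction only discards old revisions. The space they occupied stays inside etcd's database file until a defrag rewrites the file and releases it. A periodic job could therefore skip members where little space would be reclaimed. The following is a minimal Python sketch of such a check. It is not something Pharos or Kubernetes ships. `MemberStatus`, `needs_defrag` and both thresholds are hypothetical, and the sketch assumes the caller can get the on-disk size (`dbSize`) and the in-use size (`dbSizeInUse`) that newer etcd releases report in their endpoint status.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class MemberStatus:
    endpoint: str
    db_size: int         # bytes allocated by the backend database file
    db_size_in_use: int  # bytes still holding live data after compaction


def needs_defrag(
    status: MemberStatus,
    min_free_ratio: float = 0.5,      # hypothetical threshold
    min_free_bytes: int = 100 * MIB,  # hypothetical threshold
) -> bool:
    """Defrag only when enough space would be handed back to the filesystem."""
    if status.db_size <= 0:
        return False
    free = status.db_size - status.db_size_in_use
    return free >= min_free_bytes and free / status.db_size >= min_free_ratio
```

Requiring both an absolute and a relative amount of free space keeps the job from freezing a member just to reclaim a few megabytes.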

Dec 05 '18 15:12 jakolehm

The snapshot count also seems exceptionally high at 100,000 -- the default (suggested) value looks to be 10,000: https://coreos.com/etcd/docs/latest/tuning.html#snapshots

This causes more disk utilization than necessary.
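Pinning the value explicitly, instead of relying on whatever default the running etcd version uses, comes down to passing etcd's `--snapshot-count` flag. This is a minimal sketch using the 10,000 value from the tuning guide. `etcd_args` and `SUGGESTED_SNAPSHOT_COUNT` are hypothetical names. Pharos actually configures etcd from a shell script, and that script's structure is not reproduced here.

```python
from typing import List

# Value suggested by the etcd tuning guide (hypothetical constant name).
SUGGESTED_SNAPSHOT_COUNT = 10_000


def etcd_args(
    name: str, data_dir: str, snapshot_count: int = SUGGESTED_SNAPSHOT_COUNT
) -> List[str]:
    """Build etcd flags with an explicit snapshot count (other flags omitted)."""
    if snapshot_count <= 0:
        raise ValueError("snapshot_count must be positive")
    return [
        f"--name={name}",
        f"--data-dir={data_dir}",
        # Number of committed transactions that trigger a snapshot to disk.
        f"--snapshot-count={snapshot_count}",
    ]
```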

Dec 10 '18 15:12 Timer

"The snapshot count also seems exceptionally high at 100,000"

Pharos should be using the etcd default; we don't set that option at all: https://github.com/kontena/pharos-cluster/blob/master/lib/pharos/scripts/configure-etcd.sh

@Timer How did you get to 100,000 for snapshot count?

After reading some of the comments in K8s issues/PRs, I'm not at all convinced that we should run defrag periodically:

"defrag is not super efficient and is not designed to run frequently." (https://github.com/kubernetes/kubernetes/pull/45090#issuecomment-298067076)

"Just a reminder: defrag is a stop-the-world operation. If you have a db that is actually 1GB or so, defrag can freeze everything for 10+ seconds, especially when you only run 1 etcd node." (https://github.com/kubernetes/kubernetes/pull/45090#issuecomment-306872463)
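Because defrag blocks the member it runs on, one way to limit the impact in a multi-member cluster is to defragment one member at a time, followers first and the leader last. By default it should refuse to run on a single-member cluster, where no other node could serve requests. The Python sketch below builds the `etcdctl` command lines and runs them in order. `defrag_plan`, `run_plan` and the 60-second timeout are hypothetical. The followers-first ordering reflects common operational advice, not anything decided in this thread. The sketch assumes etcdctl's v3 API (`ETCDCTL_API=3`) and its `--endpoints` and `--command-timeout` global flags. It omits the TLS flags a real cluster would need.

```python
import os
import subprocess
from typing import List, Sequence


def defrag_plan(
    endpoints: Sequence[str],
    leader: str,
    allow_single_member: bool = False,
    timeout_s: int = 60,
) -> List[List[str]]:
    """Return one etcdctl command per member: followers first, leader last."""
    if leader not in endpoints:
        raise ValueError(f"leader {leader!r} is not one of the endpoints")
    if len(endpoints) == 1 and not allow_single_member:
        # With a single member there is no other node to serve requests,
        # so defrag would block the whole control plane until it finishes.
        raise RuntimeError("refusing to defrag a single-member etcd cluster")
    order = [ep for ep in endpoints if ep != leader] + [leader]
    return [
        ["etcdctl", f"--endpoints={ep}", f"--command-timeout={timeout_s}s", "defrag"]
        for ep in order
    ]


def run_plan(plan: List[List[str]]) -> None:
    """Run the commands sequentially, stopping at the first failure."""
    env = {**os.environ, "ETCDCTL_API": "3"}
    for cmd in plan:
        subprocess.run(cmd, check=True, env=env)
```

For example, `defrag_plan(["https://10.0.0.1:2379", "https://10.0.0.2:2379"], leader="https://10.0.0.1:2379")` puts the command for `10.0.0.2` first. The single-member case still freezes everything, so it is opt-in only.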

Jan 15 '19 06:01 jnummelin

"@Timer How did you get to 100,000 for snapshot count?"

@jnummelin I saw this in the logs when the etcd server was booting.

"Reading some of the comments in K8S issues/PRs and I'm not at all convinced that we should run defrag periodically"

"Stop the world" only applies to the control-plane components, so all workloads should continue operating normally.

Jan 15 '19 17:01 Timer
