Defrag etcd automatically

Open jakolehm opened this issue 6 years ago • 3 comments

What would you like to be added:

Defrag etcd automatically: https://coreos.com/etcd/docs/latest/op-guide/maintenance.html#defragmentation

Why is this needed:

Kubernetes already runs compaction automatically, but not defrag, which may block etcd for a longer period.
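For illustration only, here is a minimal sketch of what an automated defrag pass could look like. It assumes `etcdctl` is on the PATH and shells out to `etcdctl defrag` once per member, so that only one member is blocked at a time. The function names and the `pause_seconds` parameter are hypothetical, and the TLS flags a real cluster would need are omitted. This is not how Pharos handles it.

```python
import os
import subprocess
import time


def defrag_command(endpoint):
    """Build the etcdctl invocation that defragments a single member."""
    return ["etcdctl", "defrag", f"--endpoints={endpoint}"]


def defrag_members(endpoints, run=subprocess.run, pause_seconds=0):
    """Defragment members one at a time, stopping at the first failure.

    Returns the list of endpoints that were defragmented successfully.
    """
    env = dict(os.environ, ETCDCTL_API="3")  # defrag is a v3 API command
    done = []
    for endpoint in endpoints:
        result = run(defrag_command(endpoint), env=env)
        if result.returncode != 0:
            break  # leave the remaining members alone if one fails
        done.append(endpoint)
        time.sleep(pause_seconds)  # optional gap before the next member
    return done
```

The `run` parameter exists so the loop can be exercised without a live cluster. In real use it would be called with the member URLs, for example `defrag_members(["https://10.0.0.1:2379"], pause_seconds=30)`, where the address is a placeholder.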

jakolehm, Dec 05 '18 15:12

The snapshot count also seems exceptionally high at 100,000 -- the default (suggested) value looks to be 10,000: https://coreos.com/etcd/docs/latest/tuning.html#snapshots

This causes more-than-necessary disk utilization.

Timer, Dec 10 '18 15:12

> The snapshot count also seems exceptionally high at 100,000

Pharos should be using the etcd default; we don't set that option at all: https://github.com/kontena/pharos-cluster/blob/master/lib/pharos/scripts/configure-etcd.sh

@Timer How did you get to 100,000 for snapshot count?

Reading some of the comments in K8S issues/PRs and I'm not at all convinced that we should run defrag periodically:

> defrag is not super efficient and is not designed to run frequently.

https://github.com/kubernetes/kubernetes/pull/45090#issuecomment-298067076

> Just a reminder: defrag is a stop the world operation. If you have a db that is actually 1GB or so, defrag can freeze everything for 10+ second especially when you only run 1 etcd node.

https://github.com/kubernetes/kubernetes/pull/45090#issuecomment-306872463
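Since defrag blocks the member while it runs, one option is to trigger it only when enough space would actually be reclaimed, rather than on a fixed schedule. The sketch below works on the JSON printed by `etcdctl endpoint status -w json`. It assumes each entry carries `Endpoint`, `Status.dbSize`, and `Status.dbSizeInUse` (the latter is only reported by newer etcd versions). The 0.5 threshold and the function names are arbitrary choices for illustration.

```python
import json


def reclaimable_fraction(status):
    """Fraction of the on-disk db file that a defrag could give back."""
    db_size = status["dbSize"]
    in_use = status["dbSizeInUse"]
    if db_size <= 0:
        return 0.0
    return (db_size - in_use) / db_size


def members_worth_defragging(status_json, threshold=0.5):
    """Return endpoints whose reclaimable fraction is at or above the threshold."""
    entries = json.loads(status_json)
    return [
        entry["Endpoint"]
        for entry in entries
        if reclaimable_fraction(entry["Status"]) >= threshold
    ]
```

With a check like this, a member whose db file is mostly live data is never paused for a defrag that would free almost nothing.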

jnummelin, Jan 15 '19 06:01

> @Timer How did you get to 100,000 for snapshot count?

@jnummelin

I saw this in the logs when the etcd server was booting.
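To reproduce that check, the value can be pulled out of the startup log. The sketch below assumes the log contains a line of the form `snapshot count = 100000`; the exact wording varies between etcd versions, so the pattern is an assumption to adjust. To change the value itself, etcd accepts a `--snapshot-count` flag (or the `ETCD_SNAPSHOT_COUNT` environment variable).

```python
import re

# Assumed shape of the startup log line; adjust to the etcd version in use.
SNAPSHOT_COUNT_RE = re.compile(r"snapshot count\s*=\s*(\d+)")


def snapshot_count_from_log(log_text):
    """Return the snapshot count reported in the log, or None if absent."""
    match = SNAPSHOT_COUNT_RE.search(log_text)
    return int(match.group(1)) if match else None
```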

> Reading some of the comments in K8S issues/PRs and I'm not at all convinced that we should run defrag periodically

"Stop the world" only applies to the control components, so all workloads should continue operating nominally.

Timer, Jan 15 '19 17:01