
Optimize default requests/limits for CPU and memory

themoriarti opened this issue · 7 comments

While integrating etcd-operator into Cozystack (a problem that will affect anyone who runs many etcd clusters with the defaults), we found that we had to override the default requests/limits for etcd clusters started with the default settings. Since etcd is only one service in a large stack, reserving a large amount of CPU is quite costly these days, and etcd itself also needs a higher memory limit. As a result, we would have to layer our own defaults on top of the etcd-operator defaults, which users would then customize further. To avoid this double configuration, I suggest reviewing the optimal default CPU reservation and memory limits.

For the CPU request https://github.com/aenix-io/etcd-operator/blob/eaca8d6006a18a05ffe8dfe7009739261d21b0bd/charts/etcd-operator/values.yaml#L32 perhaps 20-40m per pod.

For the memory limit https://github.com/aenix-io/etcd-operator/blob/eaca8d6006a18a05ffe8dfe7009739261d21b0bd/charts/etcd-operator/values.yaml#L30 perhaps 384-512MB.

More details in the Cozystack PR: https://github.com/aenix-io/cozystack/pull/95
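
For illustration, the change could look roughly like this (a sketch only; the numbers are picked from the suggested ranges, not benchmarked):

    resources:
      requests:
        cpu: 30m        # somewhere in the suggested 20-40m range
      limits:
        memory: 512Mi   # somewhere in the suggested 384-512MB range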

themoriarti commented on Apr 26, 2024

Kubeadm defaults are:

    resources:
      requests:
        cpu: 100m
        memory: 100Mi

I would prefer to stick with these values, considering that our etcd can be used by several Kubernetes clusters in a multi-tenant setup.

cc @knight42 who introduced them in upstream https://github.com/kubernetes/kubernetes/commit/2ebd2937806ae957df67ae8138f62a5ad9b39c8d

kvaps commented on Apr 26, 2024

I didn't quite understand what you mean. For etcd to work there is no need to reserve more than necessary, and I think the upper limit can be raised, because etcd can still consume a lot of CPU during commits. Memory, and the memory limit in particular, matter for etcd, but we should by no means reserve resources that may never be used.
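
In other words, something with this shape: a low request so resources are not reserved idle, and a higher memory limit (with no CPU limit) so etcd can burst during commits. A sketch only, with purely illustrative numbers:

    resources:
      requests:
        cpu: 50m          # low reservation; etcd is mostly idle between commits
        memory: 256Mi
      limits:
        memory: 512Mi     # headroom to avoid OOM kills
        # no cpu limit set, so commits can burst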

themoriarti commented on Apr 26, 2024

Default values are typically provided to ensure that the majority of users can operate without the additional cognitive burden of modifying settings.

In our case, the primary users of our etcd-operator are those who run Kubernetes control planes inside Kubernetes, projects such as Kamaji and Cozystack. Based on the defaults provided by the upstream Kubernetes project, and considering that we support multi-tenant installations, I would suggest keeping the defaults as they are now or reducing them to the upstream Kubernetes requirements.

Those who wish to fine-tune the settings can still replace these values with others as needed.
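
For example, an override in a custom values file might look like this (a sketch; the resources key follows the chart's values.yaml linked above):

    # my-values.yaml, applied with: helm ... -f my-values.yaml
    resources:
      requests:
        cpu: 100m       # the kubeadm-style defaults discussed above
        memory: 100Mi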

kvaps commented on Apr 26, 2024

Why would I want an operator that by default starts etcd with parameters that do not match real-world usage? One can still put up with a 100m CPU reservation per pod, although it leads to a large reservation of resources that may simply sit idle. But the etcd memory limit will cause OOM kills: 128MB of memory is extremely small, and given that we immediately allocate 4GB of disk for etcd, expecting it to serve that with 128MB of RAM looks very strange. The memory limit for 4GB of storage should be over 12GB of RAM.

ETCD Hardware recommendations

My request is based on real working clusters and the data I see in monitoring during actual use. Is yours based on the fact that someone once set it up that way and we have to do it like everyone else? Those limits could have been set a long time ago and simply never revisited.

themoriarti commented on Apr 26, 2024

@themoriarti Those are the resources for the operator itself, as can be seen from the very first line of the file and from here, not for the nodes of the ETCD cluster.

I suppose what you are looking for is here; see the link for an example.

hiddenmarten commented on Apr 27, 2024

Also, as can be seen here and here, we do not apply any default resources to the nodes of the ETCD cluster itself.

Am I right in saying that you want us to set default resources for an ETCD node?

hiddenmarten commented on Apr 27, 2024

@hiddenmarten Thanks for clarifying; to be honest, I didn't delve deep into the code itself, so your comments are valuable to me.

I just want etcd-operator, when it starts the pods that run the etcd service itself, to apply an understandable default for resource requests and limits; if none is set, the pods get QoS class BestEffort.
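
Without any requests/limits the kubelet assigns the lowest QoS class, so kubectl get pod -o yaml shows something like:

    status:
      qosClass: BestEffort   # evicted first under node memory pressure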

themoriarti commented on Apr 27, 2024

I've found the official recommendations in the ETCD documentation; to me they look a bit fuzzy. I wouldn't implement them as defaults, given the variety of ETCD usages in Kubernetes.

I prefer to leave the default values as they are. Please take into account that you can always override resources in the containerSpec section.
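
For reference, such an override might look roughly like this (a sketch only: the apiVersion and the exact field path are my assumptions, check the EtcdCluster CRD for the real schema):

    apiVersion: etcd.aenix.io/v1alpha1   # assumption: verify the CRD group/version
    kind: EtcdCluster
    metadata:
      name: example
    spec:
      replicas: 3
      # hypothetical path; the thread refers to it as the containerSpec section
      podTemplate:
        spec:
          containers:
            - name: etcd
              resources:
                requests:
                  cpu: 100m
                  memory: 100Mi
                limits:
                  memory: 512Mi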

hiddenmarten commented on Jun 2, 2024