
Error on installing Tobs on GKE 1.24

Open umgbhalla opened this issue 1 year ago • 7 comments

What did you do?

helm install otel timescale/tobs -n tobs-otel --create-namespace --wait --timeout 25m

Did you expect to see something different?

Environment

  • tobs version:

    14.6.0

  • Kubernetes version information:

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:30:46Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3-gke.2100", GitCommit:"25d7334511e90d0b636707059c955baebce769cd", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:54Z", GoVersion:"go1.18.3b7", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind: gke

Anything else we need to know?:


umgbhalla, Sep 14 '22 10:09

Hey, with some digging around, it looks like a bug in Kubernetes (kubernetes/kubernetes#67761). There is also an issue on the helm side for this (helm/helm#9710), and an open PR fixing it as well, but it is not merged (helm/helm#9713).

Can you please add your helm CLI version to the issue as well? That'll help me reproduce this faster.

onprem, Sep 14 '22 11:09

Hi @onprem, thanks for the reply. Output of helm version:

version.BuildInfo{Version:"v3.9.1", GitCommit:"a7c043acb5ff905c261cfdc923a35776ba5e66e4", GitTreeState:"clean", GoVersion:"go1.17.5"}

The same happened on helm version:

version.BuildInfo{Version:"v3.9.4", GitCommit:"dbc6d8e20fe1d58d50e6ed30f09a04a77e4c68db", GitTreeState:"clean", GoVersion:"go1.17.13"}

umgbhalla, Sep 14 '22 11:09

Going through the upstream issues, it looks like using helm with resource quotas enabled and a big helm chart is hit or miss. The problem occurs when helm tries to create a lot of resources (in tobs' case, it can create a lot of stuff as it bundles OTel and Kube Prometheus along with other projects) in a short amount of time. Every pod or service creation triggers an update in the remaining quota part of the ResourceQuota object and can lead to conflicts.

Currently helm does not have the retry patch merged, and it looks like the PR has been abandoned as well due to a lack of reviews over a long time.

The workaround I'd suggest is to incrementally roll out tobs.

Start with most of the components disabled. For example, disable everything apart from TimescaleDB, Promscale, and kube-prometheus (you can even disable some parts of kube-prometheus as well, for example Grafana). Then update your helm release with more components enabled (like OpenTelemetry). Even with this you might encounter the same error, but retrying the operation until it succeeds is the only workaround for now.
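The retry part of this workaround can be sketched as a small shell helper. This is only an illustration, not something taken from the tobs or helm docs: `retry` is a hypothetical function name, the attempt count and delay are arbitrary, and the `--set` key in the commented example is an assumption about the tobs chart's values that should be checked against `helm show values timescale/tobs` before use.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0 or
# MAX_ATTEMPTS is reached. Returns 1 if every attempt failed.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (not executed here; needs a cluster and the timescale repo):
#   retry 5 30 helm upgrade --install otel timescale/tobs \
#     -n tobs-otel --create-namespace --wait --timeout 25m \
#     --set opentelemetry-operator.enabled=false   # assumed values key
```

The example uses `helm upgrade --install` rather than `helm install` so the same command can be repeated, and later re-run with more components enabled. Whether a partially failed release needs cleanup between attempts is not covered by this sketch.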

onprem, Sep 14 '22 12:09

OK, got it. Would using an older version of helm work?

umgbhalla, Sep 14 '22 12:09

I don't think using an older version of helm would work. But if you are willing to, removing the resource quotas will do the trick here.

onprem, Sep 14 '22 12:09

Yeah, I tried removing the resource quotas, but they get added back to the namespace as soon as they are removed:

kubectl delete resourcequota gke-resource-quotas -n tobs-otel

umgbhalla, Sep 14 '22 12:09

Ah, looks like they are immutable and cannot be removed: https://cloud.google.com/kubernetes-engine/quotas#resource_quotas.

onprem, Sep 14 '22 12:09

This issue went stale because it was not updated in a month. Please consider updating it to improve the quality of the project.

github-actions[bot], Oct 24 '22 03:10