[kube-prometheus-stack] Prometheus not created if additionalArgs are set

Open fniko opened this issue 1 year ago • 6 comments

Describe the bug

When storage.tsdb.min-block-duration is set through additionalArgs while a Thanos objectStorageConfig is present, the Prometheus StatefulSet is not created.

~After clean install using Helm, I am observing two strange warnings - it might relate~ (fixed by removing old CRD)

W0219 00:16:32.123096   89043 warnings.go:70] unknown field "spec.scrapeConfigNamespaceSelector"
W0219 00:16:32.123672   89043 warnings.go:70] unknown field "spec.scrapeConfigSelector"

What's your helm version?

3.14.1

What's your kubectl version?

1.24.2

Which chart?

kube-prometheus-stack

What's the chart version?

56.7.0

What happened?

After applying custom values to increase the frequency of Thanos syncs to remote storage, Prometheus did not reflect the changes. On a clean install, Prometheus did not show up at all; it seems that its StatefulSet is not created. The issue appears to be tied to objectStorageConfig under the thanos configuration block: when it is removed (see the working values.yml below), Prometheus behaves as expected.

Helm output

Release "kube-prometheus-stack" does not exist. Installing it now.
W0219 00:16:32.123096   89043 warnings.go:70] unknown field "spec.scrapeConfigNamespaceSelector"
W0219 00:16:32.123672   89043 warnings.go:70] unknown field "spec.scrapeConfigSelector"
NAME: kube-prometheus-stack
LAST DEPLOYED: Mon Feb 19 00:16:21 2024
NAMESPACE: kube-prometheus-stack
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace kube-prometheus-stack get pods -l "release=kube-prometheus-stack"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

What you expected to happen?

  • Prometheus is deployed
  • Arguments are reflected and the period will be increased

How to reproduce it?

  1. Create values.yml file with provided values
  2. Run the helm upgrade --install command provided below

Enter the changed values of values.yml?

prometheus:
  prometheusSpec:
    # Increase Thanos sync period - used to DEBUG
    disableCompaction: false
    additionalArgs:
      - name: storage.tsdb.max-block-duration
        value: "30s"

    # Configure Thanos
    thanos:
      objectStorageConfig:
        secret:
          type: S3
          config:
            bucket: "thanos"
            endpoint: "region.provider.com"
            access_key: "xxx"
            secret_key: "xxx"

Enter the command that you execute and failing/misfunctioning.

helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 56.7.0 \
  --values values.yml

Anything else we need to know?

This values.yml configuration works as expected: max-block-duration is set and the sidecar is live.

prometheus:
  prometheusSpec:
    # Increase Thanos sync period - used to DEBUG
    disableCompaction: false
    additionalArgs:
      - name: storage.tsdb.max-block-duration
        value: "30s"

    # Configure Thanos
    thanos:
      image: quay.io/thanos/thanos:v0.28.1

Full outputs

helm ls

NAME                 	NAMESPACE            	REVISION	UPDATED                             	STATUS  	CHART                       	APP VERSION
kube-prometheus-stack	kube-prometheus-stack	1       	2024-02-19 00:54:45.810489 +0000 UTC	deployed	kube-prometheus-stack-56.7.0	v0.71.2

kubectl get pod

alertmanager-kube-prometheus-stack-alertmanager-0           2/2     Running   0          5m24s
kube-prometheus-stack-grafana-585d96b575-dl4tp              3/3     Running   0          5m25s
kube-prometheus-stack-kube-state-metrics-5744bb9db6-62ng2   1/1     Running   0          5m25s
kube-prometheus-stack-operator-6f97fc84f6-fcpb6             1/1     Running   0          5m25s
kube-prometheus-stack-prometheus-node-exporter-2bdqn        1/1     Running   0          5m25s
...
kube-prometheus-stack-prometheus-node-exporter-tffbd        1/1     Running   0          5m25s

kubectl get deploy

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
kube-prometheus-stack-grafana              1/1     1            1           6m28s
kube-prometheus-stack-kube-state-metrics   1/1     1            1           6m28s
kube-prometheus-stack-operator             1/1     1            1           6m28s

kubectl get statefulset

NAME                                              READY   AGE
alertmanager-kube-prometheus-stack-alertmanager   1/1     6m45s

fniko avatar Feb 19 '24 00:02 fniko

~I have discovered a typo within my values.yml file which caused this error. Closing, sorry.~

fniko avatar Feb 19 '24 00:02 fniko

I thought the issue was caused by a typo; however, it seems there is a deeper relation between the configuration blocks. I am reopening this issue with an updated description.

fniko avatar Feb 19 '24 00:02 fniko

Also attaching the output of helm template. I am not adding it to the original post, to keep that post clear.

# Source: kube-prometheus-stack/templates/prometheus/prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: kube-prometheus-stack-prometheus
  namespace: kube-prometheus-stack
  labels:
    app: kube-prometheus-stack-prometheus

    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: kube-prometheus-stack
    app.kubernetes.io/version: "56.7.0"
    app.kubernetes.io/part-of: kube-prometheus-stack
    chart: kube-prometheus-stack-56.7.0
    release: "kube-prometheus-stack"
    heritage: "Helm"
spec:
  alerting:
    alertmanagers:
      - namespace: kube-prometheus-stack
        name: kube-prometheus-stack-alertmanager
        port: http-web
        pathPrefix: "/"
        apiVersion: v2
  image: "quay.io/prometheus/prometheus:v2.49.1"
  version: v2.49.1
  additionalArgs:
    - name: storage.tsdb.max-block-duration
      value: 30s
  externalUrl: http://kube-prometheus-stack-prometheus.kube-prometheus-stack:9090
  paused: false
  replicas: 1
  shards: 1
  logLevel:  info
  logFormat:  logfmt
  listenLocal: false
  enableAdminAPI: false
  retention: "10d"
  tsdb:
    outOfOrderTimeWindow: 0s
  walCompression: true
  routePrefix: "/"
  serviceAccountName: kube-prometheus-stack-prometheus
  serviceMonitorSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  serviceMonitorNamespaceSelector: {}
  podMonitorSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  podMonitorNamespaceSelector: {}
  probeSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  probeNamespaceSelector: {}
  securityContext:
    fsGroup: 2000
    runAsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  scrapeConfigSelector:
    matchLabels:
      release: "kube-prometheus-stack"

  scrapeConfigNamespaceSelector: {}
  thanos:
    image: quay.io/thanos/thanos:v0.28.1
    objectStorageConfig:
      key: object-storage-configs.yaml
      name: kube-prometheus-stack-prometheus
  portName: http-web
  hostNetwork: false

When applying this manifest directly for debugging purposes (kubectl apply -f above-config.yml), the server returned:

Error from server (BadRequest): error when creating "above-config.yml": Prometheus in version "v1" cannot be handled as a Prometheus: strict decoding error: unknown field "spec.scrapeConfigNamespaceSelector", unknown field "spec.scrapeConfigSelector"

EDIT: The above error was fixed by removing the old CRD (see Uninstall Helm Chart). Current version from the CRD, as reported by kubectl describe crd prometheuses.monitoring.coreos.com:

Annotations:  controller-gen.kubebuilder.io/version: v0.13.0
              operator.prometheus.io/version: 0.71.2

fniko avatar Feb 19 '24 01:02 fniko

OK, I did more debugging. After manually applying the above Prometheus manifest, the output of kubectl describe prometheus kube-prometheus-stack-prometheus is:

making statefulset failed: make StatefulSet spec: can't set arguments which are already managed by the operator: storage.tsdb.max-block-duration,storage.tsdb.min-block-duration

Wider output (less readable, though):

    Message:               shard 0: statefulset kube-prometheus-stack/prometheus-kube-prometheus-stack-prometheus not found
    Observed Generation:   1
    Reason:                StatefulSetNotFound
    Status:                False
    Type:                  Available
    Last Transition Time:  2024-02-19T01:49:52Z
    Message:               making statefulset failed: make StatefulSet spec: can't set arguments which are already managed by the operator: storage.tsdb.max-block-duration
    Observed Generation:   1
    Reason:                ReconciliationFailed
    Status:                False
    Type:                  Reconciled

How should this be handled?

fniko avatar Feb 19 '24 01:02 fniko

The tsdb block duration arguments can be set through additionalArgs only if disableCompaction is not set (default is false), i.e. if compaction is enabled. If set to true, the operator does not allow overriding the arguments.

Furthermore, if spec.thanos is set in the Prometheus CR with objectStorageConfig defined, i.e. uploads are active, the operator disables compaction by setting the two block duration arguments to the same value. Under these conditions, you may wish to have a look at blockSize in thanosSpec. The field is not present in the chart's values under prometheus.prometheusSpec.thanos, but it will be passed through to the CR once added there.
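
To illustrate the second point, here is a minimal sketch of how the thanos block of the rendered Prometheus CR might look once blockSize is added under prometheus.prometheusSpec.thanos in the values. The image and objectStorageConfig entries are copied from the helm template output earlier in this thread; the 30s value is only an assumed debugging value, not a recommendation.

```yaml
# Sketch of a fragment of the rendered Prometheus CR (not a complete manifest).
spec:
  # No storage.tsdb.*-block-duration entries in additionalArgs here:
  # with uploads active, the operator manages those arguments itself.
  thanos:
    image: quay.io/thanos/thanos:v0.28.1
    # Assumed debugging value for the TSDB block duration.
    blockSize: 30s
    objectStorageConfig:
      key: object-storage-configs.yaml
      name: kube-prometheus-stack-prometheus
```

With a spec of this shape, the block duration is requested through the Thanos section rather than through additionalArgs, which is what avoids the "already managed by the operator" reconciliation error quoted above.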

zeritti avatar Feb 19 '24 18:02 zeritti

Oh, OK. Thank you for your help. I think I will close this issue, because it is not a bug but rather a configuration mismatch. Or do you think it makes sense to improve the docs or some other aspect of the Helm chart? If not, I will close the issue immediately.

Configuration that works, for others.

prometheus:
  prometheusSpec:
    # Configure Thanos
    thanos:
      image: quay.io/thanos/thanos:v0.28.1
      blockSize: "30s"
      objectStorageConfig:
        secret:
          type: S3
          config:
            bucket: "thanos"
            endpoint: "region.provider.com"
            access_key: "xxx"
            secret_key: "xxx"

fniko avatar Feb 19 '24 18:02 fniko