Flagger complains that it cannot reach Prometheus

Open LittaKake opened this issue 11 months ago • 0 comments

Describe the bug

The Flagger controller tries to connect to Prometheus but fails because the hostname prometheus cannot be resolved.

{"level":"error","ts":"2025-01-23T09:22:50.821Z","caller":"controller/events.go:39","msg":"Error checking metric providers: prometheus not avaiable: running query failed: request failed: Get \"http://prometheus:9090/api/v1/query?query=vector%281%29\": dial tcp: lookup prometheus on 10.100.0.10:53: no such host","canary":"fluxcd-test.fluxcd-test","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf\n\t/workspace/pkg/controller/events.go:39\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary\n\t/workspace/pkg/controller/scheduler.go:207\ngithub.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1\n\t/workspace/pkg/controller/job.go:39"}

The canary is stuck in the Initializing state.

kubectl describe canary ...
  Warning  Synced  25m (x785 over 13h)    flagger  reconcileDestinationRule failed: DestinationRule fluxcd-test-canary.fluxcd-test create error: the server could not find the requested resource (post destinationrules.networking.istio.io)
  Warning  Synced  4m12s (x807 over 13h)  flagger  Error checking metric providers: prometheus not avaiable: running query failed: request failed: Get "http://prometheus:9090/api/v1/query?query=vector%281%29": dial tcp: lookup prometheus on 10.100.0.10:53: no such host

To Reproduce

  1. Install Flux (flux install) and bootstrap.
  2. Apply these manifests:
apiVersion: v1
kind: Namespace
metadata:
  name: flagger-system
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: flagger
  namespace: flagger-system
spec:
  interval: 1h
  releaseName: flagger
  install: # override existing Flagger CRDs
    crds: CreateReplace
  upgrade: # update Flagger CRDs
    crds: CreateReplace
  chart:
    spec:
      chart: flagger
      version: 1.x # update Flagger to the latest minor version
      interval: 6h # scan for new versions every six hours
      sourceRef:
        kind: HelmRepository
        name: flagger
      verify: # verify the chart signature with Cosign keyless
        provider: cosign
  values:
    nodeSelector:
      beta.kubernetes.io/os: linux
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: flagger-loadtester
  namespace: flagger-system
spec:
  interval: 6h
  wait: true
  timeout: 5m
  prune: true
  sourceRef:
    kind: OCIRepository
    name: flagger-loadtester
  path: ./tester
  targetNamespace: flagger-system
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: flagger
  namespace: flagger-system
spec:
  interval: 1h
  url: oci://ghcr.io/fluxcd/charts
  type: oci
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: flagger-loadtester
  namespace: flagger-system
spec:
  interval: 6h
  url: oci://ghcr.io/fluxcd/flagger-manifests
  ref:
    semver: 1.x
  verify:
    provider: cosign

and these application-specific manifests:

apiVersion: v1
kind: Namespace
metadata:
  name: fluxcd-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fluxcd-test
  name: fluxcd-test
  namespace: fluxcd-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fluxcd-test
  template:
    metadata:
      labels:
        app: fluxcd-test
    spec:
      containers:
      - image: myimage:latest
        name: fluxcd-test
---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: fluxcd-test
  namespace: fluxcd-test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fluxcd-test
  service:
    port: 5000
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
    # each minute check if application has above 99% codes that are not 5xx responses
    - name: request-success-rate
      thresholdRange:
        min: 99
      interval: 1m
    # each minute check if application has below 500ms response time
    - name: request-duration
      thresholdRange:
        max: 500
      interval: 1m
    webhooks:
    - name: load-test
      url: http://flagger-loadtester.flagger-system/
      metadata:
        cmd: "hey -z 1m -q 10 -c 2 http://fluxcd-test-canary.fluxcd-test:5000/"

Expected behavior

I expect no errors from the controller, and I expect the canary to progress past the Initializing state.

Additional context

  • Flagger version: 1.40.0
  • Kubernetes version: Server version is v1.30.8-gke.1051000
  • Service Mesh provider: n/a
  • Ingress provider: n/a

LittaKake, Jan 23 '25 09:01