Flagger cannot reach Prometheus
Describe the bug
The Flagger controller attempts to connect to Prometheus but fails because the hostname prometheus does not resolve.
{"level":"error","ts":"2025-01-23T09:22:50.821Z","caller":"controller/events.go:39","msg":"Error checking metric providers: prometheus not avaiable: running query failed: request failed: Get \"http://prometheus:9090/api/v1/query?query=vector%281%29\": dial tcp: lookup prometheus on 10.100.0.10:53: no such host","canary":"fluxcd-test.fluxcd-test","stacktrace":"github.com/fluxcd/flagger/pkg/controller.(*Controller).recordEventErrorf\n\t/workspace/pkg/controller/events.go:39\ngithub.com/fluxcd/flagger/pkg/controller.(*Controller).advanceCanary\n\t/workspace/pkg/controller/scheduler.go:207\ngithub.com/fluxcd/flagger/pkg/controller.CanaryJob.Start.func1\n\t/workspace/pkg/controller/job.go:39"}
The Flagger canary is stuck in the Initializing state.
k describe canary ...
Warning Synced 25m (x785 over 13h) flagger reconcileDestinationRule failed: DestinationRule fluxcd-test-canary.fluxcd-test create error: the server could not find the requested resource (post destinationrules.networking.istio.io)
Warning Synced 4m12s (x807 over 13h) flagger Error checking metric providers: prometheus not avaiable: running query failed: request failed: Get "http://prometheus:9090/api/v1/query?query=vector%281%29": dial tcp: lookup prometheus on 10.100.0.10:53: no such host
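For context, the host in the failing request, http://prometheus:9090, appears to match the Flagger chart's default metricsServer value, and the HelmRelease in the reproduction steps does not override it. The following is a minimal sketch of values that might change this, assuming the chart's metricsServer and prometheus.install values work as I understand them. The monitoring address is a hypothetical placeholder, not a service that exists in my cluster.

```yaml
# Sketch only: goes under spec.values of the flagger HelmRelease.
values:
  # Option 1 (assumption): point Flagger at an existing Prometheus.
  # "prometheus.monitoring:9090" is a hypothetical service address.
  metricsServer: http://prometheus.monitoring:9090
  # Option 2 (assumption): let the chart deploy its own Prometheus instead.
  # prometheus:
  #   install: true
```

Only one of the two options would normally be needed. Which one fits depends on whether a Prometheus instance is already running in the cluster.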
To Reproduce
- Install flux (flux install) and bootstrap.
- Use these manifests:
apiVersion: v1
kind: Namespace
metadata:
  name: flagger-system
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: flagger
  namespace: flagger-system
spec:
  interval: 1h
  releaseName: flagger
  install: # override existing Flagger CRDs
    crds: CreateReplace
  upgrade: # update Flagger CRDs
    crds: CreateReplace
  chart:
    spec:
      chart: flagger
      version: 1.x # update Flagger to the latest minor version
      interval: 6h # scan for new versions every six hours
      sourceRef:
        kind: HelmRepository
        name: flagger
      verify: # verify the chart signature with Cosign keyless
        provider: cosign
  values:
    nodeSelector:
      beta.kubernetes.io/os: linux
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: flagger-loadtester
  namespace: flagger-system
spec:
  interval: 6h
  wait: true
  timeout: 5m
  prune: true
  sourceRef:
    kind: OCIRepository
    name: flagger-loadtester
  path: ./tester
  targetNamespace: flagger-system
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: flagger
  namespace: flagger-system
spec:
  interval: 1h
  url: oci://ghcr.io/fluxcd/charts
  type: oci
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: flagger-loadtester
  namespace: flagger-system
spec:
  interval: 6h
  url: oci://ghcr.io/fluxcd/flagger-manifests
  ref:
    semver: 1.x
  verify:
    provider: cosign
and these application-specific manifests:
apiVersion: v1
kind: Namespace
metadata:
  name: fluxcd-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fluxcd-test
  name: fluxcd-test
  namespace: fluxcd-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fluxcd-test
  template:
    metadata:
      labels:
        app: fluxcd-test
    spec:
      containers:
        - image: myimage:latest
          name: fluxcd-test
---
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: fluxcd-test
  namespace: fluxcd-test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fluxcd-test
  service:
    port: 5000
  analysis:
    interval: 1m
    threshold: 10
    maxWeight: 50
    stepWeight: 5
    metrics:
      # each minute check if application has above 99% codes that are not 5xx responses
      - name: request-success-rate
        thresholdRange:
          min: 99
        interval: 1m
      # each minute check if application has below 500ms response time
      - name: request-duration
        thresholdRange:
          max: 500
        interval: 1m
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.flagger-system/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://fluxcd-test-canary.fluxcd-test:5000/"
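The DestinationRule error in the describe output suggests that Flagger is running with its Istio provider, even though no service mesh is installed in this cluster. The following is a sketch of what selecting the plain Kubernetes provider might look like, assuming the chart's meshProvider value is passed through to the controller's mesh provider setting.

```yaml
# Sketch only: goes under spec.values of the flagger HelmRelease.
values:
  # Assumption: with no service mesh or ingress controller installed,
  # the plain Kubernetes provider is the one that applies.
  meshProvider: kubernetes
```

As far as I understand, the Kubernetes provider runs blue/green analysis driven by iterations rather than the stepWeight and maxWeight traffic shifting used in the Canary above. The analysis section would likely need adjusting as well.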
Expected behavior
I expect no errors from the controller.
I expect the canary to move past the Initializing state.
Additional context
- Flagger version: 1.40.0
- Kubernetes version: Server version is v1.30.8-gke.1051000
- Service Mesh provider: n/a
- Ingress provider: n/a