helm-charts
[prometheus-kube-stack] Prometheus in Agent Mode
Is your feature request related to a problem?
Not a problem by any means; rather an enhancement. This issue is about enabling Prometheus to run in the new so-called agent mode (in the future, once all prerequisites are met).
For reference, see the links below:
- The announcement from Prometheus (https://prometheus.io/blog/2021/11/16/agent/) that a new mode of operation, agent mode, will be available in the next release
- The issue created and currently WIP at https://github.com/prometheus-operator/prometheus-operator/issues/3989
- The PR related to the issue above at https://github.com/prometheus-operator/prometheus-operator/pull/4417
Describe the solution you'd like.
Introduce the relevant values in the Helm chart to enable Prometheus to run in agent mode, with the possibility to define a remote write endpoint for it, once the prometheus-operator project enables that.
Describe alternatives you've considered.
maintaining the status quo => running a fully fledged Prometheus Server (+ Thanos Sidecar) in all cattle clusters
Additional context.
No response
Also kube-prometheus is adding support as soon as prometheus-operator changes land: https://github.com/prometheus-operator/kube-prometheus/pull/1472
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.
Reviving this issue.
It seems that, as of now, the Prometheus Operator and kube-prometheus support running Prometheus in agent mode to some extent, using the current Prometheus CRD. However, a separate CRD (e.g. PrometheusAgent) is still in design (https://github.com/prometheus-operator/prometheus-operator/issues/3989).
@project_maintainers What do you think: should the Helm charts start supporting the agent now, using what's there until the new CRD lands, or is it better to wait?
I would wait for the final solution.
Again keeping this one alive
Keeping this issue alive
Again keeping this one alive
I tried to configure it and am seeing this error: "field alerting is not allowed in agent mode".
That field is hardcoded in the template, and I don't see how to remove it:
https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L14-L16
The same issue will happen with the ruleSelector and ruleNamespaceSelector fields, with this error: "field rule_files is not allowed in agent mode". They are also hardcoded in the template: https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L193-L207
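Both errors come from Prometheus itself rejecting parts of the generated configuration once agent mode is on. Below is a minimal Python sketch of those checks for testing a rendered prometheus.yml locally. It is based on our reading of Prometheus's config loader around v2.3x and is not Prometheus code, so verify it against the version you run. The function name `agent_mode_errors` and the sample config are hypothetical, and the input is assumed to be already parsed into a dict.

```python
# Sketch only (assumption): mirrors the agent-mode config checks as we read them
# in Prometheus's config loader; not Prometheus code. Input is a parsed prometheus.yml.
from typing import Any, Dict, List


def agent_mode_errors(cfg: Dict[str, Any]) -> List[str]:
    """Return the errors agent mode would raise for this parsed config."""
    errors: List[str] = []

    # An agent only forwards samples, so it needs at least one place to send them.
    if not cfg.get("remote_write"):
        errors.append("at least one remote_write target must be specified in agent mode")

    # `alerting` fails only if it actually lists Alertmanagers or alert relabel rules.
    alerting = cfg.get("alerting") or {}
    if alerting.get("alertmanagers") or alerting.get("alert_relabel_configs"):
        errors.append("field alerting is not allowed in agent mode")

    # An empty or null rule_files passes; any listed rule file is rejected.
    if cfg.get("rule_files"):
        errors.append("field rule_files is not allowed in agent mode")

    # Agent mode disables querying, so remote_read is rejected as well.
    if cfg.get("remote_read"):
        errors.append("field remote_read is not allowed in agent mode")

    return errors


if __name__ == "__main__":
    # Hypothetical rendered config with alerting and rule_files present,
    # matching the two errors reported above.
    rendered = {
        "remote_write": [{"url": "http://10.138.0.103/api/v1/write"}],
        "alerting": {"alertmanagers": [{"static_configs": [{"targets": ["alertmanager:9093"]}]}]},
        "rule_files": ["/etc/prometheus/rules/*.yaml"],
    }
    for err in agent_mode_errors(rendered):
        print(err)
```

This makes it visible why a template that always emits `alerting` and rule settings cannot be used in agent mode without a way to switch them off.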
Looking forward to this feature as well.
It will be so nice to have this feature!
Any updates?
I ran into the same problem with the prometheus chart, so I have raised #2290. I suspect that the two will have a very similar solution.
I deployed using this value.prom-agent.yaml, and agent mode seems to be working on my setup:
server:
  enabled: true
  defaultFlagsOverride:
    - --enable-feature=agent
    - --storage.agent.retention.max-time=30m
    - --config.file=/etc/config/prometheus.yml
  configPath: /etc/config/prometheus.yml
serverFiles:
  prometheus.yml:
    remote_write:
      - url: http://10.138.0.103/api/v1/write
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
              - localhost:9090
    rule_files:
pushgateway:
  enabled: false
alertmanager:
  enabled: false
grafana:
  enabled: false
defaultRules:
  create: false
Deployment:
helm upgrade --install -n monitoring prometheus prometheus-community/prometheus -f ./value.prom-agent.yaml
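The values file above relies on two Helm merge rules. First, setting a key to null deletes it from the chart defaults: `rule_files:` with nothing after it is YAML null. Second, lists such as `scrape_configs` replace the default list instead of being appended to. Together, these drop any default `rule_files` entries the chart ships with, which agent mode would otherwise reject. Below is a simplified Python model of that merge. `merge_values` is a hypothetical helper, not Helm's implementation, and the defaults in the example are an abbreviated stand-in for the chart's real values.yaml.

```python
# Simplified model (assumption) of Helm's values merge: maps merge key by key,
# lists and scalars from the user replace the defaults, and null deletes a key.
import copy
from typing import Any, Dict


def merge_values(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    result = copy.deepcopy(defaults)  # never mutate the chart defaults
    for key, value in overrides.items():
        if value is None:
            # e.g. `rule_files:` with no value -> key removed from the result
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            # nested maps are merged recursively
            result[key] = merge_values(result[key], value)
        else:
            # lists (e.g. scrape_configs) and scalars replace the default wholesale
            result[key] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    # Abbreviated, illustrative stand-in for the chart's default serverFiles.
    chart_defaults = {
        "serverFiles": {
            "prometheus.yml": {
                "rule_files": ["/etc/config/recording_rules.yml", "/etc/config/alerting_rules.yml"],
                "scrape_configs": [{"job_name": "prometheus"}, {"job_name": "kubernetes-apiservers"}],
            }
        }
    }
    user_values = {
        "serverFiles": {
            "prometheus.yml": {
                "remote_write": [{"url": "http://10.138.0.103/api/v1/write"}],
                "scrape_configs": [{"job_name": "prometheus"}],
                "rule_files": None,
            }
        }
    }
    print(merge_values(chart_defaults, user_values)["serverFiles"]["prometheus.yml"])
```

In the merged result, `rule_files` is gone and only the single `prometheus` scrape job remains, which is the shape agent mode accepts.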
Logs on the agent side:
ts=2022-07-20T12:36:15.169Z caller=main.go:184 level=info msg="Experimental agent mode enabled."
ts=2022-07-20T12:36:15.169Z caller=main.go:516 level=info msg="Starting Prometheus" version="(version=2.34.0, branch=HEAD, revision=881111fec4332c33094a6fb2680c71fffc427275)"
ts=2022-07-20T12:36:15.169Z caller=main.go:521 level=info build_context="(go=go1.17.8, user=root@121ad7ea5487, date=20220315-15:18:00)"
ts=2022-07-20T12:36:15.169Z caller=main.go:522 level=info host_details="(Linux 5.4.188+ #1 SMP Sun Apr 24 10:03:06 PDT 2022 x86_64 prometheus-server-588f957dd9-2ttj6 (none))"
ts=2022-07-20T12:36:15.169Z caller=main.go:523 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2022-07-20T12:36:15.169Z caller=main.go:524 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-07-20T12:36:15.171Z caller=web.go:540 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2022-07-20T12:36:15.172Z caller=main.go:993 level=info msg="Starting WAL storage ..."
ts=2022-07-20T12:36:15.173Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2022-07-20T12:36:15.175Z caller=db.go:332 level=info msg="replaying WAL, this may take a while" dir=data-agent/wal
ts=2022-07-20T12:36:15.175Z caller=db.go:383 level=info msg="WAL segment loaded" segment=0 maxSegment=0
ts=2022-07-20T12:36:15.176Z caller=main.go:1014 level=info fs_type=EXT4_SUPER_MAGIC
ts=2022-07-20T12:36:15.176Z caller=main.go:1017 level=info msg="Agent WAL storage started"
ts=2022-07-20T12:36:15.176Z caller=main.go:1142 level=info msg="Loading configuration file" filename=/etc/config/prometheus.yml
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Starting WAL watcher" queue=bfe160
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Starting scraped metadata watcher"
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Replaying WAL" queue=bfe160
ts=2022-07-20T12:36:15.177Z caller=main.go:1179 level=info msg="Completed loading of configuration file" filename=/etc/config/prometheus.yml totalDuration=1.163675ms db_storage=596ns remote_storage=443.422µs web_handler=469ns query_engine=618ns scrape=376.568µs scrape_sd=20.749µs notify=1.104µs notify_sd=1.965µs rules=425ns tracing=6.43µs
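The two lines that confirm agent mode in the log above are `msg="Experimental agent mode enabled."` and `msg="Agent WAL storage started"`, as printed by v2.34.0; other versions may word them differently. Below is a small Python sketch that parses the logfmt-style lines and checks for both messages. It uses `shlex` as a convenient approximation of a logfmt parser, not a complete one: an unbalanced quote in a value would make it raise. The names `parse_logfmt` and `agent_mode_started` are hypothetical.

```python
# Sketch (assumption): approximate logfmt parsing via shlex, then look for the
# log messages that indicate the agent started.
import shlex
from typing import Dict, Iterable

REQUIRED_MSGS = ("Experimental agent mode enabled.", "Agent WAL storage started")


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split `key=value` pairs; double-quoted values may contain spaces."""
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value
            fields[key] = value
    return fields


def agent_mode_started(lines: Iterable[str]) -> bool:
    seen = {parse_logfmt(line).get("msg") for line in lines if line.strip()}
    return all(msg in seen for msg in REQUIRED_MSGS)


if __name__ == "__main__":
    sample = [
        'ts=2022-07-20T12:36:15.169Z caller=main.go:184 level=info msg="Experimental agent mode enabled."',
        'ts=2022-07-20T12:36:15.176Z caller=main.go:1017 level=info msg="Agent WAL storage started"',
    ]
    print(agent_mode_started(sample))  # True
```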
I tried to use the same config as @datlife, but I still get the same error: field rule_files is not allowed in agent mode.
Has anyone tried another way? 😢
I tried the same config in the past for testing and it worked like a charm (thanks @datlife!). I just tried again with a quick run on a kind 1.24 cluster (with a dummy remote, just to check that it starts), overriding only the Prometheus version, and it worked too.
Check that you are using the latest Helm chart version (15.12.0) and, optionally, the latest Prometheus version (v2.38.0, set via server.image.tag).
For the sake of completeness, here are the exact commands:
agent.yaml:
server:
  enabled: true
  image:
    tag: v2.38.0
  defaultFlagsOverride:
    - --enable-feature=agent
    - --storage.agent.retention.max-time=30m
    - --config.file=/etc/config/prometheus.yml
  configPath: /etc/config/prometheus.yml
serverFiles:
  prometheus.yml:
    remote_write:
      - url: http://10.138.0.103/api/v1/write
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
              - localhost:9090
    rule_files:
pushgateway:
  enabled: false
alertmanager:
  enabled: false
grafana:
  enabled: false
defaultRules:
  create: false
cluster.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: app-cluster
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        listenAddress: "0.0.0.0"
        protocol: tcp
      - containerPort: 443
        hostPort: 443
        listenAddress: "0.0.0.0"
        protocol: tcp
Cluster create:
kind create cluster --config cluster.yaml
Helm install:
helm upgrade --install -n monitoring prometheus prometheus-community/prometheus --version 15.12.0 --create-namespace -f agent.yaml
Proofs:
$ kubectl get pods -A
NAMESPACE            NAME                                                READY   STATUS    RESTARTS   AGE
kube-system          coredns-6d4b75cb6d-f4j4x                            1/1     Running   0          61m
kube-system          coredns-6d4b75cb6d-q52f7                            1/1     Running   0          61m
kube-system          etcd-app-cluster-control-plane                      1/1     Running   0          62m
kube-system          kindnet-n6dsx                                       1/1     Running   0          61m
kube-system          kube-apiserver-app-cluster-control-plane            1/1     Running   0          62m
kube-system          kube-controller-manager-app-cluster-control-plane   1/1     Running   0          62m
kube-system          kube-proxy-ms7vv                                    1/1     Running   0          61m
kube-system          kube-scheduler-app-cluster-control-plane            1/1     Running   0          62m
local-path-storage   local-path-provisioner-9cd9bd544-mgdj9              1/1     Running   0          61m
monitoring           prometheus-kube-state-metrics-774f8c7564-8k4px      1/1     Running   0          60m
monitoring           prometheus-node-exporter-54wzh                      1/1     Running   0          60m
monitoring           prometheus-server-f65746fd9-wl9rz                   2/2     Running   0          54m
$ kubectl port-forward -n monitoring svc/prometheus-server 9090:80 # on browser check http://localhost:9090
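The pod listing above serves as the quick health check: every pod, including `prometheus-server` at 2/2, is Running. To make that check scriptable, below is a small Python sketch that reads the `kubectl get pods` table and flags any pod whose READY count is incomplete or whose STATUS is not Running. `unready_pods` is a hypothetical helper. Columns are located by header name, so the same code works with or without `-A`. For a built-in alternative, `kubectl wait --for=condition=Ready pod --all -n monitoring` waits on the same condition.

```python
# Sketch (assumption): parse whitespace-separated `kubectl get pods` output and
# report pods that are not fully ready or not Running.
from typing import List


def unready_pods(table: str) -> List[str]:
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    ready_idx = header.index("READY")
    status_idx = header.index("STATUS")
    bad: List[str] = []
    for row in lines[1:]:
        cols = row.split()
        ready, total = cols[ready_idx].split("/")  # e.g. "2/2"
        if ready != total or cols[status_idx] != "Running":
            bad.append(cols[name_idx])
    return bad
```

An empty result means every listed pod is fully ready.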
still relevant
Still relevant
still relevant.
Still relevant to me
still relevant to me
still relevant to me
please
Still relevant for me.
Please