
[prometheus-kube-stack] Prometheus in Agent Mode

Open sherifkayad opened this issue 3 years ago • 19 comments

Is your feature request related to a problem?

Not a problem by any means; rather an enhancement. This issue is about enabling (in the future, once all prerequisites are met) Prometheus running in the new agent mode.

Following the links below:

  • The Prometheus announcement (https://prometheus.io/blog/2021/11/16/agent/) that a new agent mode of operation will be available in the next release
  • The issue, currently a work in progress, at https://github.com/prometheus-operator/prometheus-operator/issues/3989
  • The related PR at https://github.com/prometheus-operator/prometheus-operator/pull/4417

Describe the solution you'd like.

Introducing the relevant values in the Helm chart to enable Prometheus running in agent mode, with the possibility to define a remote write endpoint for it, once the prometheus-operator project enables that.
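To illustrate the idea (none of these chart values existed at the time, so the agentMode key below is purely hypothetical), the requested knobs might eventually look something like this in a kube-prometheus-stack values file:

```yaml
# Hypothetical values sketch - 'agentMode' is an assumed key,
# not a real chart option at the time of writing.
prometheus:
  prometheusSpec:
    # Run Prometheus as a scrape-and-forward agent instead of a full server.
    agentMode: true
    # Agent mode keeps no queryable local TSDB, so a remote write
    # target is effectively mandatory.
    remoteWrite:
      - url: http://central-prometheus.example.com/api/v1/write
```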

Describe alternatives you've considered.

Maintaining the status quo, i.e. running a fully fledged Prometheus server (+ Thanos sidecar) in all cattle clusters.

Additional context.

No response

sherifkayad avatar Nov 22 '21 08:11 sherifkayad

Also kube-prometheus is adding support as soon as prometheus-operator changes land: https://github.com/prometheus-operator/kube-prometheus/pull/1472

paulfantom avatar Nov 22 '21 14:11 paulfantom

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Dec 22 '21 19:12 stale[bot]

Reviving this issue.

sherifkayad avatar Dec 23 '21 03:12 sherifkayad

It seems that, as of now, the Prometheus Operator and kube-prometheus do to some extent support running Prometheus (with the current CRD) in agent mode. However, the separate CRD (e.g. PrometheusAgent) is still being designed (https://github.com/prometheus-operator/prometheus-operator/issues/3989).

@project_maintainers What do you think: should the Helm charts start supporting agent mode now, using what is already there until the new CRD lands, or is it better to wait?

sherifkayad avatar Jan 20 '22 07:01 sherifkayad

I would wait for the final solution.

monotek avatar Jan 20 '22 08:01 monotek

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Feb 20 '22 14:02 stale[bot]

Again keeping this one alive

sherifkayad avatar Feb 20 '22 20:02 sherifkayad

Keeping this issue alive

mahmoud-hafez-aws avatar Mar 15 '22 08:03 mahmoud-hafez-aws

Again keeping this one alive

sharathfeb12 avatar Apr 05 '22 04:04 sharathfeb12

Trying to configure agent mode and seeing this error: "field alerting is not allowed in agent mode".

That field is hardcoded into the template; I don't see how to remove it:

https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L14-L16

mfilipe avatar May 05 '22 18:05 mfilipe

> Trying to configure and seeing this error: "field alerting is not allowed in agent mode".
>
> That field is hardcoded into the code, I don't see how to remove it:
>
> https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L14-L16

The same issue will happen with the ruleSelector and ruleNamespaceSelector fields, with the error "field rule_files is not allowed in agent mode". It's also hardcoded into the template: https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L193-L207
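A minimal sketch of how the template could guard those hardcoded fields, assuming a hypothetical agentMode value (no such flag exists in the chart yet):

```yaml
# Sketch only - 'agentMode' is an assumed value; the guard omits the
# fields Prometheus rejects when started with --enable-feature=agent.
{{- if not .Values.prometheus.prometheusSpec.agentMode }}
  alerting:
    alertmanagers:
      - namespace: {{ .Release.Namespace }}
        name: {{ template "kube-prometheus-stack.fullname" . }}-alertmanager
        port: http-web
  ruleSelector:
    matchLabels:
      release: {{ $.Release.Name | quote }}
{{- end }}
```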

nikoul avatar May 19 '22 12:05 nikoul

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jun 18 '22 19:06 stale[bot]

Looking forward to this feature as well.

utkuozdemir avatar Jun 18 '22 19:06 utkuozdemir

It will be so nice to have this feature!

r12f avatar Jun 22 '22 14:06 r12f

Any updates?

maxpain avatar Jul 11 '22 06:07 maxpain

I ran into the same problem with the prometheus chart, so I have raised #2290. I suspect the two will have a very similar solution.

nedl86 avatar Jul 20 '22 09:07 nedl86

I deployed using the value.prom-agent.yaml below, and agent mode seems to be working on my setup.

server:
  enabled: true
  defaultFlagsOverride:
  - --enable-feature=agent
  - --storage.agent.retention.max-time=30m
  - --config.file=/etc/config/prometheus.yml
  configPath: /etc/config/prometheus.yml

serverFiles:
  prometheus.yml:
    remote_write:
    - url: http://10.138.0.103/api/v1/write
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
            - localhost:9090
    rule_files:


pushgateway:
  enabled: false

alertmanager:
  enabled: false

grafana:
  enabled: false

defaultRules:
  create: false

Deployment:

helm upgrade --install -n monitoring prometheus prometheus-community/prometheus -f ./value.prom-agent.yaml

Log on the agent side:

ts=2022-07-20T12:36:15.169Z caller=main.go:184 level=info msg="Experimental agent mode enabled."
ts=2022-07-20T12:36:15.169Z caller=main.go:516 level=info msg="Starting Prometheus" version="(version=2.34.0, branch=HEAD, revision=881111fec4332c33094a6fb2680c71fffc427275)"
ts=2022-07-20T12:36:15.169Z caller=main.go:521 level=info build_context="(go=go1.17.8, user=root@121ad7ea5487, date=20220315-15:18:00)"
ts=2022-07-20T12:36:15.169Z caller=main.go:522 level=info host_details="(Linux 5.4.188+ #1 SMP Sun Apr 24 10:03:06 PDT 2022 x86_64 prometheus-server-588f957dd9-2ttj6 (none))"
ts=2022-07-20T12:36:15.169Z caller=main.go:523 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2022-07-20T12:36:15.169Z caller=main.go:524 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-07-20T12:36:15.171Z caller=web.go:540 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2022-07-20T12:36:15.172Z caller=main.go:993 level=info msg="Starting WAL storage ..."
ts=2022-07-20T12:36:15.173Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2022-07-20T12:36:15.175Z caller=db.go:332 level=info msg="replaying WAL, this may take a while" dir=data-agent/wal
ts=2022-07-20T12:36:15.175Z caller=db.go:383 level=info msg="WAL segment loaded" segment=0 maxSegment=0
ts=2022-07-20T12:36:15.176Z caller=main.go:1014 level=info fs_type=EXT4_SUPER_MAGIC
ts=2022-07-20T12:36:15.176Z caller=main.go:1017 level=info msg="Agent WAL storage started"
ts=2022-07-20T12:36:15.176Z caller=main.go:1142 level=info msg="Loading configuration file" filename=/etc/config/prometheus.yml
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Starting WAL watcher" queue=bfe160
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Starting scraped metadata watcher"
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Replaying WAL" queue=bfe160
ts=2022-07-20T12:36:15.177Z caller=main.go:1179 level=info msg="Completed loading of configuration file" filename=/etc/config/prometheus.yml totalDuration=1.163675ms db_storage=596ns remote_storage=443.422µs web_handler=469ns query_engine=618ns scrape=376.568µs scrape_sd=20.749µs notify=1.104µs notify_sd=1.965µs rules=425ns tracing=6.43µs

datlife avatar Jul 20 '22 12:07 datlife

I tried to use the same config as @datlife, but I still get the same error: "field rule_files is not allowed in agent mode". Has anyone tried another way? 😢

mariana-mendes avatar Sep 06 '22 19:09 mariana-mendes

> I tried to use the same config as @datlife, but still on the same error field rule_files is not allowed in agent mode. Has anyone tried another way? 😢

I tried the same config in the past for testing and it worked like a charm (thanks @datlife!). I just tried a quick run again with a Kind 1.24 cluster (with a fake remote, just to check that it starts), overriding only the Prometheus version, and it worked too.

You can check that you are using the latest Helm chart version (15.12.0) and, optionally, the latest Prometheus version (v2.38.0, set via server.image.tag).

For the sake of completeness, here are the exact commands:

agent.yaml:

server:
  enabled: true
  image:
    tag: v2.38.0
  defaultFlagsOverride:
  - --enable-feature=agent
  - --storage.agent.retention.max-time=30m
  - --config.file=/etc/config/prometheus.yml
  configPath: /etc/config/prometheus.yml

serverFiles:
  prometheus.yml:
    remote_write:
    - url: http://10.138.0.103/api/v1/write
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
            - localhost:9090
    rule_files:


pushgateway:
  enabled: false

alertmanager:
  enabled: false

grafana:
  enabled: false

defaultRules:
  create: false

cluster.yaml:

apiVersion: kind.x-k8s.io/v1alpha4
name: app-cluster
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    listenAddress: "0.0.0.0"
    protocol: tcp
  - containerPort: 443
    hostPort: 443
    listenAddress: "0.0.0.0"
    protocol: tcp

Cluster create:

kind create cluster --config cluster.yaml

Helm install:

helm upgrade --install -n monitoring prometheus prometheus-community/prometheus --version 15.12.0 --create-namespace -f agent.yaml

Proof:

$ kubectl get pods -A                

NAMESPACE            NAME                                                READY   STATUS    RESTARTS   AGE
kube-system          coredns-6d4b75cb6d-f4j4x                            1/1     Running   0          61m
kube-system          coredns-6d4b75cb6d-q52f7                            1/1     Running   0          61m
kube-system          etcd-app-cluster-control-plane                      1/1     Running   0          62m
kube-system          kindnet-n6dsx                                       1/1     Running   0          61m
kube-system          kube-apiserver-app-cluster-control-plane            1/1     Running   0          62m
kube-system          kube-controller-manager-app-cluster-control-plane   1/1     Running   0          62m
kube-system          kube-proxy-ms7vv                                    1/1     Running   0          61m
kube-system          kube-scheduler-app-cluster-control-plane            1/1     Running   0          62m
local-path-storage   local-path-provisioner-9cd9bd544-mgdj9              1/1     Running   0          61m
monitoring           prometheus-kube-state-metrics-774f8c7564-8k4px      1/1     Running   0          60m
monitoring           prometheus-node-exporter-54wzh                      1/1     Running   0          60m
monitoring           prometheus-server-f65746fd9-wl9rz                   2/2     Running   0          54m

$ kubectl port-forward -n monitoring svc/prometheus-server 9090:80 # on browser check http://localhost:9090

(screenshot: the Prometheus UI running in agent mode)

bryanasdev000 avatar Sep 07 '22 08:09 bryanasdev000

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Oct 12 '22 05:10 stale[bot]

still relevant

hoerup avatar Oct 12 '22 06:10 hoerup

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Nov 26 '22 21:11 stale[bot]

Still relevant

hoerup avatar Nov 26 '22 22:11 hoerup

still relevant.

sharathfeb12 avatar Dec 14 '22 06:12 sharathfeb12

Still relevant to me

clarkezone avatar Dec 14 '22 18:12 clarkezone

still relevant to me

kgogolek avatar Dec 20 '22 15:12 kgogolek

still relevant to me

ranryl avatar Jan 04 '23 09:01 ranryl

Please.

alessioga avatar Feb 01 '23 09:02 alessioga

Still relevant for me.

dnaprawa-capgemini avatar Feb 16 '23 10:02 dnaprawa-capgemini

Please

clarkezone avatar Feb 17 '23 22:02 clarkezone