
[prometheus-kube-stack] Prometheus in Agent Mode

Open sherifkayad opened this issue 3 years ago • 19 comments

Is your feature request related to a problem?

Not a problem by any means; rather an enhancement. This issue is about enabling (in the future, once all prerequisites are met) Prometheus running in the new agent mode.

Following the links below:

  • The Prometheus announcement (https://prometheus.io/blog/2021/11/16/agent/) that a new agent mode of operation will be available in the next release
  • The issue, currently a work in progress, at https://github.com/prometheus-operator/prometheus-operator/issues/3989
  • The related PR at https://github.com/prometheus-operator/prometheus-operator/pull/4417

Describe the solution you'd like.

Introducing the relevant values in the Helm chart to enable Prometheus running in agent mode, with the possibility to define a remote write endpoint for it, once the prometheus-operator project enables that.
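To illustrate the idea (none of these chart values existed at the time, so the agentMode key below is purely hypothetical), the requested knobs might eventually look something like this in a kube-prometheus-stack values file:

```yaml
# Hypothetical values sketch - 'agentMode' is an assumed key,
# not a real chart option at the time of writing.
prometheus:
  prometheusSpec:
    # Run Prometheus as a scrape-and-forward agent instead of a full server.
    agentMode: true
    # Agent mode keeps no queryable local TSDB, so a remote write
    # target is effectively mandatory.
    remoteWrite:
      - url: http://central-prometheus.example.com/api/v1/write
```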

Describe alternatives you've considered.

Maintaining the status quo, i.e. running a fully fledged Prometheus server (+ Thanos sidecar) in all cattle clusters.

Additional context.

No response

sherifkayad avatar Nov 22 '21 08:11 sherifkayad

Also kube-prometheus is adding support as soon as prometheus-operator changes land: https://github.com/prometheus-operator/kube-prometheus/pull/1472

paulfantom avatar Nov 22 '21 14:11 paulfantom

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Dec 22 '21 19:12 stale[bot]

Reviving this issue.

sherifkayad avatar Dec 23 '21 03:12 sherifkayad

It seems that, as of now, the Prometheus Operator and kube-prometheus do to some extent support running Prometheus (with the current CRD) in agent mode. However, the separate CRD (e.g. PrometheusAgent) is still being designed (https://github.com/prometheus-operator/prometheus-operator/issues/3989).

@project_maintainers What do you think: should the Helm charts start supporting agent mode now, using what is already there until the new CRD lands, or is it better to wait?

sherifkayad avatar Jan 20 '22 07:01 sherifkayad

I would wait for the final solution.

monotek avatar Jan 20 '22 08:01 monotek

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Feb 20 '22 14:02 stale[bot]

Again keeping this one alive

sherifkayad avatar Feb 20 '22 20:02 sherifkayad

Keeping this issue alive

mahmoud-hafez-aws avatar Mar 15 '22 08:03 mahmoud-hafez-aws

Again keeping this one alive

sharathfeb12 avatar Apr 05 '22 04:04 sharathfeb12

Trying to configure agent mode and seeing this error: "field alerting is not allowed in agent mode".

That field is hardcoded into the template; I don't see how to remove it:

https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L14-L16

mfilipe avatar May 05 '22 18:05 mfilipe

> Trying to configure and seeing this error: "field alerting is not allowed in agent mode".
>
> That field is hardcoded into the code, I don't see how to remove it:
>
> https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L14-L16

The same issue will happen with the ruleSelector and ruleNamespaceSelector fields, with the error "field rule_files is not allowed in agent mode". It's also hardcoded into the template: https://github.com/prometheus-community/helm-charts/blob/ee700113d070664053bb041f97ae5356e6072fe6/charts/kube-prometheus-stack/templates/prometheus/prometheus.yaml#L193-L207
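A minimal sketch of how the template could guard those hardcoded fields, assuming a hypothetical agentMode value (no such flag exists in the chart yet):

```yaml
# Sketch only - 'agentMode' is an assumed value; the guard omits the
# fields Prometheus rejects when started with --enable-feature=agent.
{{- if not .Values.prometheus.prometheusSpec.agentMode }}
  alerting:
    alertmanagers:
      - namespace: {{ .Release.Namespace }}
        name: {{ template "kube-prometheus-stack.fullname" . }}-alertmanager
        port: http-web
  ruleSelector:
    matchLabels:
      release: {{ $.Release.Name | quote }}
{{- end }}
```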

nikoul avatar May 19 '22 12:05 nikoul

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Jun 18 '22 19:06 stale[bot]

Looking forward to this feature as well.

utkuozdemir avatar Jun 18 '22 19:06 utkuozdemir

It will be so nice to have this feature!

r12f avatar Jun 22 '22 14:06 r12f

Any updates?

maxpain avatar Jul 11 '22 06:07 maxpain

I ran into the same problem with the prometheus chart, so I have raised #2290. I suspect the two will have a very similar solution.

nedl86 avatar Jul 20 '22 09:07 nedl86

I deployed using the value.prom-agent.yaml below, and agent mode seems to be working on my setup.

server:
  enabled: true
  defaultFlagsOverride:
  - --enable-feature=agent
  - --storage.agent.retention.max-time=30m
  - --config.file=/etc/config/prometheus.yml
  configPath: /etc/config/prometheus.yml

serverFiles:
  prometheus.yml:
    remote_write:
    - url: http://10.138.0.103/api/v1/write
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
            - localhost:9090
    rule_files:


pushgateway:
  enabled: false

alertmanager:
  enabled: false

grafana:
  enabled: false

defaultRules:
  create: false

Deployment:

helm upgrade --install -n monitoring prometheus prometheus-community/prometheus -f ./value.prom-agent.yaml

Log on the agent side:

ts=2022-07-20T12:36:15.169Z caller=main.go:184 level=info msg="Experimental agent mode enabled."
ts=2022-07-20T12:36:15.169Z caller=main.go:516 level=info msg="Starting Prometheus" version="(version=2.34.0, branch=HEAD, revision=881111fec4332c33094a6fb2680c71fffc427275)"
ts=2022-07-20T12:36:15.169Z caller=main.go:521 level=info build_context="(go=go1.17.8, user=root@121ad7ea5487, date=20220315-15:18:00)"
ts=2022-07-20T12:36:15.169Z caller=main.go:522 level=info host_details="(Linux 5.4.188+ #1 SMP Sun Apr 24 10:03:06 PDT 2022 x86_64 prometheus-server-588f957dd9-2ttj6 (none))"
ts=2022-07-20T12:36:15.169Z caller=main.go:523 level=info fd_limits="(soft=1048576, hard=1048576)"
ts=2022-07-20T12:36:15.169Z caller=main.go:524 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2022-07-20T12:36:15.171Z caller=web.go:540 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2022-07-20T12:36:15.172Z caller=main.go:993 level=info msg="Starting WAL storage ..."
ts=2022-07-20T12:36:15.173Z caller=tls_config.go:195 level=info component=web msg="TLS is disabled." http2=false
ts=2022-07-20T12:36:15.175Z caller=db.go:332 level=info msg="replaying WAL, this may take a while" dir=data-agent/wal
ts=2022-07-20T12:36:15.175Z caller=db.go:383 level=info msg="WAL segment loaded" segment=0 maxSegment=0
ts=2022-07-20T12:36:15.176Z caller=main.go:1014 level=info fs_type=EXT4_SUPER_MAGIC
ts=2022-07-20T12:36:15.176Z caller=main.go:1017 level=info msg="Agent WAL storage started"
ts=2022-07-20T12:36:15.176Z caller=main.go:1142 level=info msg="Loading configuration file" filename=/etc/config/prometheus.yml
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Starting WAL watcher" queue=bfe160
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Starting scraped metadata watcher"
ts=2022-07-20T12:36:15.176Z caller=dedupe.go:112 component=remote level=info remote_name=bfe160 url=http://10.138.0.103/api/v1/write msg="Replaying WAL" queue=bfe160
ts=2022-07-20T12:36:15.177Z caller=main.go:1179 level=info msg="Completed loading of configuration file" filename=/etc/config/prometheus.yml totalDuration=1.163675ms db_storage=596ns remote_storage=443.422µs web_handler=469ns query_engine=618ns scrape=376.568µs scrape_sd=20.749µs notify=1.104µs notify_sd=1.965µs rules=425ns tracing=6.43µs

datlife avatar Jul 20 '22 12:07 datlife

I tried to use the same config as @datlife, but I still get the same error: "field rule_files is not allowed in agent mode". Has anyone tried another way? 😢

mariana-mendes avatar Sep 06 '22 19:09 mariana-mendes

> I tried to use the same config as @datlife, but still on the same error field rule_files is not allowed in agent mode. Has anyone tried another way? 😢

I tried the same config in the past for testing and it worked like a charm (thanks @datlife!). I just tried a quick run again with a Kind 1.24 cluster (with a fake remote, just to check that it starts), overriding only the Prometheus version, and it worked too.

You can check that you are using the latest Helm chart version (15.12.0) and, optionally, the latest Prometheus version (v2.38.0, set via server.image.tag).

For the sake of completeness, here are the exact commands:

agent.yaml:

server:
  enabled: true
  image:
    tag: v2.38.0
  defaultFlagsOverride:
  - --enable-feature=agent
  - --storage.agent.retention.max-time=30m
  - --config.file=/etc/config/prometheus.yml
  configPath: /etc/config/prometheus.yml

serverFiles:
  prometheus.yml:
    remote_write:
    - url: http://10.138.0.103/api/v1/write
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets:
            - localhost:9090
    rule_files:


pushgateway:
  enabled: false

alertmanager:
  enabled: false

grafana:
  enabled: false

defaultRules:
  create: false

cluster.yaml:

apiVersion: kind.x-k8s.io/v1alpha4
name: app-cluster
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    listenAddress: "0.0.0.0"
    protocol: tcp
  - containerPort: 443
    hostPort: 443
    listenAddress: "0.0.0.0"
    protocol: tcp

Cluster create:

kind create cluster --config cluster.yaml

Helm install:

helm upgrade --install -n monitoring prometheus prometheus-community/prometheus --version 15.12.0 --create-namespace -f agent.yaml

Proof:

$ kubectl get pods -A                

NAMESPACE            NAME                                                READY   STATUS    RESTARTS   AGE
kube-system          coredns-6d4b75cb6d-f4j4x                            1/1     Running   0          61m
kube-system          coredns-6d4b75cb6d-q52f7                            1/1     Running   0          61m
kube-system          etcd-app-cluster-control-plane                      1/1     Running   0          62m
kube-system          kindnet-n6dsx                                       1/1     Running   0          61m
kube-system          kube-apiserver-app-cluster-control-plane            1/1     Running   0          62m
kube-system          kube-controller-manager-app-cluster-control-plane   1/1     Running   0          62m
kube-system          kube-proxy-ms7vv                                    1/1     Running   0          61m
kube-system          kube-scheduler-app-cluster-control-plane            1/1     Running   0          62m
local-path-storage   local-path-provisioner-9cd9bd544-mgdj9              1/1     Running   0          61m
monitoring           prometheus-kube-state-metrics-774f8c7564-8k4px      1/1     Running   0          60m
monitoring           prometheus-node-exporter-54wzh                      1/1     Running   0          60m
monitoring           prometheus-server-f65746fd9-wl9rz                   2/2     Running   0          54m

$ kubectl port-forward -n monitoring svc/prometheus-server 9090:80 # on browser check http://localhost:9090

(screenshot: the Prometheus UI running in agent mode)

bryanasdev000 avatar Sep 07 '22 08:09 bryanasdev000

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Oct 12 '22 05:10 stale[bot]

still relevant

hoerup avatar Oct 12 '22 06:10 hoerup

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

stale[bot] avatar Nov 26 '22 21:11 stale[bot]

Still relevant

hoerup avatar Nov 26 '22 22:11 hoerup

still relevant.

sharathfeb12 avatar Dec 14 '22 06:12 sharathfeb12

Still relevant to me

clarkezone avatar Dec 14 '22 18:12 clarkezone

still relevant to me

kgogolek avatar Dec 20 '22 15:12 kgogolek

still relevant to me

ranryl avatar Jan 04 '23 09:01 ranryl

Please.

alessioga avatar Feb 01 '23 09:02 alessioga

Still relevant for me.

dnaprawa-capgemini avatar Feb 16 '23 10:02 dnaprawa-capgemini

Please

clarkezone avatar Feb 17 '23 22:02 clarkezone