Unable to configure High Availability of Prometheus with timescaleDB

Open MohanSaiTeki opened this issue 5 years ago • 4 comments

I am trying to set up high availability (HA) for Prometheus with TimescaleDB, using the configuration below.

Node exporter

docker run -d -p 9100:9100 quay.io/prometheus/node-exporter

Prometheus

  • prometheus-1

    docker run -it -p 9090:9090 -v /root/prometheus/prometheus1.yml:/etc/prometheus/prometheus.yml prom/prometheus

  • prometheus1.yml

    global:
      scrape_interval: 5s
      evaluation_interval: 10s
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ['10.128.15.221:9100']
    remote_write:
      - url: "http://10.128.15.221:9201/write"
    remote_read:
      - url: "http://10.128.15.221:9201/read"
        read_recent: true

  • prometheus-2

    docker run -it -p 9091:9090 -v /root/prometheus/prometheus2.yml:/etc/prometheus/prometheus.yml prom/prometheus

  • prometheus2.yml

    global:
      scrape_interval: 5s
      evaluation_interval: 10s
    scrape_configs:
      - job_name: prometheus
        static_configs:
          - targets: ['10.128.15.221:9100']
    remote_write:
      - url: "http://10.128.15.221:9202/write"
    remote_read:
      - url: "http://10.128.15.221:9202/read"
        read_recent: true

Prometheus adapter

  • prometheus-adapter-1

    docker run -it -p 9201:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s

  • prometheus-adapter-2

    docker run -it -p 9202:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s

pg_prometheus

docker run --name pg_prometheus -e POSTGRES_PASSWORD=secret -it -p 5432:5432 timescale/pg_prometheus:latest-pg11 postgres -csynchronous_commit=off

When I spin everything up, it works fine, with the status below.

  • prometheus-adapter-1 -> Leader log

{"caller":"log.go:27","count":100,"duration":0.007022144,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:01.146Z"}
{"caller":"log.go:27","count":100,"duration":0.007113201,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:27","count":100,"duration":0.006514815,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":200,"ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:27","count":100,"duration":0.00611504,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:27","count":100,"duration":0.006294438,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.144Z"}

  • prometheus-adapter-2 -> Not a leader log

{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.140Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.135Z"}

However, when I stop prometheus-1, prometheus-adapter-2 does not take over leadership. The adapter logs are below.

prometheus-adapter-1

{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:56.513Z"}
{"caller":"log.go:27","count":93,"duration":0.005575618,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:35","level":"warn","msg":"Scheduled election is paused. Instance is removed from election pool.","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:10.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:15.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:20.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:25.958Z"}
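To make the state changes in a log like the one above easier to spot among the throughput lines, a small filter can help. This is only a sketch: it assumes the raw adapter log has one JSON object per line, and the marker strings are simply the messages that appear in this issue, not an exhaustive list. The function name `leadership_events` is made up for illustration.

```python
import json

# Messages copied from the adapter log in this issue that mark an
# election state change. This list is an assumption, not a complete set.
TRANSITION_MARKERS = (
    "Prometheus timeout exceeded",
    "Instance is removed from election pool",
    "Instance is no longer a leader",
    "Resuming scheduled election",
)


def leadership_events(lines):
    """Return (ts, level, msg) for log lines that signal an election state change."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        msg = entry.get("msg", "")
        if any(marker in msg for marker in TRANSITION_MARKERS):
            events.append((entry.get("ts"), entry.get("level"), msg))
    return events


# Four lines taken verbatim from the prometheus-adapter-1 log.
SAMPLE = [
    '{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:59.668Z"}',
    '{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}',
    '{"caller":"log.go:35","level":"warn","msg":"Scheduled election is paused. Instance is removed from election pool.","ts":"2020-03-09T10:30:06.960Z"}',
    '{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}',
]

for ts, level, msg in leadership_events(SAMPLE):
    print(ts, level, msg)
```

Running the same filter over the prometheus-adapter-2 log would show whether it ever reports a change of state after 10:30:06.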

prometheus-adapter-2

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:30:55.047Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.042Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.046Z"}
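One way to check which database session, if any, still holds an advisory lock at this point is to query PostgreSQL's `pg_locks` view. The sketch below only builds and prints a `docker exec ... psql` command; it does not run it. It assumes the container name `pg_prometheus` from the setup above, the image's default `postgres` user, and that `psql` is available inside the container. The helper names are made up for illustration.

```python
import shlex


def advisory_lock_query():
    """SQL listing every advisory lock together with the session that holds it."""
    return (
        "SELECT l.pid, l.classid, l.objid, l.objsubid, l.granted, "
        "a.client_addr, a.state "
        "FROM pg_locks l JOIN pg_stat_activity a USING (pid) "
        "WHERE l.locktype = 'advisory';"
    )


def docker_psql_command(container="pg_prometheus", user="postgres"):
    """Build the argv for running the query inside the database container.

    The container name and user are assumptions based on the docker run
    command in this issue; adjust them if your setup differs.
    """
    return ["docker", "exec", container, "psql", "-U", user, "-c", advisory_lock_query()]


# Print a copy-pasteable shell command instead of executing it.
print(shlex.join(docker_psql_command()))
```

In `pg_locks`, a lock taken with a single bigint key is shown with the high half of the key in `classid`, the low half in `objid`, and `objsubid` equal to 1. If the adapter takes its lock that way (an assumption), lock id 2 would appear as `classid = 0`, `objid = 2`. The `client_addr` column may help tell the two adapter containers apart.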

However, when I stop prometheus-adapter-1, prometheus-adapter-2 does take over leadership.
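This last observation is at least consistent with how PostgreSQL session-level advisory locks behave: a lock is held until the session that took it releases it or that session ends. Nothing in this thread confirms that this is the cause here. The toy model below is not the adapter's code and does not talk to a database; the class and method names are made up, and it ignores details such as re-entrant lock counts.

```python
class AdvisoryLocks:
    """Toy in-memory model of session-level advisory locks (illustration only)."""

    def __init__(self):
        self._holder = {}  # lock id -> name of the session holding it

    def try_lock(self, session, lock_id):
        # Like pg_try_advisory_lock: succeeds if the lock is free or already ours.
        holder = self._holder.setdefault(lock_id, session)
        return holder == session

    def unlock(self, session, lock_id):
        # Like pg_advisory_unlock: only the holding session can release the lock.
        if self._holder.get(lock_id) == session:
            del self._holder[lock_id]
            return True
        return False

    def close_session(self, session):
        # Ending a session releases every lock it still holds.
        for lock_id in [k for k, s in self._holder.items() if s == session]:
            del self._holder[lock_id]


LOCK_ID = 2  # same id as -leader-election-pg-advisory-lock-id above

locks = AdvisoryLocks()
print(locks.try_lock("adapter-1", LOCK_ID))   # adapter-1 becomes leader
print(locks.try_lock("adapter-2", LOCK_ID))   # adapter-2 cannot take the lock
print(locks.unlock("adapter-2", LOCK_ID))     # a non-holder cannot release it
locks.close_session("adapter-1")              # stopping adapter-1 ends its session
print(locks.try_lock("adapter-2", LOCK_ID))   # now adapter-2 can take the lock
```

In this model, the follower can only take over once the holder either unlocks from its own session or disconnects, which is why checking the lock's current holder in the database is a useful next step.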

Another interesting thing: when I start prometheus-1 again, I see "Election id 2: Instance is not a leader. Can't write data" in the prometheus-adapter-1 log, as shown below.

{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.566Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":93,"ts":"2020-03-09T10:33:34.571Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.578Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.579Z"}
{"caller":"log.go:31","level":"info","msg":"Prometheus seems alive. Resuming scheduled election.","ts":"2020-03-09T10:33:34.959Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.550Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.555Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.556Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.558Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.554Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.554Z"}

So, did I get a step wrong while setting this up, or is this a bug?

Please help me to resolve this issue.

MohanSaiTeki avatar Mar 09 '20 10:03 MohanSaiTeki

@MohanSai1997 - Were you able to make any progress setting up the HA instance?

msarm avatar Jun 16 '21 14:06 msarm

This project has been sunset. Please refer to the README.md file.

MohanSaiTeki avatar Jun 16 '21 16:06 MohanSaiTeki

Oh yeah, I see it. Thank you!

msarm avatar Jun 17 '21 17:06 msarm

https://github.com/timescale/promscale is the project recommended for use instead.

harkishen avatar Jul 01 '21 11:07 harkishen