Unable to configure high availability for Prometheus with TimescaleDB
I am trying to set up high availability for Prometheus using TimescaleDB with the configuration below.
Node exporter
docker run -d -p 9100:9100 quay.io/prometheus/node-exporter
Prometheus
- prometheus-1
docker run -it -p 9090:9090 -v /root/prometheus/prometheus1.yml:/etc/prometheus/prometheus.yml prom/prometheus
prometheus1.yml:
global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9201/write"
remote_read:
  - url: "http://10.128.15.221:9201/read"
    read_recent: true
- prometheus-2
docker run -it -p 9091:9090 -v /root/prometheus/prometheus2.yml:/etc/prometheus/prometheus.yml prom/prometheus
prometheus2.yml:
global:
  scrape_interval: 5s
  evaluation_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['10.128.15.221:9100']
remote_write:
  - url: "http://10.128.15.221:9202/write"
remote_read:
  - url: "http://10.128.15.221:9202/read"
    read_recent: true
Prometheus adapter
- prometheus-adapter-1
docker run -it -p 9201:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s
- prometheus-adapter-2
docker run -it -p 9202:9201 timescale/prometheus-postgresql-adapter:latest -pg-host=10.128.15.221 -pg-password=secret -leader-election-pg-advisory-lock-id=2 -leader-election-pg-advisory-lock-prometheus-timeout=7s
pg_prometheus
docker run --name pg_prometheus -e POSTGRES_PASSWORD=secret -it -p 5432:5432 timescale/pg_prometheus:latest-pg11 postgres -csynchronous_commit=off
When I bring everything up, it all works fine, with the following status:
- prometheus-adapter-1 -> Leader log
{"caller":"log.go:27","count":100,"duration":0.007022144,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:01.146Z"}
{"caller":"log.go:27","count":100,"duration":0.007113201,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.119Z"}
{"caller":"log.go:27","count":100,"duration":0.006514815,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":200,"ts":"2020-03-09T10:02:06.128Z"}
{"caller":"log.go:27","count":100,"duration":0.00611504,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:02:06.136Z"}
{"caller":"log.go:27","count":100,"duration":0.006294438,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:02:06.144Z"}
- prometheus-adapter-2 -> Not a leader log
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.135Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:33.138Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:33.140Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:01:38.133Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 1: Instance is not a leader. Can't write data","ts":"2020-03-09T10:01:38.135Z"}
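One quick way to confirm which adapter is currently the leader is to tally the distinct messages each one logs: in the logs above the leader emits "Wrote samples", while the follower emits "Instance is not a leader". The function below is a minimal sketch, assuming the adapter writes flat JSON entries with a "msg" field containing no escaped quotes, as in these logs; the function name is hypothetical.

```shell
# Hypothetical helper: count each distinct "msg" value in a
# prometheus-postgresql-adapter log read from stdin, most frequent first.
# Assumes flat JSON entries whose "msg" contains no escaped quotes.
summarize_msgs() {
  grep -o '"msg":"[^"]*"' |      # pull out every "msg":"..." field
    sed 's/^"msg":"//; s/"$//' | # strip the key and the closing quote
    sort | uniq -c |             # count identical messages
    sort -rn                     # highest count first
}
```

Since the containers above were started without names, it would be used as `docker logs <adapter-container> 2>&1 | summarize_msgs`, with the container ID taken from `docker ps`.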
But when I stop prometheus-1, prometheus-adapter-2 does not take over leadership. Here are the adapter logs:
prometheus-adapter-1
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:56.513Z"}
{"caller":"log.go:27","count":93,"duration":0.005575618,"level":"debug","msg":"Wrote samples","ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-03-09T10:29:59.668Z"}
{"caller":"log.go:35","level":"warn","msg":"Prometheus timeout exceeded","timeout":"7s","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:35","level":"warn","msg":"Scheduled election is paused. Instance is removed from election pool.","ts":"2020-03-09T10:30:06.960Z"}
{"caller":"log.go:31","level":"info","msg":"Instance is no longer a leader","ts":"2020-03-09T10:30:06.962Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:10.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:15.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:20.958Z"}
{"caller":"log.go:27","level":"debug","msg":"Scheduled election is paused. Instance can't become a leader until scheduled election is resumed (Prometheus comes up again)","ts":"2020-03-09T10:30:25.958Z"}
prometheus-adapter-2
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:30:55.047Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:30:55.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:00.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:00.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.041Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:05.046Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:05.048Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.041Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.042Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.043Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.044Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:31:10.045Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:31:10.046Z"}
However, when I stop prometheus-adapter-1 itself, prometheus-adapter-2 does take over leadership.
Another interesting thing: when I start prometheus-1 again, prometheus-adapter-1 logs "Election id 2: Instance is not a leader. Can't write data". Here is the log:
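Because adapter-2 only takes over once adapter-1's process exits, it may help to check which PostgreSQL session actually holds the election lock after prometheus-1 is stopped. The sketch below assumes, as the flag name `-leader-election-pg-advisory-lock-id` suggests, that leader election is built on a PostgreSQL advisory lock keyed by that ID (2 here); the helper names are hypothetical. For a single bigint key, `pg_locks` reports the high 32 bits in `classid`, the low 32 bits in `objid`, and `objsubid = 1`.

```shell
# Hypothetical helper: print SQL listing the sessions that hold (granted = t)
# or wait for (granted = f) the advisory lock with the given bigint key.
# classid = 0 assumes the key fits in 32 bits, as the ID 2 used above does.
advisory_lock_sql() {
  lock_id=$1
  cat <<EOF
SELECT a.pid, a.client_addr, a.client_port, a.backend_start, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.locktype = 'advisory'
  AND l.classid = 0
  AND l.objid = ${lock_id}
  AND l.objsubid = 1;
EOF
}

# Run the query inside the pg_prometheus container started above,
# as the default "postgres" superuser.
advisory_lock_holders() {
  docker exec pg_prometheus psql -U postgres -c "$(advisory_lock_sql "$1")"
}
```

Running `advisory_lock_holders 2` while prometheus-1 is down would show whether adapter-1's connection still has `granted = t` even though it logged "Instance is no longer a leader"; if so, that would explain why adapter-2 cannot become leader until adapter-1's session ends.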
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.566Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":93,"ts":"2020-03-09T10:33:34.571Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.576Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:34.578Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:34.579Z"}
{"caller":"log.go:31","level":"info","msg":"Prometheus seems alive. Resuming scheduled election.","ts":"2020-03-09T10:33:34.959Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.550Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.553Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.555Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:39.556Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:39.558Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.551Z"}
{"caller":"log.go:27","level":"debug","msg":"Election id 2: Instance is not a leader. Can't write data","ts":"2020-03-09T10:33:44.554Z"}
{"caller":"log.go:31","level":"info","msg":"Samples write throughput","samples/sec":0,"ts":"2020-03-09T10:33:44.554Z"}
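Because the follower's "not a leader" line repeats on every write, the leadership transitions in these logs are easy to miss. The sketch below keeps only election-related messages with their timestamps and collapses consecutive repeats, so a failover test reads as a short timeline. It assumes each entry is a flat JSON object with "msg" appearing before "ts", as in the logs above; the function name is hypothetical.

```shell
# Hypothetical helper: print a "<ts> <msg>" timeline of leader-election events
# from a prometheus-postgresql-adapter log on stdin, dropping consecutive repeats.
# Assumes flat JSON entries (no nested braces) with "msg" before "ts".
election_events() {
  grep -o '{[^}]*}' |                                            # one entry per line
    sed -n 's/.*"msg":"\([^"]*\)".*"ts":"\([^"]*\)".*/\2 \1/p' | # "<ts> <msg>"
    grep -E 'leader|election|Prometheus' |                       # keep election-related messages
    awk '{
      ts = $1
      msg = $0
      sub(/^[^ ]+ /, "", msg)   # drop the timestamp so only messages are compared
      if (msg != last) print ts " " msg
      last = msg
    }'
}
```

Piping each adapter's `docker logs <adapter-container> 2>&1` output through `election_events` should surface entries such as "Prometheus timeout exceeded", "Instance is no longer a leader" and "Prometheus seems alive. Resuming scheduled election." without the per-write noise.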
So, did I miss a step in this setup, or is this a bug?
Please help me resolve this issue.
@MohanSai1997 - Were you able to make any progress setting up the HA instance?
This project is SUNSET. Please refer to the README.md file.
Oh yeah, I see it. Thank you!
https://github.com/timescale/promscale is the project recommended for use instead.