dkron
Some jobs stop executing after one of five nodes goes down
After one of five dkron nodes was gracefully stopped, some jobs stopped executing at their scheduled times, and nothing about it appeared in the dkron logs. Example: a job scheduled to run every hour, showing "Next 2/5/2021, 9:00:00 AM", did not start at 9, 10, or 11. After every instance was restarted, the jobs started executing again.
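Because the missed runs left no log entries, one way to spot affected jobs is to compare each job's "next" time with the current time. The sketch below assumes the job list is JSON shaped like dkron's `/v1/jobs` response, with `name`, `next` (an RFC 3339 timestamp) and an optional `disabled` flag. Treat the field names as an assumption and check them against your dkron version. The function name and the job names in the example are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_stalled_jobs(jobs, now, grace=timedelta(minutes=5)):
    """Return names of enabled jobs whose "next" run is older than now - grace.

    Assumes each job is a dict with "name", "next" (RFC 3339 string) and an
    optional "disabled" flag, similar to dkron's /v1/jobs output (assumption).
    """
    stalled = []
    for job in jobs:
        if job.get("disabled"):
            continue  # disabled jobs are not expected to run
        # Python < 3.11 fromisoformat() does not accept a trailing "Z".
        next_run = datetime.fromisoformat(job["next"].replace("Z", "+00:00"))
        if next_run.year == 1:
            continue  # Go zero time: the job has no next run scheduled
        if next_run < now - grace:
            stalled.append(job["name"])
    return stalled


# Hypothetical data mirroring the report: an hourly job stuck at 9:00.
jobs = [
    {"name": "hourly-report", "next": "2021-02-05T09:00:00+01:00"},
    {"name": "nightly-backup", "next": "2021-02-06T02:00:00+01:00"},
]
now = datetime(2021, 2, 5, 11, 30, tzinfo=timezone(timedelta(hours=1)))
print(find_stalled_jobs(jobs, now))  # ['hourly-report']
```

A job whose "next" time falls hours in the past while its schedule is hourly matches the symptom described above.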
OS: Ubuntu 18.04.5 & Oracle Linux 8.3
Dkron version: 3.1.4
Example dkron.yaml:

server: true
data-dir: "/var/lib/dkron"
datacenter: "XXX"
bind-addr: "10.1.1.24:8946"
http-addr: "10.1.1.24:8084"
serf-reconnect-timeout: "12h"
retry-interval: "10s"
retry-join:
  - 10.1.1.23
  - 10.1.1.24
  - 10.1.1.25
  - 10.1.1.14
  - 10.1.1.12
enable-prometheus: true
raft-multiplier: 1
log-level: warn
tags:
  location: public
mail-host: 127.0.0.1
mail-port: 25
mail-from: XXX
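The config lists five addresses under retry-join. Assuming all five nodes run with `server: true` and are Raft voters, losing one of them should not cost the cluster its majority, so scheduling would be expected to continue. The sketch below applies the standard Raft majority rule. The helper names are illustrative only and are not part of dkron.

```python
def raft_quorum(voters: int) -> int:
    """Majority of voting servers needed to elect a leader and commit writes."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voting servers can be lost while a majority still remains."""
    return voters - raft_quorum(voters)


servers = 5  # the five addresses under retry-join, assumed to all be voters
print(raft_quorum(servers), tolerated_failures(servers))  # 3 2
```

If the gracefully stopped node was also removed from the Raft configuration, four voters remain, which still tolerates one further failure (`tolerated_failures(4) == 1`).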
Logs from the leader at node shutdown:
time="2021-02-05T08:33:10+01:00" level=warning msg="non-server in gossip pool" member=s2 node=x5
From the node that was stopped:
Caught signal: terminated
time="2021-02-05T08:30:14+01:00" level=info msg="agent: Gracefully shutting down agent..."
time="2021-02-05T08:30:14+01:00" level=warning msg="plugin failed to exit gracefully" node=s2
time="2021-02-05T08:30:14+01:00" level=warning msg="error closing client during Kill" err="unexpected EOF" node=s2
time="2021-02-05T08:30:14+01:00" level=warning msg="plugin failed to exit gracefully" node=s2
time="2021-02-05T08:30:14+01:00" level=info msg="No jobs left. Exiting."
time="2021-02-05T08:30:14+01:00" level=warning msg="error closing client during Kill" err="unexpected EOF" node=s2
time="2021-02-05T08:30:14+01:00" level=warning msg="plugin failed to exit gracefully" node=s2
time="2021-02-05T08:30:14+01:00" level=warning msg="error closing client during Kill" err="unexpected EOF" node=s2
time="2021-02-05T08:30:14+01:00" level=info msg="Waiting for jobs to finish..."