
Systemd - Killed by SIGPIPE

Open imerr opened this issue 8 years ago • 0 comments

We've been having some issues running amonagent on a Debian Jessie machine. It keeps getting killed by SIGPIPE.

# systemctl status amonagent -l
● amonagent.service - Starts and stops amonagent
   Loaded: loaded (/lib/systemd/system/amonagent.service; enabled)
   Active: inactive (dead) since Sun 2017-04-02 04:47:45 WEST; 1 day 3h ago
     Docs: https://www.amon.cx/docs
 Main PID: 14402 (code=killed, signal=PIPE)

Apr 03 08:39:58 amonagent[25552]: time="2017-04-03T08:39:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:39:59 amonagent[25552]: time="2017-04-03T08:39:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:40:58 amonagent[25552]: time="2017-04-03T08:40:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:40:59 amonagent[25552]: time="2017-04-03T08:40:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:41:58 amonagent[25552]: time="2017-04-03T08:41:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:41:59 amonagent[25552]: time="2017-04-03T08:41:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:42:58 amonagent[25552]: time="2017-04-03T08:42:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:42:59 amonagent[25552]: time="2017-04-03T08:42:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:43:58 amonagent[25552]: time="2017-04-03T08:43:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:43:59 amonagent[25552]: time="2017-04-03T08:43:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"

It seems to be related to journald restarting (which drops the stdout pipe, I guess), and since amon doesn't handle SIGPIPE, it gets killed. The default systemd config specifies Restart=on-failure, but systemd apparently doesn't consider this a failure, so the agent is not restarted.

The best way to handle this would probably be to handle SIGPIPE and exit gracefully with a non-zero exit code. Another option would be specifying Restart=always in the systemd file (realistically, you want the amon agent running always, right?).

(The systemd restarts seem to be caused by some sort of "hardware" (qemu, so virtual) issue - so it won't be super common, but it's probably good to handle this better)

imerr, Apr 03 '17 07:04