docker-systemctl-replacement

Misleading error message interaction between ExecStart= and ExecStartPost=

Open PenelopeFudd opened this issue 2 years ago • 3 comments

Hi;

We have an Ansible deployment script that installs this service file:

[Unit]
Description=rabbitmq-server - RabbitMQ broker
After=network.target [email protected]
Wants=network.target [email protected]

[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
UMask=0027
NotifyAccess=all
TimeoutStartSec=3600
LimitNOFILE=32768
Restart=on-failure
RestartSec=10
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/lib/rabbitmq/bin/rabbitmq-server
ExecStartPost=-+/home/application/bin/python3 /usr/local/bin/rabbitmq_detect_msg_store_corruption.py
ExecStop=/usr/lib/rabbitmq/bin/rabbitmqctl shutdown
SuccessExitStatus=69

[Install]
WantedBy=multi-user.target

When we start the service, we get this:

$ sudo systemctl start service rabbitmq-server

Unable to start service rabbitmq-server: ERROR:systemctl: rabbitmq-server.service: Exec command does not exist: (ExecStartPost) /home/application/backend/bin/python3

$ echo $?
1

The error message turned out to be a red herring. Neither the Ansible script nor the service file has changed in over a year, and the message has apparently been printed all this time without the command returning a non-zero exit code.

The true error turned out to be that we changed a password in RabbitMQ's configuration file and failed to URL-escape it. When ExecStart runs, the server writes an error to a random log file and exits with a non-zero return code.

It would be nice if systemctl had printed this instead:

Unable to start service rabbitmq-server: ERROR:systemctl: rabbitmq-server.service: ExecStart command exited with an ExitStatus of 1: (ExecStart) /usr/lib/rabbitmq/bin/rabbitmq-server

Thanks!

PenelopeFudd avatar Jun 22 '23 19:06 PenelopeFudd

Sadly this is impossible, as docker-systemctl-replacement is not a server that can watch its children. It cannot see the return code of the ExecStart process; it only detects a "failed" service once that PID has vanished.
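To illustrate the limitation described above, here is a minimal sketch of what "the PID has vanished" can mean for a tool that is not the parent of the service process. The helper name `pid_is_alive` is hypothetical and is not taken from docker-systemctl-replacement; it only shows that a non-parent process can ask whether a PID exists, but has no way to read that process's exit status.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists.

    Hypothetical helper for illustration only. A process that is not the
    parent cannot call waitpid() on the PID, so existence is all it can learn.
    """
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks existence/permission
    except ProcessLookupError:
        return False  # no such process: this is the "PID has vanished" case
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

With a check like this, a service that exits immediately with status 1 and one that exits with status 0 look the same: in both cases the PID is simply gone.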

Supporting the "-+" prefix is a separate matter, however. Currently the "+" prefix ("nouser") is ignored, so the command still runs as user rabbitmq and fails when python3 is not accessible to that user. This may change in the future.
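As a rough sketch of how the two prefixes in this unit file could be separated from the command itself: in systemd, "-" means a failing exit status is ignored and "+" means the command runs with full privileges regardless of User=. The names `ExecLine` and `split_exec_prefixes` are hypothetical, this is not the project's actual parser, and the other systemd prefixes ("@", ":", "!", "!!") are deliberately not handled.

```python
from typing import NamedTuple


class ExecLine(NamedTuple):
    ignore_failure: bool   # set by the "-" prefix
    full_privileges: bool  # set by the "+" prefix
    command: str           # the remaining command line, prefixes removed


def split_exec_prefixes(value: str) -> ExecLine:
    """Strip leading "-" and "+" prefixes from an Exec*= value (illustration only)."""
    ignore_failure = False
    full_privileges = False
    rest = value.lstrip()
    # Prefixes may appear in any order, so consume them one character at a time.
    while rest and rest[0] in "-+":
        if rest[0] == "-":
            ignore_failure = True
        else:
            full_privileges = True
        rest = rest[1:]
    return ExecLine(ignore_failure, full_privileges, rest)
```

Applied to the ExecStartPost= value from the unit file, this would yield both flags set and a command starting with `/home/application/bin/python3`.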

gdraheim avatar Jul 28 '23 20:07 gdraheim

Ok, good to know.
I had been under the impression that it could see the return value of the exec() call if it exited immediately (daemonized, for instance), just not if the exec() call kept running.

PenelopeFudd avatar Jul 30 '23 18:07 PenelopeFudd

In this case, is it trying to exec() the program "+/home/application/bin/python3" and failing? Wouldn't it be possible to say "Path '%s' is not absolute, will not exec()", or, if relative paths are allowed, just "Pathname '%s' not found, will not exec()"?
That would be helpful whether or not "+" for nouser is implemented.
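A minimal sketch of the kind of pre-exec check suggested here, assuming absolute paths are required. The function name `exec_path_problem` is hypothetical, and the third message (for a file that exists but is not executable) is an addition of this sketch rather than something proposed in the thread.

```python
import os
import shlex
from typing import Optional


def exec_path_problem(command: str) -> Optional[str]:
    """Return a diagnostic message if the command should not be exec()'d, else None."""
    argv = shlex.split(command)
    if not argv:
        return "Empty command, will not exec()"
    path = argv[0]
    if not os.path.isabs(path):
        # A leftover prefix such as "+" also lands here, since "+/home/..." is not absolute.
        return "Path '%s' is not absolute, will not exec()" % path
    if not os.path.exists(path):
        return "Pathname '%s' not found, will not exec()" % path
    if not os.access(path, os.X_OK):
        # os.access() checks against the real user ID of the calling process.
        return "Pathname '%s' is not executable by the current user, will not exec()" % path
    return None
```

With an unparsed "+" still attached, the first check would report the path as not absolute, which points at the prefix handling rather than at a missing interpreter.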

PenelopeFudd avatar Jul 30 '23 20:07 PenelopeFudd