
systemd notify is implemented only for the primary pmlogger

Open andreasgerstmayr opened this issue 4 years ago • 3 comments

While investigating why the pmlogger service doesn't start ("protocol failure") when there is no local pmcd service running, I noticed that pmlogger_check.sh only notifies systemd when the primary pmlogger is started: https://github.com/performancecopilot/pcp/blob/2ce509bf4ffe9a83be08f5f6fbac9b2d8f6b97a3/src/pmlogger/pmlogger_check.sh#L1013-L1015

This is a problem if I don't have a primary pmlogger (which can be the case for the PCP container, but could also happen with a regular bare-metal installation). AFAICS the pmlogger service as a whole should be marked as "running" if and only if all configured pmloggers have started. OTOH, that means we need multiple systemd units (templates), which brings us back to #896.
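
To make the "ready only once every configured pmlogger is up" idea concrete, here is a minimal Python sketch of the sd_notify protocol (a `READY=1` datagram sent to the socket named in `$NOTIFY_SOCKET`). This is not PCP code: `sd_notify` and `notify_when_all_started` are hypothetical helpers, and the `started` mapping of logger name to state is an assumption for illustration only.

```python
import os
import socket


def sd_notify(state):
    """Send a state string to the systemd notification socket, if one is set."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not running under a Type=notify unit
    if path.startswith("@"):
        path = "\0" + path[1:]  # abstract namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def notify_when_all_started(started):
    """Hypothetical policy: report readiness only if every logger started.

    `started` maps a logger name to True/False (an assumed structure).
    """
    if started and all(started.values()):
        return sd_notify("READY=1")
    return False
```

With this policy, a configuration without a primary pmlogger would still report readiness, but a slow or failing remote logger would hold the whole service in "activating".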

"notify the first pmlogger" as suggested in the comment works as a workaround, but then systemctl status pmlogger just shows the status of the first pmlogger :|

andreasgerstmayr, Feb 22 '21 20:02

Same with pmie in https://github.com/performancecopilot/pcp/blob/d0c53a0f9dd574cd23e22e721329e7e2d680b690/src/pmie/pmie_check.sh#L761-L766

andreasgerstmayr, Mar 02 '21 19:03

Notifying systemd as soon as the primary has started was intentional - we can't (and don't need to) wait for the whole farm to start during boot. The TODO for the case where there is no primary was never implemented because only the primary is under systemd's control - any remote(s) are supposed to be managed by pmlogctl, but still run in the same cgroup as the primary and so systemctl status pmlogger reports them all. Basically it was never intended to run without a primary, I guess.

For containers with only one pmlogger for a remote pmcd, perhaps that could be the primary? (would need control file tweaks and probably code changes). Alternatively, the container could just run the primary anyway, with a very limited config and not expose any ports outside the container.

goodwinos, Mar 02 '21 23:03

> Notifying systemd as soon as the primary has started was intentional - we can't (and don't need to) wait for the whole farm to start during boot.

If nothing depends on pmlogger, nothing is blocked when pmlogger takes longer to start, because systemd starts services in parallel. It will take longer to reach multi-user.target, but no other unit is delayed (see [1]).

> The TODO for the case where there is no primary was never implemented because only the primary is under systemd's control - any remote(s) are supposed to be managed by pmlogctl, but still run in the same cgroup as the primary and so systemctl status pmlogger reports them all. Basically it was never intended to run without a primary, I guess.

systemctl status reports them, but does it actually monitor them? Afaics it only monitors the process with the PID stored in /run/pcp/pmlogger.pid, which is the primary one.
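
As a rough illustration of why a single PID file only covers one process, here is a Python sketch that reads a PID file and probes that one PID. It is not how systemd tracks a unit's main process internally; `pid_from_file` and `is_alive` are hypothetical helpers, and the path passed in is whatever the caller supplies.

```python
import os


def pid_from_file(path):
    """Read a single integer PID from a PID file."""
    with open(path) as f:
        return int(f.read().strip())


def is_alive(pid):
    """Probe a PID with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True
```

Any other pmlogger in the same cgroup is invisible to a check like this, because its PID never appears in the file.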

> For containers with only one pmlogger for a remote pmcd, perhaps that could be the primary? (would need control file tweaks and probably code changes). Alternatively, the container could just run the primary anyway, with a very limited config and not expose any ports outside the container.

+1, that's the workaround for now: either specify the remote pmcd as primary, or explicitly add -N to the pmlogger args to notify systemd. But I don't think this makes for a good user experience :| Do we need the distinction between primary and non-primary loggers, or would a pmlogger@localhost service also suffice?

[1] https://unix.stackexchange.com/questions/178920/how-to-enable-systemds-service-without-waiting

andreasgerstmayr, Mar 03 '21 18:03