netdata icon indicating copy to clipboard operation
netdata copied to clipboard

Assorted cleanup in the OpenRC init script.

Open Ferroin opened this issue 3 years ago • 11 comments

Summary
  • Get rid of NETDATA_START_AFTER_SERVICES. OpenRC already provides a robust configuration system for managing service dependencies, including overrides. The variable itself thus provides no real benefit but makes the script somewhat harder to read, so this PR removes it and just sets the default after list to what we had been using as the default value of the variable.
  • Add less ambiguous and more accurate descriptions for the reload, rotate, and save commands. This way they concretely describe exactly what they are doing using standard terminology typically seen in other OpenRC scripts.
  • Add the ability to use netdatacli for the above mentioned commands instead of start-stop-daemon. This simplifies things for users who want to run the agent under OpenRC’s native process supervision mechanism (supervise-daemon), which does not allow use of start-stop-daemon for such functionality. This is controlled by a new variable called NETDATA_USE_NETDATACLI variable, which, if set to 1, uses netdatacli. The variable defaults to a value of 0, preserving the existing behavior as the default behavior.
Test Plan

Basic testing involves installing from this PR branch on a system that uses OpenRC (such as Alpine Linux or a default Gentoo install) and verifying that the script still works. Note that this cannot be done properly in a Docker container.

Additional Information

This fixes three of the four most glaring issues with our OpenRC init script. The final issue is our need net dependency, which is generally considered a bad thing. Properly resolving that is far more complicated though and will be handled in a separate PR.

For users: How does this change affect me?
  • NETDATA_START_AFTER_SERVICES is no longer supported. Instead use `rc_after` in `/etc/conf.d/netdata` like you would for any other service.
  • NETDATA_USE_NETDATACLI can now be set to a value of 1 to avoid using start-stop-daemon, simplifying usage of Netdata with OpenRC’s native process supervision.

Ferroin avatar Jun 11 '22 17:06 Ferroin

Clarification needed. Should netdata-agent be built from source (this branch) on Alpine host?

iigorkarpov avatar Jun 13 '22 10:06 iigorkarpov

Clarification needed. Should netdata-agent be built from source (this branch) on Alpine host?

That, or build a static install archive from this branch and install using that on the Alpine host.

Ferroin avatar Jun 13 '22 11:06 Ferroin

Is this normal?

 --- Installing (but not enabling) the netdata updater tool ---
./netdata-installer.sh: line 1862: cleanup_old_netdata_updater: not found
 FAILED  Cannot cleanup old netdata updater tool.

iigorkarpov avatar Jun 13 '22 14:06 iigorkarpov

Is this normal?

 --- Installing (but not enabling) the netdata updater tool ---
./netdata-installer.sh: line 1862: cleanup_old_netdata_updater: not found
 FAILED  Cannot cleanup old netdata updater tool.

Nope, that’s an unrelated bug, I’ll try and have a PR up to fix it by the end of the day today.

Ferroin avatar Jun 13 '22 15:06 Ferroin

Ok then.

What exactly is supposed by 'the script still works'?

iigorkarpov avatar Jun 14 '22 13:06 iigorkarpov

Ok then.

What exactly is supposed by 'the script still works'?

Running still starts and stops the agent correctly.

Ideally we want to also make sure that the reload, rotate, and save commands still work as well.

Ferroin avatar Jun 14 '22 17:06 Ferroin

On the whole, it works on Alpine 3. But:

netdata-openrc is not copied to /etc/init.d during install; I had to copy it manually In case you stops the agent with netdatacli, its status is 'crashed':

alpine:~/netdata# /usr/sbin/netdatacli shutdown-agent
alpine:~/netdata# service netdata status
 * status: crashed

But it can be cleanly stopped using service command.

Also, the only log file is the /var/log/netdata/error.log

iigorkarpov avatar Jun 15 '22 09:06 iigorkarpov

On the whole, it works on Alpine 3. But:

netdata-openrc is not copied to /etc/init.d during install; I had to copy it manually

This is unexpected, but unrelated to this PR.

In case you stops the agent with netdatacli, its status is 'crashed':

alpine:~/netdata# /usr/sbin/netdatacli shutdown-agent
alpine:~/netdata# service netdata status
 * status: crashed

This is expected behavior for OpenRC. It has no way to know what caused the service to exit, so any exit not mediated by OpenRC itself is listed as a crash.

But it can be cleanly stopped using service command.

Also expected behavior with OpenRC, if it can’t find the service to kill it when asked to shut down, it assumes it’s already down and updates state accordingly.

Also, the only log file is the /var/log/netdata/error.log

I think that is expected behavior from the agent itself.

Ferroin avatar Jun 15 '22 11:06 Ferroin

Hey.

As @iigorkarpov , I noticed:

 --- Install netdata at system init --- 
0
0
0
 WARNING  Could not determine what type of init script to install on this system.

 FAILED  Cannot install netdata init service.

But furthermore, I noticed I don't have a netdata-openrc file in system/ (just netdata-openrc.in).

Edit: This is with kickstart installing a pre-compiled binary in /opt/. Will try compiling.

MrZammler avatar Jun 15 '22 13:06 MrZammler

It doesn't seem to produce a "compiled" netdata-openrc file, even from building sources...

MrZammler avatar Jun 15 '22 14:06 MrZammler

It doesn't seem to produce a "compiled" netdata-openrc file, even from building sources...

Indeed, we appear to be having issues relating to this aside from the cleanup code here. I’ll start digging into this further and try to get that fixed, but it should probably be a separate PR.

Ferroin avatar Jun 15 '22 15:06 Ferroin

Rebased to pick up CI changes.

Also added further improvements based on local experimentation with the script.

Ferroin avatar Sep 30 '22 13:09 Ferroin

Rebased to pick up latest changes to system service install handling, as well as latest CI changes.

Ferroin avatar Oct 17 '22 13:10 Ferroin

Was originally going to merge this after the v1.37.0 release, but given that that’s been delayed by a week, merging it now so it’s in the release.

Ferroin avatar Oct 25 '22 11:10 Ferroin