FTL icon indicating copy to clipboard operation
FTL copied to clipboard

Add supervised mode

Open DL6ER opened this issue 3 years ago • 9 comments

By submitting this pull request, I confirm the following:

  • [X] I have read and understood the contributors guide.
  • [X] I have checked that another pull request for this purpose does not exist.
  • [X] I have considered, and confirmed that this submission will be valuable to others.
  • [X] I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • [X] I give this submission freely, and claim no ownership to its content.

How familiar are you with the codebase?:

10


Add supervision mode. FTL running as supervisor starts a child process which is automatically restarted in case of any abnormal termination including crashes and uncatchable signals such as SIGKILL.

This is also the first step into the direction of supporting updates via the web frontend as this allows FTL to restart itself without having to rely on a third-party.

DL6ER avatar Jul 06 '22 19:07 DL6ER

Great idea!

dschaper avatar Jul 06 '22 19:07 dschaper

What are the child processes (components) the supervisor can start individually? Can it e.g. restart the webserver without restarting dnsmasq?

yubiuser avatar Jul 06 '22 19:07 yubiuser

@yubiuser It is one (full) FTL process that acts as a supervisor starting another (full) instance of FTL. It is neither planned nor practical doable to start individual parts within FTL and launch them as individual components. This PR implements something like the restart-on-failure of systemd (with a bit of more fine-grained control).

DL6ER avatar Jul 07 '22 21:07 DL6ER

Moving the termination into the tests (to check if the supervisor behaves as expected) revealed a crash happening only on musl. On our "normal" builds, canceling a non-existing thread is a simple no-op whereas, on musl, this leads to a segmentation fault. The fix is to avoid canceling non-existing threads (3c38420/src/daemon.c#R234-R237).

DL6ER avatar Jul 09 '22 13:07 DL6ER

Is there a mechanism which prevents an endless restart loop? I cannot see it in code currently. Probably a restart limit, optionally adjustable with a CLI option, makes sense?

MichaIng avatar Aug 04 '22 15:08 MichaIng

There is a forced delay of 1sec between restarts to avoid CPU spinning by endless restarts in a very short time. Others than that, there is no counter of recently failed attempts. Is there such a limit with systemd and it's Restart=on-failure feature? Even when it was easy, I would still like to avoid reinventing the wheel and use a proper systemd unit with said Restart=on-failure instead of this PR.

DL6ER avatar Aug 04 '22 19:08 DL6ER

https://www.freedesktop.org/software/systemd/man/systemd.service.html#Restart=

My recollection is that systemd will only try 5 times by default and then journal an error that says "retried too many times" or "tried to restart too quickly". There are limits that are discussed above as well as alternatives to 'on-failure'.

dschaper avatar Aug 04 '22 21:08 dschaper

The related (re)start limits can be set in the [Unit] section: https://www.freedesktop.org/software/systemd/man/systemd.unit.html#StartLimitIntervalSec=interval

I see, I need to start doing the systemd integration 😅. I'm sorry for not doing so far what I promised, new full-time job has reduced my space time dramatically.

MichaIng avatar Aug 06 '22 16:08 MichaIng

I plan to look at another systemd attempt sometime next week. Maybe the busy two of us manage to do it together :slightly_smiling_face:

DL6ER avatar Aug 06 '22 16:08 DL6ER

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions[bot] avatar Dec 23 '22 11:12 github-actions[bot]

Superseded by native systemd unit ( https://github.com/pi-hole/pi-hole/pull/4924 )

DL6ER avatar Dec 27 '22 17:12 DL6ER