fluentd Update/Reload without downtime

Which issue(s) this PR fixes:

What this PR does / why we need it: See #4622.

Specification:

1. The supervisor receives SIGUSR2.
2. Spawn a new supervisor.
3. Take over shared sockets.
4. Launch new workers, and stop old processes in parallel.
   - Launch new workers with source-only mode
     - Limit to restart_without_downtime_ready? input plugin
   - Send SIGTERM to the old supervisor after 10s delay from 3.
5. The old supervisor stops and sends SIGRTMIN(34) to the new one.
6. The new workers run fully.
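
The sequence above can be sketched as a small in-memory simulation. This is only an illustration of the ordering and of the plugin filter, not the code in this PR: `InputPlugin`, `RestartSimulation`, and the plugin names are hypothetical, and no real processes or signals are involved. Only the `restart_without_downtime_ready?` predicate name, the signal names, and the 10s delay come from the specification.

```ruby
# Hypothetical model of the restart sequence; these are NOT Fluentd classes.
InputPlugin = Struct.new(:name, :ready) do
  # Mirrors the predicate named in the specification.
  def restart_without_downtime_ready?
    ready
  end
end

class RestartSimulation
  SIGTERM_DELAY_SEC = 10 # delay counted from the socket takeover

  attr_reader :log

  def initialize(plugins)
    @plugins = plugins
    @log = []
  end

  # Only these plugins run while the new workers are in source-only mode.
  def source_only_plugins
    @plugins.select(&:restart_without_downtime_ready?)
  end

  # Records each step in order and returns the log.
  def run
    ready_names = source_only_plugins.map(&:name).join(", ")
    all_names = @plugins.map(&:name).join(", ")
    @log << "1: old supervisor receives SIGUSR2"
    @log << "2: new supervisor is spawned"
    @log << "3: new supervisor takes over shared sockets"
    @log << "4: new workers start in source-only mode (#{ready_names})"
    @log << "4: SIGTERM goes to old supervisor #{SIGTERM_DELAY_SEC}s after the takeover"
    @log << "5: old supervisor stops and sends SIGRTMIN to the new one"
    @log << "6: new workers run fully (#{all_names})"
    @log
  end
end

# Hypothetical plugins: one supports the restart, one does not.
plugins = [
  InputPlugin.new("ready_input", true),
  InputPlugin.new("other_input", false),
]
sim = RestartSimulation.new(plugins)
puts sim.run
```

In this model, `other_input` stays idle during the source-only phase and only starts at the last step, which is the reason for restricting that phase to plugins that report readiness.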

Supported input plugins:

Needs following:

Docs Changes: TODO

Release Note: TODO

TODO:

Aug 30 '24 07:08 daipom

The basic implementation is done. Some concepts from #4654 are reflected. Thanks @Watson1978!

Oct 11 '24 01:10 daipom

Thanks for your review!

Nov 27 '24 02:11 daipom

During zeroDowntimeRestart, other HTTP endpoints are left in a non-guarded state. Is it intentional?

Nov 27 '24 02:11 kenhys

> During zeroDowntimeRestart, other HTTP endpoints are left in a non-guarded state. Is it intentional?

Yes. The old Fluentd should continue to work as is until it receives SIGTERM at step 4 (even if the new Fluentd does not work as expected).

The new Fluentd RPC starts at step 5, so there is no conflict.

If the old Fluentd receives /api/processes.killWorkers, it just causes a quick transition to step 5.

Nov 27 '24 03:11 daipom

Thanks for your review!

Nov 28 '24 04:11 daipom