docker_events: when retry_limits is -1, retries fail to wait for retry_interval
Bug Report
Describe the bug
The docker_events input fails to wait for retry_interval between retries when retry_limits is -1. Retries are attempted as fast as fluent-bit can go.
To Reproduce
fluent-bit.conf
[SERVICE]
    log_level info
    flush 1
    daemon off
[INPUT]
    name docker_events
    tag dockerd
    unix_path /run/docker.sock
    reconnect.retry_limits -1
    reconnect.retry_interval 1
[OUTPUT]
    name stdout
    match *
Logs:
Aug 29 22:17:27 test1 fluent-bit[88337]: [2024/08/29 22:17:26] [ info] [input:docker_events:docker_events.0] EOF detected. Re-initialize
Aug 29 22:17:27 test1 fluent-bit[88337]: [2024/08/29 22:17:26] [ info] [input:docker_events:docker_events.0] EOF detected. Re-initialize
There are 100000+ of these per second (fluent-bit is very fast 😄). These logs came from systemd (via journalctl -fu fluent-bit.service).
Steps to reproduce the problem:
With both fluent-bit and docker already running, restart docker: systemctl restart docker.
Expected behavior
Retries should be spaced out according to reconnect.retry_interval.
It appears the code currently bypasses create_reconnect_event() when retry_limits <= 0. My initial guess is that the reconnect event should not be bypassed in that case: it is probably always needed, and with a non-positive limit it should simply never give up retrying.
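To make the expected behavior concrete, here is a minimal, standalone sketch of the retry policy I have in mind. It is not the plugin's actual code: struct reconnect_cfg, should_retry(), next_retry_delay(), simulate_reconnect(), and succeed_on_third_attempt() are hypothetical names invented for illustration, and the wait is simulated with a counter rather than a real timer. The point is only that the interval is applied on every retry, while a non-positive limit removes the cap on attempts.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the plugin's reconnect settings
 * (not the real fluent-bit structure). */
struct reconnect_cfg {
    int retry_limits;   /* <= 0 is treated as "retry forever" in this sketch */
    int retry_interval; /* seconds to wait before each retry */
};

/* A non-positive limit means there is no cap on attempts. */
static bool should_retry(const struct reconnect_cfg *cfg, int attempts_so_far)
{
    if (cfg->retry_limits <= 0) {
        return true;
    }
    return attempts_so_far < cfg->retry_limits;
}

/* The delay never depends on the limit. Clamping to 1 second is an
 * assumption made here so that a zero interval cannot busy-loop. */
static int next_retry_delay(const struct reconnect_cfg *cfg)
{
    return cfg->retry_interval > 0 ? cfg->retry_interval : 1;
}

typedef bool (*connect_fn)(int attempt);

/* Hypothetical connect stub: fails twice, then succeeds. */
static bool succeed_on_third_attempt(int attempt)
{
    return attempt >= 3;
}

/* Runs the retry loop against a connect callback. The wait is only
 * accumulated in *waited; a real plugin would arm a timer instead.
 * Returns true if a connection was made, false if the limit ran out. */
static bool simulate_reconnect(const struct reconnect_cfg *cfg,
                               connect_fn try_connect,
                               int *attempts, int *waited)
{
    *attempts = 0;
    *waited = 0;
    while (should_retry(cfg, *attempts)) {
        *waited += next_retry_delay(cfg);
        (*attempts)++;
        if (try_connect(*attempts)) {
            return true;
        }
    }
    return false;
}
```

With retry_limits = -1 and retry_interval = 1, calling simulate_reconnect() with succeed_on_third_attempt makes three attempts and accumulates three seconds of waiting. With retry_limits = 2 it stops after two attempts and reports failure.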
Your Environment
Fluent Bit v3.1.6, installed from the official .deb package and run with the default systemd service definition. Docker 27.2.0, installed from Docker's official .deb package, also runs under systemd.
- Version used: 3.1.6
- Configuration: see above
- Environment name and version: n/a
- Server type and version: VM
- Operating System and version: Debian 12
- Filters and plugins: see config
Additional context
My understanding is that reconnect.retry_limits -1 is a valid way to say 'retry an unlimited number of times'. My goal is to retry at a measured pace, for as long as it takes to reconnect. I never want docker_events to just give up.
I am unsure if the rapid loop is somehow partly related to systemd's handling of /run/docker.sock. However, setting retry_limits to a positive integer results in a very different log:
Aug 29 22:23:56 test1 fluent-bit[88651]: [2024/08/29 22:23:56] [ info] [input:docker_events:docker_events.0] EOF detected. Re-initialize
Aug 29 22:24:02 test1 fluent-bit[88651]: [2024/08/29 22:24:02] [error] [/tmp/fluent-bit/plugins/in_docker_events/docker_events.c:69 errno=104] Connection reset by peer
Aug 29 22:24:02 test1 fluent-bit[88651]: [2024/08/29 22:24:02] [ info] [input:docker_events:docker_events.0] Reconnect successful
Aug 29 22:24:02 test1 fluent-bit[88651]: [2024/08/29 22:24:02] [ info] [input:docker_events:docker_events.0] EOF detected. Re-initialize
Aug 29 22:24:02 test1 fluent-bit[88651]: [2024/08/29 22:24:02] [error] [/tmp/fluent-bit/plugins/in_docker_events/docker_events.c:57 errno=111] Connection refused
Aug 29 22:24:02 test1 fluent-bit[88651]: [2024/08/29 22:24:02] [error] [input:docker_events:docker_events.0] failed to re-initialize socket
Aug 29 22:24:02 test1 fluent-bit[88651]: [2024/08/29 22:24:02] [ info] [input:docker_events:docker_events.0] create reconnect event. interval=1 second
Aug 29 22:24:03 test1 fluent-bit[88651]: [2024/08/29 22:24:03] [ info] [input:docker_events:docker_events.0] Retry(1/5)
Aug 29 22:24:03 test1 fluent-bit[88651]: [2024/08/29 22:24:03] [error] [/tmp/fluent-bit/plugins/in_docker_events/docker_events.c:57 errno=111] Connection refused
Aug 29 22:24:03 test1 fluent-bit[88651]: [2024/08/29 22:24:03] [error] [input:docker_events:docker_events.0] failed to re-initialize socket
Aug 29 22:24:03 test1 fluent-bit[88651]: [2024/08/29 22:24:03] [ info] [input:docker_events:docker_events.0] Failed. Waiting for next retry..
[..continues for specified number of retries..]
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
bump
This issue was closed because it has been stalled for 5 days with no activity.