
v1.17.1 sometimes kills in_monitor_http_server_helper thread

Opened by johnmanjiro13.

Describe the bug

We use the fluentd Docker image (v1.17-debian) with fluent-logger-golang. Since the release of v1.17.1, fluentd sometimes kills a thread and outputs the WARN logs shown below.

To Reproduce

I haven't found a way to reproduce this yet. Here is our Dockerfile.

FROM fluent/fluentd:v1.17-debian

USER root
RUN apt-get update && \
    apt-get -y install wget && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

Expected behavior

fluentd does not output these WARN logs.

Your Environment

- Fluentd version: 1.17.1
- Package version:
- Operating system: Linux
- Kernel version:

Your Configuration

<system>
  log_level info
  <log>
    format json
  </log>
</system>

<label @FLUENT_LOG>
  <match fluent.*>
    @type null
  </match>
</label>

<source>
  @type monitor_agent
  bind 0.0.0.0
  port 24220
</source>

<source>
  @type forward
  @id forward_input
</source>

<source>
  @type http
  port 9880
  bind 0.0.0.0
</source>

<match fluentd.pod.**>
  @type null
</match>


<match log.**>
  @type forward
  expire_dns_cache 0
  dns_round_robin true
  heartbeat_type transport
  require_ack_response true

  <buffer>
    @type file
    path /mnt/fluentd/forward
    flush_at_shutdown true
    flush_interval 0.1
  </buffer>

  <server>
    name destination
    host destination
  </server>
</match>

Your Error Log

killing existing thread thread=#<Thread:0x00007f66153e9b80@in_monitor_http_server_helper /usr/local/bundle/gems/fluentd-1.17.1/lib/fluent/plugin_helper/thread.rb:70 sleep>

thread doesn't exit correctly (killed or other reason) plugin=Fluent::Plugin::MonitorAgentInput title=:in_monitor_http_server_helper thread=#<Thread:0x00007f66153e9b80@in_monitor_http_server_helper /usr/local/bundle/gems/fluentd-1.17.1/lib/fluent/plugin_helper/thread.rb:70 aborting> error=nil
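Both WARN lines are consistent with a shutdown sequence that waits for a plugin thread, kills it when it has not stopped within some timeout, and then reports from the thread's own cleanup that it never finished normally (note the thread status going from "sleep" to "aborting"). The following Ruby sketch is a hypothetical model of that pattern, not fluentd's actual plugin_helper/thread.rb; the simulate_shutdown method, its parameters, and the timings are assumptions made for illustration.

```ruby
# Hypothetical model (not fluentd's implementation) of a "wait, then kill"
# thread shutdown that yields messages shaped like the two WARN lines.
def simulate_shutdown(timeout:, cooperative:)
  log = []
  stop_requested = false
  finished = false

  worker = Thread.new do
    begin
      if cooperative
        # Polls a stop flag, so it can return on its own during shutdown.
        sleep 0.01 until stop_requested
      else
        # Blocks indefinitely, like a server loop waiting for connections.
        sleep
      end
      finished = true
    ensure
      # ensure also runs when the thread is killed; Thread#status
      # reports "aborting" at that point.
      unless finished
        log << "thread doesn't exit correctly (killed or other reason) status=#{Thread.current.status}"
      end
    end
  end

  sleep 0.05            # give the worker time to reach its blocking call
  stop_requested = true # request a graceful stop

  # Thread#join(limit) returns nil if the thread is still alive after limit seconds.
  if worker.join(timeout).nil?
    log << "killing existing thread status=#{worker.status}"
    worker.kill
    worker.join
  end
  log
end
```

In this model, simulate_shutdown(timeout: 0.2, cooperative: false) returns both messages, while simulate_shutdown(timeout: 0.5, cooperative: true) returns an empty array, because a thread that checks a stop flag can return before the timeout expires.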

Additional context

No response

(posted by johnmanjiro13 on Aug 20, 2024)

Since the release of 1.17.1, it now kills threads and outputs WARN logs as shown below.

Did it work as expected on v1.17.0? If not, could you tell us the exact version that works as expected?

(posted by ashie on Aug 20, 2024)

Did it work as expected on v1.17.0?

Yes. After changing the base image to v1.17.0-debian-1.1, it works fine.

(posted by johnmanjiro13 on Aug 20, 2024)

From the Docker image's point of view, only very limited changes were made between v1.17.0-debian-1.1 and v1.17.1-debian-1.0 (as far as I know, there are no apparent issues in the Dockerfile), so we can focus on the changes in v1.17.1 and its dependencies.

diff -u gems-v1.17.0-debian-1.1.txt gems-v1.17.1-debian-1.0.txt
--- gems-v1.17.0-debian-1.1.txt 2024-08-23 16:06:15.979508868 +0900
+++ gems-v1.17.1-debian-1.0.txt 2024-08-23 16:06:51.543578402 +0900
@@ -32,7 +32,7 @@
 fiddle (default: 1.1.1)
 fileutils (default: 1.7.0)
 find (default: 0.1.1)
-fluentd (1.17.0)
+fluentd (1.17.1)
 forwardable (default: 1.3.3)
 getoptlong (default: 0.2.0)
 http_parser.rb (0.8.0)
@@ -42,7 +42,7 @@
 ipaddr (default: 1.2.5)
 irb (default: 1.6.2)
 json (2.7.2, default: 2.6.3)
-logger (default: 1.5.3)
+logger (1.6.0, default: 1.5.3)
 matrix (0.4.2)
 minitest (5.16.3)
 msgpack (1.7.2)
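The same comparison can be scripted. This Ruby sketch assumes each gems-*.txt file holds plain gem list output, one "name (versions)" entry per line, which matches the lines shown in the diff; parse_gem_list and changed_gems are hypothetical helper names.

```ruby
# Parses gem list output into { "name" => "versions" }.
# Example line: "logger (1.6.0, default: 1.5.3)"
def parse_gem_list(text)
  text.each_line.with_object({}) do |line, gems|
    match = line.strip.match(/\A(\S+) \((.*)\)\z/)
    gems[match[1]] = match[2] if match
  end
end

# Returns { "name" => [old_versions, new_versions] } for every gem whose
# version string differs; a gem missing on one side shows up as nil.
def changed_gems(old_text, new_text)
  old_gems = parse_gem_list(old_text)
  new_gems = parse_gem_list(new_text)
  (old_gems.keys | new_gems.keys).sort.each_with_object({}) do |name, diff|
    next if old_gems[name] == new_gems[name]

    diff[name] = [old_gems[name], new_gems[name]]
  end
end

# Usage, with the file names from the diff:
#   changed_gems(File.read("gems-v1.17.0-debian-1.1.txt"),
#                File.read("gems-v1.17.1-debian-1.0.txt"))
```

For example, feeding it the fluentd, logger and msgpack lines from the two files reports fluentd and logger as changed and leaves msgpack out.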

As for the HTTP server helper, the only change in v1.17.1 is a fix for a frozen error: https://github.com/fluent/fluentd/commit/0e2e9854f73d8fed8010bc35d115431e5eaee4cb

Hmm, it seems that this might occur while the monitor plugin is starting up, but I have not reproduced it yet (e.g. by launching fluentd and sending REST requests to the monitor endpoint).
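The reproduction idea above (launch fluentd and send REST requests to the monitor endpoint while it starts) could be scripted roughly like this. The Ruby sketch assumes the monitor_agent source from the configuration in this issue (port 24220) and its /api/plugins.json endpoint; probe_monitor and its default values are hypothetical, and nothing here confirms that this traffic triggers the warning.

```ruby
require "net/http"

# Hypothetical probe: repeatedly GETs a monitor_agent endpoint and tallies
# the outcomes by HTTP status code or, on failure, by exception class name
# (e.g. Errno::ECONNREFUSED while fluentd is not listening yet).
def probe_monitor(host: "127.0.0.1", port: 24220, path: "/api/plugins.json",
                  attempts: 50, interval: 0.1)
  outcomes = Hash.new(0)
  attempts.times do
    begin
      response = Net::HTTP.start(host, port, open_timeout: 1, read_timeout: 2) do |http|
        http.get(path)
      end
      outcomes[response.code] += 1
    rescue StandardError => e
      outcomes[e.class.name] += 1
    end
    sleep interval
  end
  outcomes
end

# Usage: start fluentd, immediately run
#   p probe_monitor
# and afterwards check the fluentd log for the WARN lines.
```

Counting connection errors instead of aborting keeps the probe running across the window in which fluentd has not yet bound the monitor port.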

(posted by kenhys on Aug 23, 2024)

The async-related gems are the same versions in both images.

v1.17.0-debian-1.1

fluent@b49bcca3f5d1:/$ gem list | grep async
async (1.32.1)
async-http (0.64.2)
async-io (1.43.2)
async-pool (0.8.0)
fluent@b49bcca3f5d1:/$ gem list | grep console
console (1.27.0)
io-console (default: 0.6.0)

v1.17.1-debian

fluent@51b160f178b1:/$ gem list | grep async
async (1.32.1)
async-http (0.64.2)
async-io (1.43.2)
async-pool (0.8.0)
fluent@51b160f178b1:/$ gem list | grep console
console (1.27.0)
io-console (default: 0.6.0)

(posted by ashie on Aug 27, 2024)

I can't reproduce this. We need more info to reproduce this issue.

(posted by daipom on Sep 9, 2024)

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 7 days.

(posted by github-actions[bot] on Oct 9, 2024)

This issue was automatically closed because it remained stale for 7 days.

(posted by github-actions[bot] on Oct 17, 2024)