
Fluentd old buffer file cannot flush to elasticsearch

Open luckyzzr opened this issue 7 months ago • 2 comments

Describe the bug

I have been running Fluentd for a long time, with a forward input plugin receiving events and an elasticsearch output plugin.

Recently I found that many residual old buffer files remain on disk and cannot be flushed to Elasticsearch, even after restarting Fluentd.

To Reproduce

Run Fluentd for a long time, with a buffer output rate that is no larger than the Elasticsearch cluster's write rate.

Expected behavior

Old buffer files created back in 2024 should all flush to Elasticsearch normally.

Your Environment

- Fluentd version: 1.16.8
- Package version: docker
- Operating system: Debian GNU/Linux 12 (bookworm)
- Kernel version: 5.4.61-050461-generic

Your Configuration

  <label @FLUENT_LOG>
    <match fluent.*>
      @type null
    </match>
  </label>
  <source>
    @type forward
    port 24224
    skip_invalid_event true
  </source>
  <match **>
    @type elasticsearch_dynamic
    @id elasticsearch
    host "elasticsearch"
    ssl_verify false
    validate_client_version true
    reconnect_on_error true
    index_name "${tag_parts[-1]}"
    reload_connections false
    reload_on_failure true
    time_key "timestamp"
    time_key_exclude_timestamp true
    utc_index false
    slow_flush_log_threshold 120.0
    request_timeout 120s
    bulk_message_request_threshold -1
    suppress_type_name true
    default_elasticsearch_version 7
    <buffer>
      @type "file_single"
      path "/log/buffer/elasticsearch"
      chunk_format text
      total_limit_size 9G
      chunk_limit_size 15M
      retry_type periodic
      retry_wait 60s
      flush_mode interval
      flush_interval 15s
      flush_thread_count 4
      retry_forever true
      overflow_action block
    </buffer>
  </match>
  <source>
    @type prometheus
    port 24231
    metrics_path "/prometheus"
    aggregated_metrics_path "/metrics"
  </source>
  <source>
    @type prometheus_output_monitor
  </source>
  <system>
    root_dir "/tmp/fluentd-buffers/"
    rpc_endpoint "0.0.0.0:24444"
    suppress_repeated_stacktrace true
    ignore_same_log_interval 60s
    ignore_repeated_log_interval 60s
    emit_error_log_interval 60s
  </system>

Your Error Log

2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c71e00ab834f19da9d5db50bd634b.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c7ea812c3695eb6cc161bea831269.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c7eabb2f376bbd62ff6ccfef64390.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c7ef249400acd82b6d598e606feb2.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c7ef91a6bad2fe4a11961bcd606c3.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c7efecd58a8b9a0bf75c55a257448.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c80a0ef5503ff376690732b2d5ad4.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c80a51a8796a8df795a173129407f.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c80a9184ca99991197671ec212dcd.buf
2025-05-12 09:26:19 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c84ec8fbed06daa461ba22a545b60.buf
2025-05-12 09:26:20 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c84f087dd2ab57d0208527f64555a.buf
2025-05-12 09:26:20 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c84f593d35209eabdb9e6bed40514.buf
2025-05-12 09:26:20 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c8567e96570f0671e9e54aae2b0fe.buf
2025-05-12 09:26:20 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c8572a762f5b167529536fec2b847.buf
2025-05-12 09:26:20 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c85805a39aca7d469618b57d2427a.buf
2025-05-12 09:26:20 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c85e586f14c778b36452245446937.buf
2025-05-12 09:26:20 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c85e9e2cd0faca0688dfabc35a55e.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c85f052485f8ce9c1cc8fd19ce395.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c85f302bfe8668e18663093e96f9b.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c86384b198fe7cb900a53f766ca27.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c863c535025520f5d165fedbde326.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c863f8d8e30abe7506a652f2c7791.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c87e9d8833fcaf9278a14a5ed8f92.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c87eac0bdb01332586cf6eed10e5f.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c87eafb484e72f728622dc5c04ce3.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c8807801edb18ca4855922029530f.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c880b6279b4ab31274efa7ce5b163.buf
2025-05-12 09:26:21 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c8811ca7f52996ad570bb2b3846d2.buf
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c881c5ff3eff403c83c5c96fe8771.buf
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c89a80af066cc9cb9f2210140543d.buf
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c89aaa881e45ef11b2b84647678cf.buf
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.pod.openstack.b622c89ada4fe632312b3a241ef5eb634.buf
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.syslog.system.b634ece54e21704344a7bc57047a62b55.buf
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] restoring buffer file: path = /log/buffer/elasticsearch/worker2/fsb.volume.audit.b634ece57b786e69735c67b06a6a0a6c6.buf
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] buffer started instance=2520 stage_size=46423238 queue_size=0
2025-05-12 09:26:22 +0000 [debug]: #2 fluent/log.rb:341:debug: listening prometheus http server on http:://0.0.0.0:24233//prometheus for worker2
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] flush_thread actually running
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] flush_thread actually running
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] flush_thread actually running
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] flush_thread actually running
2025-05-12 09:26:22 +0000 [debug]: #2 fluent/log.rb:341:debug: Start async HTTP server listening http://0.0.0.0:24233
2025-05-12 09:26:22 +0000 [debug]: #2 [elasticsearch] enqueue_thread actually running
2025-05-12 09:26:22 +0000 [debug]: #2 fluent/log.rb:341:debug:   0.0s: Async::IO::Socket
      | Binding to #<Addrinfo: 0.0.0.0:24233 TCP>
2025-05-12 09:26:22 +0000 [info]: #2 fluent/log.rb:362:info: listening port port=24224 bind="0.0.0.0"
2025-05-12 09:26:22 +0000 [info]: #2 fluent/log.rb:362:info: fluentd worker is now running worker=2

Additional context

-rw-r--r-- 1 root root 15140776 Sep 23 2024 fsb.pod.openstack.b622c89a80af066cc9cb9f2210140543d.buf
-rw-r--r-- 1 root root 15503561 Sep 23 2024 fsb.pod.openstack.b622c89aaa881e45ef11b2b84647678cf.buf
-rw-r--r-- 1 root root 15679860 Sep 23 2024 fsb.pod.log.b622c8a645e8e4e8279b5f6c96b6fd8d2.buf
-rw-r--r-- 1 root root 15097368 Sep 23 2024 fsb.pod.log.b622c8a6afe0c4a68f1421cbd725d23da.buf
-rw-r--r-- 1 root root 15140143 Sep 23 2024 fsb.pod.log.b622c8a75e18ff5a710060ac859220fc7.buf
-rw-r--r-- 1 root root 15043892 Sep 23 2024 fsb.pod.log.b622c8c27709c06be4b61c188b51203ff.buf
-rw-r--r-- 1 root root 15142960 Sep 23 2024 fsb.pod.log.b622c8c2aba3e3b0156fe869a3ca1b34f.buf
-rw-r--r-- 1 root root 15001516 Sep 23 2024 fsb.pod.log.b622c8c2e83a15ae83de4e8265b20d8ea.buf
-rw-r--r-- 1 root root 14945271 Sep 23 2024 fsb.pod.log.b622c8c7827dd8b1aad3dd9d13034334e.buf
-rw-r--r-- 1 root root 14993627 Sep 23 2024 fsb.pod.log.b622c8c7924e717059032fe5f604328d5.buf
-rw-r--r-- 1 root root 14961563 Sep 23 2024 fsb.pod.log.b622c8c7b1c9e14ebc5b1b4e781ee9e48.buf
-rw-r--r-- 1 root root 15029793 Sep 23 2024 fsb.pod.log.b622c8c7c4a55ef03876af1a7011b8eae.buf
-rw-r--r-- 1 root root 15027225 Sep 23 2024 fsb.pod.log.b622c8c7e52a7a0fb03731cc72a94d2cc.buf
-rw-r--r-- 1 root root 15400896 Sep 23 2024 fsb.pod.log.b622c8d01b999f82a2fc6b4f1415b4821.buf
-rw-r--r-- 1 root root 15427766 Sep 23 2024 fsb.pod.log.b622c8d07d331a17c8b7866b4440aeeb2.buf
-rw-r--r-- 1 root root 15319755 Sep 23 2024 fsb.pod.log.b622c8d0b05891029d86404a103e503f6.buf
-rw-r--r-- 1 root root 15384163 Sep 23 2024 fsb.pod.log.b622c8d0e1e937b7eddc4a3609069dbeb.buf
-rw-r--r-- 1 root root 15377167 Sep 23 2024 fsb.pod.log.b622c8d1205ad018018c335779963dbec.buf
-rw-r--r-- 1 root root 3552 May 12 09:34 fsb.syslog.system.b634ed06629e693e14cdb8272eddde615.buf
-rw-r--r-- 1 root root 525598 May 12 09:34 fsb.pod.log.q634ed05d8a6727806be186b60af1e2fe.buf
-rw-r--r-- 1 root root 2062 May 12 09:34 fsb.kubelet.kubernetes.b634ed0641914ab0d55a354cda45eef16.buf

luckyzzr · May 12 '25 09:05

@luckyzzr Thanks for your report.

The cause is the existence of duplicate staged chunk files with the same key.

When using the file_single buffer, the key is included in the filename. For example, the key of the following chunk file is pod.openstack:

fsb.pod.openstack.b622c89a80af066cc9cb9f2210140543d.buf
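
More generally, file_single chunk files follow this naming pattern (the marker after the key distinguishes staged from enqueued chunks; compare the q-prefixed file in the listing above):

fsb.<key>.<marker><chunk_id>.buf    # marker: b = staged, q = enqueued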

The keys of staged chunks must be unique. New events are appended to the chunk for the corresponding key; if no chunk exists for that key, a new one is created. With the following settings, a chunk is enqueued after 15 seconds, flushed immediately, and then removed:

flush_mode interval
flush_interval 15s

If duplicate keys exist, Fluentd does not recognize the extra chunks, so they are never processed. This is why the old files remain even after a restart.

It's hard to imagine this happening normally. Have you moved any files manually?
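
To confirm the diagnosis, you can count staged chunk files per key; duplicates show up directly. A minimal shell sketch, assuming it is run inside one worker's buffer directory (e.g. /log/buffer/elasticsearch/worker2) while Fluentd is stopped:

# Staged chunks only (b marker). Each key should appear at most once
# while Fluentd is stopped, so any count greater than 1 is a duplicate.
ls fsb.*.buf | sed -nE 's/^fsb\.(.*)\.b[0-9a-f]+\.buf$/\1/p' | sort | uniq -c | sort -rn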

You can flush these chunks as follows.

  • Add flush_at_shutdown to the buffer settings:
      <buffer>
        @type "file_single"
        ...
        flush_at_shutdown true
      </buffer>
    
  • Repeat restarts until all old chunk files are gone (see the sketch after this list).
    • On each start/stop cycle, Fluentd flushes one chunk file per key.
    • flush_at_shutdown is needed to make sure staged chunks are flushed on stop.
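
If many restart cycles are needed, they can be scripted. The following is only a sketch, assuming flush_at_shutdown is enabled as above and Fluentd runs in a Docker container named fluentd (a hypothetical name); it keeps restarting until no chunk files older than one day remain:

# "fluentd" is an assumed container name; adjust the path and the
# -mtime threshold to your environment.
while find /log/buffer/elasticsearch -name 'fsb.*.buf' -mtime +1 | grep -q .; do
  docker restart fluentd   # each start/stop cycle flushes one chunk file per key
  sleep 60                 # give the restored chunks time to flush before the next restart
done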

daipom · May 14 '25 02:05

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 7 days.

github-actions[bot] · Jun 13 '25 10:06

This issue was automatically closed because it remained stale for 7 days.

github-actions[bot] · Jun 20 '25 10:06