
Log drops when my Fluentd pod utilization is around 1 core

anup1384 opened this issue 3 years ago • 2 comments

Describe the bug

I am seeing log drops when my Fluentd pod's CPU utilization is around 1 core.

I'm running Fluentd in Kubernetes (EKS) to ship my applications' stdout/stderr logs to Elasticsearch. Logs are dropped when the ingestion rate is high and the pod's CPU utilization reaches 1 core.

Fluentd Helm chart: https://github.com/anup1384/helm-charts/tree/master/stable/fluentd-ds

I'm also unable to make Fluentd use more than one core. Can someone help me with the configuration needed to run a multi-core (multi-worker) Fluentd pod?
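
For reference, Fluentd can use more than one CPU core through its multi-process workers feature (the workers setting inside a <system> directive). The sketch below is a hedged illustration, not a verified configuration for this chart: the worker count, path globs, and pos_file names are made up for the example. Because each worker is a separate process and a single in_tail source runs inside one worker, spreading load across cores usually means splitting the tailed paths across workers.

<system>
  workers 2                                    # illustrative; size to the available CPU cores
</system>

# Each worker tails its own slice of the files, with its own pos_file,
# so the same file is never read by two processes at once.
<worker 0>
  <source>
    @type tail
    path /var/log/containers/perf-[a-m]*.log   # illustrative split of the glob
    pos_file /var/log/fluentd-containers-w0.log.pos
    tag k8.*
    read_from_head true
    <parse>
      @type json
    </parse>
  </source>
</worker>

<worker 1>
  <source>
    @type tail
    path /var/log/containers/perf-[n-z]*.log   # illustrative split of the glob
    pos_file /var/log/fluentd-containers-w1.log.pos
    tag k8.*
    read_from_head true
    <parse>
      @type json
    </parse>
  </source>
</worker>

# Filters and <match> sections placed outside <worker> blocks are loaded by every worker.

With a file buffer and multiple workers, per-worker buffer paths also need care (for example, setting root_dir under <system> and giving each plugin an @id); the Fluentd multi-process workers documentation covers the details.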

To Reproduce

Occurs under high load / a high ingestion rate.

Expected behavior

No logs drop

Your Environment

Fluentd version: v1.14.3

ES version: 7.15.0

Plugin versions:

elasticsearch (7.15.0)
elasticsearch-api (7.15.0)
elasticsearch-transport (7.15.0)
elasticsearch-xpack (7.15.0)
fluent-plugin-elasticsearch (5.1.5, 5.1.4)

Your Configuration

<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/perf*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag k8.*
  read_from_head true
  <parse>
    @type json
    time_key @timestamp
    time_format %Y-%m-%dT%H:%M:%S.%N%z
    keep_time_key true
  </parse>
</source>

<filter **>
  @type kubernetes_metadata
  skip_container_metadata "true"
</filter>

<filter **>
  @type parser
  @log_level info
  key_name log
  reserve_data true
  reserve_time true
  remove_key_name_field true
  emit_invalid_record_to_error false
  replace_invalid_sequence true
  <parse>
    @type json
  </parse>
</filter>

<filter **>
  @type record_transformer
  enable_ruby true
  <record>
    log_json ${record["log"]}
  </record>
  remove_keys $.kubernetes.labels
</filter>

<filter **>
  @type elasticsearch_genid
  hash_id_key _hash
</filter>

<match k8.**>
  @type copy
  @id k8s
  <store>
    @type elasticsearch
    @id k8s_es
    @log_level debug
    scheme http
    host "es-acquiring-log.abc.com"
    port "80"
    log_es_400_reason true
    logstash_format true
    logstash_prefix abc-test
    reconnect_on_error true
    reload_on_failure true
    reload_connections false
    suppress_type_name true
    sniffer_class_name Fluent::Plugin::ElasticsearchSimpleSniffer
    request_timeout 2147483648
    compression_level best_compression
    include_timestamp true
    utc_index false
    time_key_format "%Y-%m-%dT%H:%M:%S.%N%z"
    time_key time
    id_key _hash
    remove_keys _hash
    <buffer tag, abc-test>
      @type file
      flush_mode interval
      flush_thread_count 8
      path /var/log/fluentd-buffers/k8s.buffer
      chunk_limit_size 16m
      queue_limit_length 512
      flush_interval 5s
      overflow_action drop_oldest_chunk
      retry_max_interval 30s
      retry_forever false
      retry_type exponential_backoff
      retry_timeout 1h
      retry_wait 20s
      retry_max_times 5
    </buffer>
  </store>
</match>
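
A note on the buffer section above: overflow_action drop_oldest_chunk discards the oldest queued chunk whenever the buffer fills (here bounded by queue_limit_length 512 chunks of up to 16m each), and buffered data is also dropped once retry_max_times 5 is exhausted, so part of the observed log loss may be this configuration behaving as written. A hedged alternative sketch follows; the values are illustrative, not tuned recommendations, and the chunk keys are simplified to tag only.

<buffer tag>
  @type file
  path /var/log/fluentd-buffers/k8s.buffer
  chunk_limit_size 16m
  total_limit_size 2g                # overall on-disk cap instead of a chunk-count limit
  flush_mode interval
  flush_interval 5s
  flush_thread_count 8
  overflow_action block              # apply backpressure (pause reading) instead of dropping the oldest chunk
  retry_type exponential_backoff
  retry_wait 20s
  retry_max_interval 30s
  retry_timeout 1h
</buffer>

Whether blocking the input or dropping data is preferable depends on how far behind the tail position is allowed to fall, so this is a trade-off to weigh rather than a fix.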

Your Error Log

failed to flush the buffer. retry_times=1 next_retry_time=2022-04-21 19:40:04 +0530 chunk="5dd2a94a132a30cde03bf861c7429e4b" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster

Additional context

No response

anup1384 · Apr 21 '22 18:04

Facing the same issue. @anup1384, did you find any solution?

vikasmishra17 · May 26 '22 13:05

This issue has been automatically marked as stale because it has been open for 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.

github-actions[bot] · Aug 25 '22 10:08

This issue was automatically closed because it remained stale for 30 days.

github-actions[bot] · Sep 24 '22 10:09