
Rdkafka doesn't recover after kafka node crash: Local: Fatal error (fatal)

Open · fpytloun opened this issue 3 years ago · 0 comments

Describe the bug

I have Fluentd forwarding messages to 6 Kafka brokers. After a single broker crashes, roughly 1/6 of produce calls fail:

2022-06-13 09:26:24 +0000 [warn]: #1 [out_kafka_access] Send exception occurred: Local: Fatal error (fatal) at /usr/lib/ruby/gems/2.7.0/gems/rdkafka-0.11.1/lib/rdkafka/producer.rb:167:in `produce'
2022-06-13 09:26:24 +0000 [warn]: #1 [out_kafka_access] failed to flush the buffer. retry_times=11 next_retry_time=2022-06-13 10:02:19 +0000 chunk="5e1505b615b2145b6be8a740f2c72c83" error_class=Rdkafka::RdkafkaError error="Local: Fatal error (fatal)"
2022-06-13 10:02:19 +0000 [warn]: #1 [out_kafka_access] Send exception occurred: Local: Fatal error (fatal) at /usr/lib/ruby/gems/2.7.0/gems/rdkafka-0.11.1/lib/rdkafka/producer.rb:167:in `produce'
2022-06-13 10:02:19 +0000 [warn]: #1 [out_kafka_access] failed to flush the buffer. retry_times=12 next_retry_time=2022-06-13 11:08:44 +0000 chunk="5e1505b615b2145b6be8a740f2c72c83" error_class=Rdkafka::RdkafkaError error="Local: Fatal error (fatal)"
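For context on why these errors never clear: librdkafka marks a producer instance as fatally failed when certain guarantees (notably idempotence) are broken, and from then on every `produce` on that instance raises. A minimal sketch of recognizing such errors, keying only on the `(fatal)` suffix librdkafka appends to the message (a hypothetical helper; a real fix would inspect the error code rather than the text):

```ruby
# Hedged sketch: librdkafka appends "(fatal)" to errors that permanently
# poison the producer instance, as seen in the logs above. This helper
# classifies an error by that suffix; `fatal_kafka_error?` is not plugin API.
def fatal_kafka_error?(error)
  error.message.include?("(fatal)")
end
```

A poisoned instance cannot be retried into health, which is why the plugin's retry loop above keeps failing with the same chunk.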

To Reproduce

Configure the rdkafka2 producer, stop one Kafka broker, then start it again.

Expected behavior

The Kafka producer should recover on its own once the broker is back.
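One possible shape for that recovery, sketched with placeholder names (`produce_with_recovery` and the `build_producer` callable are hypothetical, not plugin or rdkafka API): on a fatal error, discard the poisoned producer, build a fresh librdkafka instance, and retry the chunk once.

```ruby
# Hedged workaround sketch: retry a produce operation, rebuilding the
# producer when librdkafka reports a fatal ("(fatal)") error. A fatal
# error poisons the whole instance, so only a new instance can recover.
def produce_with_recovery(producer, build_producer, max_rebuilds: 1)
  rebuilds = 0
  begin
    yield producer
  rescue StandardError => e
    raise unless e.message.include?("(fatal)") && rebuilds < max_rebuilds
    rebuilds += 1
    producer.close rescue nil      # discard the poisoned instance
    producer = build_producer.call # fresh librdkafka instance
    retry
  end
end
```

Capping `max_rebuilds` keeps a genuinely broken cluster from causing an endless rebuild loop; non-fatal errors are re-raised so the plugin's normal retry/backoff still applies.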

Your Environment

- Fluentd version: 1.14.6
- TD Agent version:
- fluent-plugin-kafka version: 0.17.5
- ruby-kafka version:
- Operating system:
- Kernel version:

Your Configuration

      @type rdkafka2
      @id out_kafka
      brokers listofbrokers:9999

      use_event_time true
      topic_key _topic
      exclude_topic_key true
      default_topic fluentd.unknown
      use_default_for_unknown_topic true
      exclude_fields $._hash,$._index,$._alert,$._keep,$._sd,$._source,$._syslog_severity,$.kubernetes.labels.pod-template-hash

      <format>
        @type json
      </format>

      compression_codec gzip
      share_producer true
      # NOTE: Idempotence is not supported unless acks are required from
      # all in-sync replicas; also, when enabled, we've seen a memory
      # leak on the Kafka side.
      # idempotent true
      # Don't wait for acks from all in-sync replicas when receiving
      # records; a single ack is sufficient. This is the best option for
      # both performance and durability.
      #required_acks 1

      rdkafka_options {
        "enable.idempotence": true
      }

      ssl_client_cert_key /identity/client.key
      ssl_client_cert /identity/client.crt
      ssl_ca_cert /identity/ca.crt

      <buffer _topic>
        @type memory
        overflow_action block
        chunk_full_threshold 0.9
        compress gzip       # text,gzip
        flush_mode interval # default,interval,immediate,lazy
        flush_interval 10s
        flush_at_shutdown true
        flush_thread_count 4
      </buffer>
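Worth noting for impact: with `share_producer true`, all outputs share one librdkafka instance, so a single fatal error poisons every output at once, not just the one that hit it. A hedged mitigation sketch (at the cost of more broker connections), not a confirmed fix:

```
      # Per-output producers: a fatal error then poisons only one output
      # instead of every output sharing the instance.
      share_producer false
```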

Your Error Log

2022-06-13 09:26:24 +0000 [warn]: #1 [out_kafka_access] Send exception occurred: Local: Fatal error (fatal) at /usr/lib/ruby/gems/2.7.0/gems/rdkafka-0.11.1/lib/rdkafka/producer.rb:167:in `produce'
2022-06-13 09:26:24 +0000 [warn]: #1 [out_kafka_access] failed to flush the buffer. retry_times=11 next_retry_time=2022-06-13 10:02:19 +0000 chunk="5e1505b615b2145b6be8a740f2c72c83" error_class=Rdkafka::RdkafkaError error="Local: Fatal error (fatal)"
2022-06-13 10:02:19 +0000 [warn]: #1 [out_kafka_access] Send exception occurred: Local: Fatal error (fatal) at /usr/lib/ruby/gems/2.7.0/gems/rdkafka-0.11.1/lib/rdkafka/producer.rb:167:in `produce'
2022-06-13 10:02:19 +0000 [warn]: #1 [out_kafka_access] failed to flush the buffer. retry_times=12 next_retry_time=2022-06-13 11:08:44 +0000 chunk="5e1505b615b2145b6be8a740f2c72c83" error_class=Rdkafka::RdkafkaError error="Local: Fatal error (fatal)"

Additional context

No response

fpytloun · Jun 13 '22 10:06