
Getting Fluent::Plugin::CloudwatchLogsOutput::TooLargeEventError

Open sergeisantoyo opened this issue 1 year ago • 1 comment

Problem

I'm using the image fluent/fluentd-kubernetes-daemonset:v1.12.2-debian-cloudwatch-1.3 (and I also tried the latest one), but I'm getting this error while trying to log to CloudWatch.

#<Thread:0x00007f7b44f9b2c8 /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-cloudwatch-logs-0.14.3/lib/fluent/plugin/out_cloudwatch_logs.rb:323 run> terminated with exception (report_on_exception is true):
/fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-cloudwatch-logs-0.14.3/lib/fluent/plugin/out_cloudwatch_logs.rb:382:in `put_events_by_chunk': Log event in <LOG_GROUP_NAME> is discarded because it is too large: 671770 bytes exceeds limit of 262144 (Fluent::Plugin::CloudwatchLogsOutput::TooLargeEventError)
    from /fluentd/vendor/bundle/ruby/3.1.0/gems/fluent-plugin-cloudwatch-logs-0.14.3/lib/fluent/plugin/out_cloudwatch_logs.rb:326:in `block (2 levels) in write'

I know this is a limitation of the CloudWatch Logs API, which rejects any single log event larger than 262,144 bytes. Is this fixed in other versions of that image? I've seen that for Fluent Bit, the solution was to truncate the log if it's too big.

Any idea on how to fix it for this specific image?
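
A comparable truncation is possible on the Fluentd side with the bundled record_transformer filter. This is a minimal sketch, not part of this plugin: it assumes the message lives in the log key (matching the <format> section in the config below), that the filter sits inside the @NORMAL label ahead of the <match>, and that cutting a multibyte UTF-8 character at the byte boundary is acceptable; the 260,000 figure is an arbitrary value chosen to leave headroom under the cap.

  <filter **>
    @type record_transformer
    enable_ruby true
    <record>
      # CloudWatch counts the message bytes plus 26 bytes of per-event
      # overhead against a 262,144-byte cap, so truncate with headroom.
      log ${record["log"].to_s.byteslice(0, 260000)}
    </record>
  </filter>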

...

Steps to replicate

This is my current config:

02_output.conf: |-
  <label @NORMAL>
    <match **>
      @type cloudwatch_logs
      @id out_cloudwatch_logs_containers
      region "#{ENV.fetch('REGION')}"
      log_group_name_key group_name
      remove_log_group_name_key true
      log_group_aws_tags "{ \"Name\": \"#{ENV.fetch('CLUSTER_NAME')}\", \"kubernetes.io/cluster/#{ENV.fetch('CLUSTER_NAME')}\": \"owned\" }"
      log_stream_name_key stream_name
      remove_log_stream_name_key true
      auto_create_stream true
      retention_in_days 365
      concurrency 16
      <buffer>
        @type memory
        flush_thread_count 16
        flush_mode interval
        flush_at_shutdown true
        # total_limit_size 1GB
        # flush_interval 1s
        # chunk_limit_size 512k
        queued_chunks_limit_size 32
        retry_forever false
        retry_timeout 10m
        retry_max_times 5
        disable_chunk_backup true
      </buffer>
      <format>
        @type single_value
        message_key log
        add_newline false
      </format>
    </match>
  </label>

Reproducing is somewhat difficult, but essentially you need to make a container log a single entry larger than 262,144 bytes. A sketch of one way to inject such an entry without a real container follows.
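
For what it's worth, a minimal sketch using Fluentd's built-in exec input to emit one oversized JSON record: the tag oversized.test and the test-group/test-stream names are made up for illustration, it assumes a ruby binary is on the PATH (the fluentd images ship one), and routing the tag into the @NORMAL label is left out.

  <source>
    @type exec
    tag oversized.test
    run_interval 60s
    # Print one ~300 KB record, well past the 262,144-byte event limit.
    command ruby -e 'require "json"; puts({"log" => "x" * 300000, "group_name" => "test-group", "stream_name" => "test-stream"}.to_json)'
    <parse>
      @type json
    </parse>
  </source>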

Expected Behavior or What you need to ask

I would expect the plugin to truncate the event or split it into multiple events.

Using Fluentd and CloudWatchLogs plugin versions

  • OS version
    • Debian 11 bullseye
  • Bare Metal or within Docker or Kubernetes or others?
    • Kubernetes
  • Fluentd v0.12 or v0.14/v1.0
    • paste result of fluentd --version or td-agent --version
      • fluentd 1.15.3

sergeisantoyo · Mar 14 '23 22:03

I'm seeing the same issue using the latest "0.14.3" version. Is there even a workaround for this?

fernandino143 · Mar 01 '24 14:03