
`copy` plugin with multiple stores (particularly AWS S3 and Elasticsearch) and `ignore_error` behaves differently when Elasticsearch is down

prabhpreet332 opened this issue 2 years ago · 1 comment

Describe the bug

When the copy plugin is used with multiple stores (particularly S3 and Elasticsearch) along with ignore_error, one edge case does not behave as expected.

In my setup, I send logs to both S3 and Elasticsearch. When Elasticsearch is down, fluentd keeps trying to connect to the ES host, and meanwhile it does not send logs to the other store (S3). Even with ignore_error applied to the store containing the ES plugin, the error is not handled, which prevents logs from being sent to S3.

See fluent.conf below.

To Reproduce

Use the fluent.conf file below in a docker-compose setup with Elasticsearch and fluentd services.

Scenario

  1. Initially, all services in the docker-compose setup are down
  2. Start the fluentd service while keeping the ES service down

Observation

In the fluentd container logs:

  1. fluentd tries to connect to the ES host and shows the following logs:
fluentd          | 2022-06-14 09:16:59 +0000 [info]: starting fluentd-1.14.6 pid=7 ruby="2.7.5"
fluentd          | 2022-06-14 09:16:59 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
fluentd          | 2022-06-14 09:16:59 +0000 [info]: adding match pattern="file.**" type="copy"
fluentd          | 2022-06-14 09:17:01 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.144.2:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-14 09:17:01 +0000 [warn]: #0 Remaining retry: 14. Retry to communicate after 2 second(s).
fluentd          | 2022-06-14 09:17:05 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.144.2:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-14 09:17:05 +0000 [warn]: #0 Remaining retry: 13. Retry to communicate after 4 second(s).
fluentd          | 2022-06-14 09:17:13 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.144.2:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-14 09:17:13 +0000 [warn]: #0 Remaining retry: 12. Retry to communicate after 8 second(s).
.
.
.

These logs continue until the last retry, with exponentially increasing intervals in between.
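The "Remaining retry" countdown suggests these warnings come from the ES plugin's startup version probe rather than from buffer flushing, and `ignore_error` only covers errors raised while `copy` emits events to a store, so it never applies here. A possible workaround (a sketch, assuming the option names of fluent-plugin-elasticsearch 5.x) is to skip the startup probe:

```
<store ignore_error>
  @type elasticsearch
  host elasticsearch
  port 9200
  # Assumption: skipping the version probe keeps an unreachable ES from
  # blocking worker startup; the ES version must then be stated explicitly.
  verify_es_version_at_startup false
  default_elasticsearch_version 7
</store>
```

Whether this also keeps the S3 store unblocked during later ES outages would still need to be verified.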

fluent.conf

<source>
  @type tail
  format json
  read_from_head true
  tag file.log
  path /fluentd/log/*.logs
  time_format %Y-%m-%dT%H:%M:%S
  pos_file /tmp/file-logs.pos
</source>

<match ** >
@type copy
<store ignore_error>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name fluentd_docker
  type_name fluentd

  <buffer>
    @type file
    path /var/log/fluent/es/es.*.log

    timekey_use_utc true # use utc

    flush_interval 30s
    flush_mode interval
    flush_at_shutdown true
  </buffer>
</store>

<store>
  @type s3

  aws_key_id "#{ENV['AWS_KEY_ID']}"
  aws_sec_key "#{ENV['AWS_SEC_KEY']}"

  s3_bucket <my_bucket_name>
  s3_region <aws_region_name>
  path logs/

  s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
  store_as json

  <buffer>
    @type file
    path /var/log/fluent/s3/s3.*.log
    timekey_use_utc true

    flush_interval 30s
    flush_mode interval
    flush_at_shutdown true
  </buffer>

  time_slice_format %Y%m%d%H
  
  <format>
    @type json
  </format>
</store>

</match>

docker-compose.yml

version: "3.9"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.3
    environment:
      - node.name=elasticsearch
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node

    ports:
      - 9200:9200
    deploy:
      resources:
        limits:
          cpus: "0.50" # Use at most 50% of one CPU core
          memory: 1G # Use at most 1 GB of RAM
    ulimits:
      memlock:
        hard: -1
        soft: -1
      nofile:
        hard: 65535
        soft: 65535
      nproc: 65535

  fluentd:
    container_name: fluentd
    build:
      context: ./
    volumes:
    - ./configurations/fluent.conf:/fluentd/etc/fluent.conf
    - ./logs:/fluentd/log/
    logging:
      driver: "local"
    environment:
      - AWS_KEY_ID=${AWS_KEY_ID}
      - AWS_SEC_KEY=${AWS_SEC_KEY}

Dockerfile to build custom image on fluent/fluentd image:

FROM fluent/fluentd:v1.14.6-debian-1.0
USER root
RUN gem install elasticsearch -v 7.17.0
RUN fluent-gem install fluent-plugin-elasticsearch

Logs are not sent to S3

Expected behavior

The expected behaviour was for logs to be sent to the store containing the AWS S3 plugin even when ES is down, but this did not happen in my experimentation.
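For reference, fluentd's out_copy documentation describes `ignore_error` as preventing an error raised by one store from aborting the remaining stores, i.e. something like the following sketch (where `flaky_output` is a hypothetical plugin name used only for illustration):

```
<match file.**>
  @type copy
  <store ignore_error>
    @type flaky_output   # hypothetical plugin that raises on emit
  </store>
  <store>
    @type s3             # should still receive the events
  </store>
</match>
```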

Your Environment

- Fluentd version: fluentd 1.14.6
- TD Agent version: 
- Operating system: "Debian GNU/Linux 11 (bullseye)"
- Kernel version: 5.14.0-1038-oem

Your Configuration

`fluent.conf`, `docker-compose.yml` and `Dockerfile` to build custom image mentioned in **To Reproduce**

Your Error Log

fluentd          | 2022-06-15 12:50:01 +0000 [info]: parsing config file is succeeded path="/fluentd/etc/fluent.conf"
fluentd          | 2022-06-15 12:50:01 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.2.2'
fluentd          | 2022-06-15 12:50:01 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
fluentd          | 2022-06-15 12:50:01 +0000 [info]: gem 'fluentd' version '1.14.6'
fluentd          | 2022-06-15 12:50:01 +0000 [info]: using configuration file: <ROOT>
fluentd          |   <source>
fluentd          |     @type tail
fluentd          |     format json
fluentd          |     read_from_head true
fluentd          |     tag file.log
fluentd          |     path /fluentd/log/*.logs
fluentd          |     time_format %Y-%m-%dT%H:%M:%S
fluentd          |     pos_file /tmp/file-logs.pos
fluentd          |   </source>
fluentd          |   
fluentd          |   <match file.** >
fluentd          |   @type copy
fluentd          |   <store ignore_error>
fluentd          |     @type elasticsearch
fluentd          |     host elasticsearch
fluentd          |     port 9200
fluentd          |     index_name fluentd_docker
fluentd          |     type_name fluentd
fluentd          |   
fluentd          |     <buffer>
fluentd          |       @type file
fluentd          |       path /var/log/fluent/es/es.*.log
fluentd          |   
fluentd          |       timekey_use_utc true # use utc
fluentd          |   
fluentd          |       flush_interval 30s
fluentd          |       flush_mode interval
fluentd          |       flush_at_shutdown true
fluentd          |     </buffer>
fluentd          |   </store>
fluentd          |   
fluentd          |   <store>
fluentd          |     @type s3
fluentd          |   
fluentd          |     aws_key_id "#{ENV['AWS_KEY_ID']}"
fluentd          |     aws_sec_key "#{ENV['AWS_SEC_KEY']}"
fluentd          |   
fluentd          |     s3_bucket <my_bucket_name>
fluentd          |     s3_region <aws_region_name>
fluentd          |     path logs/
fluentd          |   
fluentd          |     s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
fluentd          |     store_as json
fluentd          |   
fluentd          |     <buffer>
fluentd          |       @type file
fluentd          |       path /var/log/fluent/s3/s3.*.log
fluentd          |       timekey_use_utc true
fluentd          |   
fluentd          |       flush_interval 30s
fluentd          |       flush_mode interval
fluentd          |       flush_at_shutdown true
fluentd          |     </buffer>
fluentd          |   
fluentd          |     time_slice_format %Y%m%d%H
fluentd          |     
fluentd          |     <format>
fluentd          |       @type json
fluentd          |     </format>
fluentd          |   </store>
fluentd          |   
fluentd          |   </match>
fluentd          | </ROOT>
fluentd          | 2022-06-15 12:50:01 +0000 [info]: starting fluentd-1.14.6 pid=7 ruby="2.7.5"
fluentd          | 2022-06-15 12:50:01 +0000 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "--config", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
fluentd          | 2022-06-15 12:50:02 +0000 [info]: adding match pattern="file.**" type="copy"
fluentd          | 2022-06-15 12:50:04 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.208.3:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-15 12:50:04 +0000 [warn]: #0 Remaining retry: 14. Retry to communicate after 2 second(s).
fluentd          | 2022-06-15 12:50:08 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.208.3:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-15 12:50:08 +0000 [warn]: #0 Remaining retry: 13. Retry to communicate after 4 second(s).
fluentd          | 2022-06-15 12:50:16 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.208.3:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-15 12:50:16 +0000 [warn]: #0 Remaining retry: 12. Retry to communicate after 8 second(s).
fluentd          | 2022-06-15 12:50:32 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.208.3:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-15 12:50:32 +0000 [warn]: #0 Remaining retry: 11. Retry to communicate after 16 second(s).
fluentd          | 2022-06-15 12:51:04 +0000 [warn]: #0 Could not communicate to Elasticsearch, resetting connection and trying again. Connection refused - connect(2) for 192.168.208.3:9200 (Errno::ECONNREFUSED)
fluentd          | 2022-06-15 12:51:04 +0000 [warn]: #0 Remaining retry: 10. Retry to communicate after 32 second(s).
.
.
.

Additional context

No response

prabhpreet332 avatar Jun 15 '22 16:06 prabhpreet332

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

github-actions[bot] avatar Sep 14 '22 10:09 github-actions[bot]

This issue was automatically closed because of stale in 30 days

github-actions[bot] avatar Oct 14 '22 10:10 github-actions[bot]

Does anyone have a solution for this issue, please?

rama534 avatar Nov 19 '22 13:11 rama534
