collector-sidecar

Sidecar does not correctly detect a stalled filebeat journald input

Open · nroach44 opened this issue 1 year ago · 0 comments

Problem description

When the journald files are corrupted (or the journald input otherwise fails), the sidecar does not notice, and no error is reported back to the Graylog server.

It's worth noting that filebeat doesn't exit when this occurs; it just stops the journald input. I'm pretty confident this isn't a normal situation for the sidecar.

Possible upstream issue: https://github.com/elastic/beats/issues/32782
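For context on why this slips past the sidecar: the sidecar supervises the filebeat *process*, and filebeat stays alive after the input dies, merely logging an error line. As a rough illustration (hypothetical helper, not sidecar code), a check that scans filebeat's log output for the failure line shown in the journal excerpt in the repro steps could look like:

```python
import re

# Filebeat does not exit when an input dies; it only logs a line like:
#   ERROR [input.journald] ... Input 'journald' failed with: ... bad message
# This regex pulls the failed input's name out of such a line.
INPUT_FAILED = re.compile(r"Input '(?P<name>[^']+)' failed with:")

def find_failed_inputs(log_text: str) -> list[str]:
    """Return the names of inputs that filebeat reported as failed."""
    return [m.group("name") for m in INPUT_FAILED.finditer(log_text)]
```

A supervisor that ran this over recent collector output would see `["journald"]` for the log excerpt below, instead of reporting "Running".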

Steps to reproduce the problem

  1. Have corrupted journal files:

journalctl --verify

PASS: /var/log/journal/cd7f7844c032416dafc4ea25fcfb0871/user-1000@64472e512c6c4c438219d1d337f19579-00000000000b7015-0005e7ab092b31d6.journal                       
2411ea0: Invalid entry item (18/21 offset: 000000                                                                                                                  
2411ea0: Invalid object contents: Bad message                                                                                                                      
File corruption detected at /var/log/journal/cd7f7844c032416dafc4ea25fcfb0871/[email protected]~:2411ea0 (of 41943040 bytes, 90%).  
FAIL: /var/log/journal/cd7f7844c032416dafc4ea25fcfb0871/[email protected]~ (Bad message)
PASS: /var/log/journal/cd7f7844c032416dafc4ea25fcfb0871/[email protected]~                          
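The `FAIL:` lines above are machine-parseable; a small helper (hypothetical, not part of the sidecar or filebeat) could extract the corrupted files from `journalctl --verify` output:

```python
def corrupted_journal_files(verify_output: str) -> list[str]:
    """Extract paths of journal files flagged FAIL by `journalctl --verify`."""
    failed = []
    for line in verify_output.splitlines():
        line = line.strip()
        if line.startswith("FAIL:"):
            # Line format: "FAIL: /path/to/file.journal (Bad message)"
            path = line.removeprefix("FAIL:").strip().split(" (")[0]
            failed.append(path)
    return failed
```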
  2. Have a sidecar installed on the server, with something like the following set up as a filebeat config assigned to the sidecar:
# Needed for Graylog
fields_under_root: true
fields.collector_node_id: ${sidecar.nodeName}
fields.gl2_source_collector: ${sidecar.nodeId}

filebeat.inputs:
- type: journald
  id: everything

output.logstash:
  enabled: true
  slow_start: true
  bulk_max_size: 512
  hosts: ["graylog.domain:1234"]
  backoff.init: 10
  backoff.max: 300

logging:
  level: warning
  to_files: false
  to_syslog: true
  json: false

path:
  data: /var/lib/graylog-sidecar/collectors/filebeat/data
  logs: /var/lib/graylog-sidecar/collectors/filebeat/log
  home: /usr/share/filebeat
  3. Observe that no log entries make it to the server
  4. Observe the filebeat output in the journal
May 19 19:55:16 hostname systemd[1]: Started Wrapper service for Graylog controlled collector.
May 19 19:55:16 hostname graylog-sidecar[25062]: time="2023-05-19T19:55:16+08:00" level=info msg="Using node-id: <UUID>"
May 19 19:55:16 hostname graylog-sidecar[25062]: time="2023-05-19T19:55:16+08:00" level=info msg="No node name was configured, falling back to hostname"
May 19 19:55:16 hostname graylog-sidecar[25062]: time="2023-05-19T19:55:16+08:00" level=info msg="Starting signal distributor"
May 19 19:55:16 hostname graylog-sidecar[25062]: time="2023-05-19T19:55:16+08:00" level=info msg="Adding process runner for: filebeat-63a12208827d252d2f7931ca"
May 19 19:55:16 hostname graylog-sidecar[25062]: time="2023-05-19T19:55:16+08:00" level=info msg="[filebeat-63a12208827d252d2f7931ca] Configuration change detected, rewriting configuration file."
May 19 19:55:16 hostname filebeat[25072]: 2023-05-19T19:55:16.175+0800 WARN map[file.line:175 file.name:beater/filebeat.go] Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning. {"ecs.version": "1.6.0"}
May 19 19:55:16 hostname graylog-sidecar[25062]: time="2023-05-19T19:55:16+08:00" level=info msg="[filebeat-63a12208827d252d2f7931ca] Starting (exec driver)"
May 19 19:55:16 hostname filebeat[25080]: 2023-05-19T19:55:16.237+0800 WARN map[file.line:175 file.name:beater/filebeat.go] Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning. {"ecs.version": "1.6.0"}
May 19 19:55:16 hostname filebeat[25080]: 2023-05-19T19:55:16.287+0800 WARN map[file.line:307 file.name:beater/filebeat.go] Filebeat is unable to load the ingest pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the ingest pipelines or are using Logstash pipelines, you can ignore this warning. {"ecs.version": "1.6.0"}
May 19 19:55:16 hostname filebeat[25080]: 2023-05-19T19:55:16.287+0800 WARN [input] map[file.line:102 file.name:v2/loader.go] EXPERIMENTAL: The journald input is experimental        {"ecs.version": "1.6.0"}
May 19 19:55:16 hostname filebeat[25080]: 2023-05-19T19:55:16.324+0800 ERROR [input.journald] map[file.line:124 file.name:compat/compat.go] Input 'journald' failed with: input.go:130: input everything failed (id=everything)
                                                  failed to read message field: bad message        {"ecs.version": "1.6.0"}
  5. Observe that the collector status still shows as "Running"
  6. Remove the corrupt file, restart the service, and view the collected logs
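Until the sidecar grows a real input-level health check, an external watchdog is one possible stopgap. A minimal sketch, assuming the collector runs under a `graylog-sidecar` systemd unit and that filebeat logs to syslog (as in the config above); the unit name and polling approach are assumptions, and the marker string comes from the error in the journal excerpt:

```python
import subprocess

# Error text filebeat logs when the journald input hits a corrupt file
# (see the journal excerpt in the repro steps).
FAILURE_MARKER = "failed to read message field: bad message"

def needs_restart(recent_logs: str) -> bool:
    """Decide whether the collector should be bounced, from recent journal output."""
    return FAILURE_MARKER in recent_logs

def watchdog_once(unit: str = "graylog-sidecar") -> bool:
    # Unit name is an assumption; adjust to however the collector runs on your host.
    out = subprocess.run(
        ["journalctl", "-u", unit, "--since", "-5 min", "--no-pager"],
        capture_output=True, text=True,
    ).stdout
    if needs_restart(out):
        subprocess.run(["systemctl", "restart", unit], check=False)
        return True
    return False
```

Note this only papers over the symptom; the corrupt journal file still has to be removed for the input to recover, as in step 6.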

Environment

  • Sidecar Version: 1.4.0
  • Graylog Version: 5.1
  • Operating System: Debian 11
  • Elasticsearch Version: 7.17.6
  • MongoDB Version: 5.0.18

nroach44 · May 19 '23 12:05