
filebeat (practically) hangs after restart on machine with a lot of logs

Mekk opened this issue 5 years ago • 18 comments

Sorry for the somewhat vague problem description. I was working on the problem in panic mode and had no time to preserve details like exact log snippets.

I use filebeat to scrape logs from machines where I have PLENTY of log files. Numbers vary, but a few thousand new files every day is normal, and something like 50 thousand happens (in general: a lot of small apps running, each app opens a new log on every restart and after each midnight, old logs are moved out by name, log names contain dates…). After some initial tuning (harvester limit adapted to system capacity, time limits tuned to abandon inactive files at a reasonable rate, etc.) filebeat handles those setups without problems.

Well, it handled them until I reconfigured. Today I split some inputs into a few sections; more or less, the change was from

- type: log
  enabled: true
  paths:
    - /some/path/*.log
    - /other/path/*.log
    - …
  # and other settings

into

- type: log
  enabled: true
  paths:
    - /some/path/*.log
  # and other settings
- type: log
  enabled: true
  paths:
    - /other/path/*.log
  # and other settings
- type: log
  enabled: true
  paths:
    - …remaining …
  # and other settings

where „other settings” were the same everywhere, except that I added some custom fields in each section (this was the reason for the change) and reduced the harvester limits so that they summed to the original value.

After restart, filebeat practically stopped working. It either didn't forward anything or handled only one section (depending on the machine; I saw the problem on a few). After I enabled debug logging, it only logged myriads of notes about reading and writing data.json with some statistics („XYZV before, XYZV after”), and later started to emit notes about finding or adding the state of individual files (~1 such note per second). On the machine I fixed last it remained in that state for about 2 hours, without noticeable change.

I recovered from that by stopping filebeat, erasing /var/lib/filebeat/registry, and starting filebeat again – then it simply started to work and seems to work properly (of course it republished all old logs according to my age settings, which introduced lag from processing the old data).

I suppose there is something wrong in the way state is migrated/interpreted/checked after the input sections change. This is somewhat unexpected; I didn't think that changing the allocation of files to sections would impact state handling.

Tested on filebeat 7.5.1

PS My /var/lib/filebeat/registry/data.json is 3-8 MB, if that is of any interest.

Mekk avatar Feb 04 '20 17:02 Mekk

I took a look at the sources, trying to refresh my memory about the log entries (the actual logs unfortunately rotated after my reset). IIRC, I saw snippets like this in huge numbers:

… something about reading the registry …
Registrar states cleaned up. Before: 3176, After: 3716, Pending: 0
Writing registry file: …

(and again and again), and later it slowly started to also emit

New state added for /some/specific/log/name/different/every/time

Mekk avatar Feb 04 '20 17:02 Mekk

Uuuuuuppsss. This may be a 7.5.1 problem, not a configuration-change problem. I just restarted one of those filebeats without changing anything, and it looks like the problem is back.

Well, we are after hours, so I may try verifying the debug logs more calmly.

Mekk avatar Feb 04 '20 19:02 Mekk

OK, so this is a typical snippet I see repeating ad infinitum:

2020-02-04T20:51:42.595+0100	DEBUG	[input]	file/states.go:68	New state added for /home/ids_backend_api/APP/log/adb_processor_adm_20200202_021116_18336_pqt-ids-deb8.log
2020-02-04T20:51:42.596+0100	DEBUG	[registrar]	registrar/registrar.go:346	Registrar states cleaned up. Before: 32448, After: 32448, Pending: 0
2020-02-04T20:51:42.598+0100	DEBUG	[registrar]	registrar/registrar.go:411	Write registry file: /var/lib/filebeat/registry/filebeat/data.json (32448)
2020-02-04T20:51:42.678+0100	DEBUG	[registrar]	registrar/registrar.go:404	Registry file updated. 32448 states written.
2020-02-04T20:51:42.678+0100	DEBUG	[registrar]	registrar/registrar.go:356	Processing 1 events
2020-02-04T20:51:42.678+0100	DEBUG	[registrar]	registrar/registrar.go:326	Registrar state updates processed. Count: 1
2020-02-04T20:51:42.679+0100	DEBUG	[acker]	beater/acker.go:64	stateful ack	{"count": 1}
2020-02-04T20:51:42.679+0100	DEBUG	[publisher]	pipeline/client.go:220	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:1855605-65026 Finished:true Fileinfo:<nil> Source:/home/ids_backend_api/APP/log/adb_processor_adm_20200202_021116_18336_pqt-ids-deb8.log Offset:160240 Timestamp:2020-02-04 20:47:25.96242759 +0100 CET TTL:-1ns Type:log Meta:map[] FileStateOS:1855605-65026} TimeSeries:false}
2020-02-04T20:51:42.679+0100	DEBUG	[input]	file/states.go:68	New state added for /home/ids_backend_api/APP/log/adb_processor_adm_20200203_003539_21527_pqt-ids-deb8.log
2020-02-04T20:51:42.679+0100	DEBUG	[registrar]	registrar/registrar.go:346	Registrar states cleaned up. Before: 32448, After: 32448, Pending: 0
2020-02-04T20:51:42.681+0100	DEBUG	[registrar]	registrar/registrar.go:411	Write registry file: /var/lib/filebeat/registry/filebeat/data.json (32448)
2020-02-04T20:51:42.763+0100	DEBUG	[registrar]	registrar/registrar.go:404	Registry file updated. 32448 states written.
2020-02-04T20:51:42.763+0100	DEBUG	[registrar]	registrar/registrar.go:356	Processing 1 events
2020-02-04T20:51:42.763+0100	DEBUG	[registrar]	registrar/registrar.go:326	Registrar state updates processed. Count: 1
2020-02-04T20:51:42.764+0100	DEBUG	[registrar]	registrar/registrar.go:346	Registrar states cleaned up. Before: 32448, After: 32448, Pending: 0
2020-02-04T20:51:42.767+0100	DEBUG	[acker]	beater/acker.go:64	stateful ack	{"count": 1}
2020-02-04T20:51:42.772+0100	DEBUG	[registrar]	registrar/registrar.go:411	Write registry file: /var/lib/filebeat/registry/filebeat/data.json (32448)
2020-02-04T20:51:42.772+0100	DEBUG	[publisher]	pipeline/client.go:220	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:1894543-65026 Finished:true Fileinfo:<nil> Source:/home/ids_backend_api/APP/log/adb_processor_adm_20200203_003539_21527_pqt-ids-deb8.log Offset:161959 Timestamp:2020-02-04 20:47:26.0981662 +0100 CET TTL:-1ns Type:log Meta:map[] FileStateOS:1894543-65026} TimeSeries:false}

The timing here (~0.1 s per such section) is on the optimistic side; later I see it slowing down a bit. Still, even if it kept a rate of 10 files/s, with my 70,000 files on this machine I don't expect this restart to finish soon.

Nothing seems to be going upstream.

Mekk avatar Feb 04 '20 20:02 Mekk

I tried downgrading to 7.4.1 and 7.1.1 and it looks like I still have the problem. Maybe I am just unlucky and have particularly many files today (some app crashed a lot and generated many of them), but IIRC I happened to restart some 7.* version without issues (surely I did some upgrades). But that was with the previous configuration.

Going to try bisecting the latter a bit.

Mekk avatar Feb 04 '20 20:02 Mekk

OK, I am not sure whether this is the only factor, but it looks like I have particularly many files today (due to crashes of some app), and that this is the main reason for the problems – not my config change (which only triggered the problem because I restarted filebeat to apply it) and not the filebeat version.

I accept that it takes time to process them, but it is problematic that filebeat doesn't start handling other files, or even other inputs, in such a case (no harvesters start, at least for a very noticeable time).

The per-file cost also seems problematic: 0.1-0.2 s per file, with files processed linearly, is a lot. This doesn't seem directly related to filesystem cost; the directory uses ext4 dir_index. As filebeat also has high CPU usage in this phase, and as I straced it repeatedly opening and writing data.json, it seems to me that the slowdown may be caused by a repeated read/parse/update/serialize/write cycle on data.json. With multiple inputs there can also be some contention for locks on data.json or something similar. So my wild bet is that a huge improvement could be obtained by checkpointing data.json from time to time instead of after every single file, which seems to be what happens now.
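
To illustrate what I mean by checkpointing, here is a rough Go sketch (made-up names, not Filebeat's actual code): state updates only touch an in-memory map, and a background goroutine writes the registry file at most once per flush interval, no matter how many updates arrived in between.

// Rough sketch of periodic checkpointing (illustrative only, not Filebeat code).
package main

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

type checkpointer struct {
	mu     sync.Mutex
	states map[string]int64 // file path -> offset
	dirty  bool
}

func newCheckpointer(path string, interval time.Duration) *checkpointer {
	c := &checkpointer{states: make(map[string]int64)}
	go func() {
		for range time.Tick(interval) {
			c.mu.Lock()
			if !c.dirty {
				c.mu.Unlock()
				continue
			}
			data, _ := json.Marshal(c.states)
			c.dirty = false
			c.mu.Unlock()
			// One serialize+write per interval instead of one per file/event.
			_ = os.WriteFile(path, data, 0o600)
		}
	}()
	return c
}

// update is cheap: it never touches the disk.
func (c *checkpointer) update(file string, offset int64) {
	c.mu.Lock()
	c.states[file] = offset
	c.dirty = true
	c.mu.Unlock()
}

func main() {
	c := newCheckpointer("data.json", 60*time.Second)
	c.update("/some/path/example.log", 12345) // hypothetical file
	time.Sleep(2 * time.Second)
}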

Mekk avatar Feb 04 '20 22:02 Mekk

I'm running into the same issue on 7.4.0 and 7.5.2. I have about 500 thousand files, each about 1 MB. Filebeat can take days to start outputting events to Logstash after a restart. My registry file is about 20 MB.

Some things I've noticed:

  • If I delete my registry file, Filebeat will start sending events immediately
  • The amount of time it takes for Filebeat to start sending events seems to grow as the size of my registry file increases. For example, when my registry file was only 200 KB, the restart took only a few minutes.
  • When looking at usage statistics, the number of open files by the harvester stays at 0 for days (I see it increase to the number of files I'm ingesting once filebeat is outputting events):
"harvester": {
    "open_files": 0,
    "running": 0
}

CPU usage by filebeat is also suspiciously high. I'm not sure what the bottleneck would be; potentially it's the CPU on my machine:

  • Elasticsearch/logstash has tons of headroom (even when filebeat is sending events before the restart is issued)
  • Not hitting RAM limits on the machine (8 GB is still free)
  • Filebeat is using 100% of the CPU on one of the cores of the machine
  • Disk IO is minimal (about 11% usage)

Things I've Tried:

  • Purging my filebeat registry file (filebeat starts sending events quickly afterwards but then I have to re-send everything)
  • Tuning my filebeat inputs. This should leave only about 100 files active at once:
- type: log
  harvester_limit: 512

  enabled: true
  scan_frequency: 120
  close_inactive: 1m
  clean_inactive: 750h
  clean_removed: true

serant avatar Feb 07 '20 21:02 serant

One more thing to add: my log path has two wildcards in it, mainly because I'm parsing a folder of n subfolders of n text files. So:

paths:
  - /path_to_subfolders/*/*.txt

serant avatar Feb 07 '20 21:02 serant

Unfortunately, the registry file has its limitations in Filebeat. It is just a plain text file with JSON data in it, so every time the state needs to be updated (e.g. events were ACKed or dropped), the complete registry is flushed to disk. The flush consists of the following steps:

  • creating a new temporary file
  • serializing millions of entries to JSON
  • fsyncing the file
  • closing the FD
  • moving the file
  • fsyncing the folder where the registry is located

This is a slow process and it can be subject to backpressure.
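
For illustration only, here is a minimal Go sketch of that write-temp/fsync/rename/fsync-dir sequence (assumed names, not the actual Filebeat source). Note that the whole state map is re-serialized on every flush, which is why the per-ACK cost grows with the number of tracked files.

// Minimal sketch of the flush sequence listed above (not Filebeat's code).
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
)

func flushRegistry(dir string, states map[string]int64) error {
	// 1. create a new temporary file
	tmp, err := os.CreateTemp(dir, "data.json.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless if the rename below succeeded

	// 2. serialize every entry to JSON (O(number of tracked files) each time)
	if err := json.NewEncoder(tmp).Encode(states); err != nil {
		tmp.Close()
		return err
	}
	// 3. fsync the file
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	// 4. close the FD
	if err := tmp.Close(); err != nil {
		return err
	}
	// 5. move the file into place
	if err := os.Rename(tmp.Name(), filepath.Join(dir, "data.json")); err != nil {
		return err
	}
	// 6. fsync the folder where the registry is located
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()
	return d.Sync()
}

func main() {
	_ = flushRegistry(".", map[string]int64{"/var/log/example.log": 42})
}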

There are a few workarounds:

  • stop Filebeat and clean the registry manually or with an external script (then restart Filebeat)
  • decrease the intervals configured in the clean_* settings so that Filebeat removes entries from the registry sooner
  • set registry.flush to a higher interval, so Filebeat flushes the state info less frequently (https://www.elastic.co/guide/en/beats/filebeat/current/configuration-general-options.html#_registry_flush)

Also, next time please post your question on Discuss. GitHub issues are reserved for verified bugs.

kvch avatar Feb 11 '20 09:02 kvch

Well, you have a fairly simple test case: start filebeat, create 50,000 files in the log directory, and call filebeat restart. After the restart, filebeat will spend hours at 100% CPU before it actually starts doing anything.

I'd say the current registry design is buggy, at least in cases where many log files are possible. It's likely that switching from a single JSON file to something smarter that wouldn't require rewriting the whole registry after every operation (SQLite for example, or even state batched into many files with, say, 100 entries per file) would resolve the issue.

I am going to look at the flush settings.

Mekk avatar Feb 13 '20 13:02 Mekk

There is an ongoing effort to rewrite the registry. Hopefully, when it is merged, your situation will improve: https://github.com/elastic/beats/pull/12908

kvch avatar Feb 13 '20 18:02 kvch

It preliminarily seems that

filebeat.registry.flush: 60s

mitigates the problem, although I have yet to see the behaviour in the „worst” cases.

It seems to me that the current default of 0 is - considering the current, costly implementation - wrong, and it would be more sensible to default to something else.

Or, at the very least, I'd suggest that /etc/filebeat/filebeat.yml from the packaged filebeats contain this setting (even if commented out), with some remark about its big performance impact.

PS #12908 makes me a bit afraid for now („All in all there is a performance degragation due to this change. This becomes most noticable in increased CPU usage.”). I understand this is a temporary step, but maybe during this temporary state the suggestions above make even more sense?

Mekk avatar Feb 14 '20 17:02 Mekk

I'm still seeing the same issue after making the tweaks suggested above. I've opened a topic on Discuss to continue the discussion: https://discuss.elastic.co/t/filebeat-extremely-slow-startup-for-large-number-of-files/240746

Thanks

serant avatar Jul 10 '20 18:07 serant

I have the same issue, and it isn't fixed until I remove the registry, which is undesirable because it leads to lots of duplicates. A small portion of the log file follows:

2021-01-31T05:03:36.641+0330	INFO	instance/beat.go:455	filebeat start running.
2021-01-31T05:03:36.641+0330	INFO	[monitoring]	log/log.go:118	Starting metrics logging every 30s
2021-01-31T05:03:36.641+0330	DEBUG	[service]	service/service_windows.go:93	Windows is interactive: true
2021-01-31T05:03:36.641+0330	DEBUG	[test]	registrar/migrate.go:304	isFile(C:\ProgramData\Elastic\Beats\filebeat\data\registry) -> false
2021-01-31T05:03:36.641+0330	DEBUG	[test]	registrar/migrate.go:304	isFile() -> false
2021-01-31T05:03:36.641+0330	DEBUG	[test]	registrar/migrate.go:297	isDir(C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat) -> true
2021-01-31T05:03:36.641+0330	DEBUG	[test]	registrar/migrate.go:304	isFile(C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat\meta.json) -> true
2021-01-31T05:03:36.642+0330	DEBUG	[registrar]	registrar/migrate.go:84	Registry type '1' found
2021-01-31T05:03:37.251+0330	INFO	memlog/store.go:119	Loading data file of 'C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat' succeeded. Active transaction id=862418635
2021-01-31T05:03:38.325+0330	INFO	memlog/store.go:124	Finished loading transaction log file for 'C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat'. Active transaction id=862438999
2021-01-31T05:03:38.326+0330	WARN	beater/filebeat.go:381	Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2021-01-31T05:03:38.643+0330	INFO	[registrar]	registrar/registrar.go:109	States Loaded from registrar: 16798
2021-01-31T05:03:38.644+0330	INFO	[crawler]	beater/crawler.go:71	Loading Inputs: 10
2021-01-31T05:03:38.644+0330	DEBUG	[registrar]	registrar/registrar.go:140	Starting Registrar
2021-01-31T05:03:38.655+0330	WARN	[cfgwarn]	log/config.go:184	EXPERIMENTAL: scan_sort is used.
2021-01-31T05:03:38.655+0330	DEBUG	[input]	log/config.go:207	recursive glob enabled
2021-01-31T05:03:38.656+0330	WARN	[cfgwarn]	log/config.go:184	EXPERIMENTAL: scan_sort is used.
2021-01-31T05:03:38.656+0330	DEBUG	[input]	log/input.go:169	exclude_files: []. Number of states: 16798
2021-01-31T05:03:38.656+0330	DEBUG	[input]	file/states.go:68	New state added for C:\Logs\FileService\2021-01-26\13__.txt
2021-01-31T05:03:38.656+0330	DEBUG	[acker]	beater/acker.go:59	stateful ack	{"count": 1}
2021-01-31T05:03:38.656+0330	DEBUG	[registrar]	registrar/registrar.go:264	Processing 1 events
2021-01-31T05:03:38.657+0330	DEBUG	[registrar]	registrar/registrar.go:231	Registrar state updates processed. Count: 1
2021-01-31T05:03:38.656+0330	DEBUG	[publisher]	pipeline/client.go:231	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::4849664-152171-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-26\13__.txt Offset:29948 Timestamp:2021-01-31 04:57:20.5605686 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:4849664-152171-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:38.657+0330	DEBUG	[input]	file/states.go:68	New state added for C:\Logs\FileService\2021-01-27\18__.txt
2021-01-31T05:03:38.657+0330	DEBUG	[acker]	beater/acker.go:59	stateful ack	{"count": 1}
2021-01-31T05:03:38.657+0330	DEBUG	[publisher]	pipeline/client.go:231	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::5111808-156889-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-27\18__.txt Offset:18932 Timestamp:2021-01-31 04:57:18.0947681 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:5111808-156889-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:38.657+0330	DEBUG	[registrar]	registrar/registrar.go:254	Registrar states cleaned up. Before: 16798, After: 16798, Pending: 1
2021-01-31T05:03:38.658+0330	DEBUG	[input]	file/states.go:68	New state added for C:\Logs\FileService\2021-01-23\15__.txt
2021-01-31T05:03:38.658+0330	DEBUG	[acker]	beater/acker.go:59	stateful ack	{"count": 1}
2021-01-31T05:03:38.666+0330	DEBUG	[registrar]	registrar/registrar.go:201	Registry file updated. 16798 active states.
2021-01-31T05:03:39.981+0330	DEBUG	[registrar]	registrar/registrar.go:264	Processing 1 events
2021-01-31T05:03:39.981+0330	DEBUG	[registrar]	registrar/registrar.go:231	Registrar state updates processed. Count: 1
2021-01-31T05:03:39.981+0330	DEBUG	[publisher]	pipeline/client.go:231	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::13172736-514469-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-23\15__.txt Offset:18832 Timestamp:2021-01-31 04:57:07.2302427 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:13172736-514469-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:39.981+0330	DEBUG	[registrar]	registrar/registrar.go:254	Registrar states cleaned up. Before: 16798, After: 16798, Pending: 2
2021-01-31T05:03:39.983+0330	DEBUG	[registrar]	registrar/registrar.go:201	Registry file updated. 16798 active states.
2021-01-31T05:03:39.983+0330	DEBUG	[input]	file/states.go:68	New state added for C:\Logs\FileService\2021-01-24\13__.txt
2021-01-31T05:03:39.983+0330	DEBUG	[acker]	beater/acker.go:59	stateful ack	{"count": 1}
2021-01-31T05:03:40.852+0330	DEBUG	[registrar]	registrar/registrar.go:264	Processing 1 events
2021-01-31T05:03:40.852+0330	DEBUG	[registrar]	registrar/registrar.go:231	Registrar state updates processed. Count: 1
2021-01-31T05:03:40.853+0330	DEBUG	[registrar]	registrar/registrar.go:254	Registrar states cleaned up. Before: 16798, After: 16798, Pending: 3
2021-01-31T05:03:40.858+0330	DEBUG	[publisher]	pipeline/client.go:231	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::1966080-516576-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-24\13__.txt Offset:18732 Timestamp:2021-01-31 04:57:04.8268518 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:1966080-516576-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:40.860+0330	DEBUG	[input]	file/states.go:68	New state added for C:\Logs\FileService\2021-01-22\20__.txt
2021-01-31T05:03:40.860+0330	DEBUG	[acker]	beater/acker.go:59	stateful ack	{"count": 1}
2021-01-31T05:03:40.861+0330	DEBUG	[registrar]	registrar/registrar.go:201	Registry file updated. 16798 active states.
2021-01-31T05:03:42.192+0330	DEBUG	[registrar]	registrar/registrar.go:264	Processing 1 events
2021-01-31T05:03:42.192+0330	DEBUG	[registrar]	registrar/registrar.go:231	Registrar state updates processed. Count: 1
2021-01-31T05:03:42.193+0330	DEBUG	[registrar]	registrar/registrar.go:254	Registrar states cleaned up. Before: 16798, After: 16798, Pending: 4
2021-01-31T05:03:42.194+0330	DEBUG	[registrar]	registrar/registrar.go:201	Registry file updated. 16798 active states.
2021-01-31T05:03:42.424+0330	DEBUG	[publisher]	pipeline/client.go:231	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::18350080-512223-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-22\20__.txt Offset:21908 Timestamp:2021-01-31 04:56:49.6360456 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:18350080-512223-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:42.430+0330	DEBUG	[input]	file/states.go:68	New state added for C:\Logs\FileService\2021-01-26\15__.txt
2021-01-31T05:03:42.430+0330	DEBUG	[acker]	beater/acker.go:59	stateful ack	{"count": 1}
2021-01-31T05:03:43.634+0330	DEBUG	[registrar]	registrar/registrar.go:264	Processing 1 events
2021-01-31T05:03:43.634+0330	DEBUG	[registrar]	registrar/registrar.go:231	Registrar state updates processed. Count: 1
2021-01-31T05:03:43.634+0330	DEBUG	[registrar]	registrar/registrar.go:254	Registrar states cleaned up. Before: 16798, After: 16798, Pending: 5
2021-01-31T05:03:43.650+0330	DEBUG	[registrar]	registrar/registrar.go:201	Registry file updated. 16798 active states.
2021-01-31T05:03:43.691+0330	DEBUG	[publisher]	pipeline/client.go:231	Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::25886720-152248-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-26\15__.txt Offset:16864 Timestamp:2021-01-31 04:56:54.2161996 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:25886720-152248-1524026271 IdentifierName:native} TimeSeries:false}

SudoNova avatar Jan 31 '21 02:01 SudoNova

Hi,

I have filebeat 7.12.1 on Windows Server 2016, which is trying to process 411336 items matching "D:\Logs\ProjectA-METRICS***.json", with *.json file sizes varying from 7.5 KB to 280 MB, and writing to an AWS OpenSearch index.

2022-07-18T09:05:33.324+0530 INFO instance/beat.go:660 Home path: [C:\Program Files\Elastic\Beats\7.12.1\filebeat] Config path: [C:\ProgramData\Elastic\Beats\filebeat] Data path: [C:\ProgramData\Elastic\Beats\filebeat\data] Logs path: [C:\ProgramData\Elastic\Beats\filebeat\logs]
2022-07-18T09:05:33.338+0530 DEBUG [beat] instance/beat.go:712 Beat metadata path: C:\ProgramData\Elastic\Beats\filebeat\data\meta.json
2022-07-18T09:05:33.339+0530 INFO instance/beat.go:668 Beat ID: 5a3fd1ad-71a0-428d-9341-9b93aa45eb4e
2022-07-18T09:05:33.339+0530 DEBUG [seccomp] seccomp/seccomp.go:96 Syscall filtering is only supported on Linux
2022-07-18T09:05:33.339+0530 INFO [beat] instance/beat.go:996 Beat info {"system_info": {"beat": {"path": {"config": "C:\ProgramData\Elastic\Beats\filebeat", "data": "C:\ProgramData\Elastic\Beats\filebeat\data", "home": "C:\Program Files\Elastic\Beats\7.12.1\filebeat", "logs": "C:\ProgramData\Elastic\Beats\filebeat\logs"}, "type": "filebeat", "uuid": "5a3fd1ad-71a0-428d-9341-9b93aa45eb4e"}}}
2022-07-18T09:05:33.340+0530 INFO [beat] instance/beat.go:1005 Build info {"system_info": {"build": {"commit": "651a2ad1225f3d4420a22eba847de385b71f711d", "libbeat": "7.12.1", "time": "2021-04-20T19:58:27.000Z", "version": "7.12.1"}}}
2022-07-18T09:05:33.340+0530 INFO [beat] instance/beat.go:1008 Go runtime info {"system_info": {"go": {"os":"windows","arch":"amd64","max_procs":10,"version":"go1.15.9"}}}
2022-07-18T09:05:33.342+0530 INFO [beat] instance/beat.go:1012 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2022-07-06T00:38:55.48+05:30","name":"INTAWPGTUI051","ip":["10.168.148.50/24","::1/128","127.0.0.1/8"],"kernel_version":"10.0.14393.4467 (rs1_release.210604-1844)","mac":["00:50:56:87:ad:d8"],"os":{"type":"windows","family":"windows","platform":"windows","name":"Windows Server 2016 Standard","version":"10.0","major":10,"minor":0,"patch":0,"build":"14393.4467"},"timezone":"IST","timezone_offset_sec":19800,"id":"a4f380e9-055f-4b4c-bdc9-b1824c2dcabf"}}}
2022-07-18T09:05:33.343+0530 INFO [beat] instance/beat.go:1041 Process info {"system_info": {"process": {"cwd": "C:\Windows\system32", "exe": "C:\Program Files\Elastic\Beats\7.12.1\filebeat\filebeat.exe", "name": "filebeat.exe", "pid": 20412, "ppid": 724, "start_time": "2022-07-18T09:05:33.257+0530"}}}
2022-07-18T09:05:33.343+0530 INFO instance/beat.go:304 Setup Beat: filebeat; Version: 7.12.1
2022-07-18T09:05:33.343+0530 DEBUG [beat] instance/beat.go:330 Initializing output plugins
2022-07-18T09:05:33.343+0530 INFO eslegclient/connection.go:99 elasticsearch url: https://elkprod132-.ap-south-1.es.amazonaws.com:443
2022-07-18T09:05:33.343+0530 DEBUG [publisher] pipeline/consumer.go:148 start pipeline event consumer

I am using the filebeat input section below. I have tried different values and keep getting the panic error again and again.

filebeat.inputs:
  - type: filestream
    harvester_limit: 512
    enabled: true
    close_inactive: 1m
    clean_inactive: 4h
    clean_removed: true
    paths:
      - D:\Logs\XXX_CONTROLLER_XXX_METRICS***.json
    scan_frequency: 120
    json.keys_under_root: true
    json.add_error_key: true
    processors:
      - decode_json_fields:
          fields:
            - message
            - correlationId
            - Exception
          process_array: false
          max_depth: 10
          target: ''
          overwrite_keys: true

=== Error ===

panic: reflect.Select: too many cases (max 65536)

goroutine 130 [running]:
reflect.Select(0xc36750a000, 0x10001, 0x1036d, 0x1, 0x1e373c0, 0xc02d19b0e0, 0x16, 0x9801)
	/usr/local/go/src/reflect/value.go:2181 +0x9a9
github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*Pipeline).runSignalPropagation(0xc000736140)
	/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/pipeline.go:365 +0x1d4
created by github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*Pipeline).registerSignalPropagation.func1
	/go/src/github.com/elastic/beats/libbeat/publisher/pipeline/pipeline.go:350 +0x85

adapasuresh avatar Jul 18 '22 04:07 adapasuresh

Reopening, as there still seem to be issues in Filebeat 8.3: https://discuss.elastic.co/t/fillebeat-keeps-restarting-in-kuberneates/309702

jsoriano avatar Aug 22 '22 15:08 jsoriano

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Aug 22 '22 15:08 elasticmachine

I experienced a very similar issue with filebeat v8.8.2 making little to no progress reading and shipping log files. Using lsof -o showed that the file descriptor offsets for the ~100 open files were not updating for at least 20 of the files, despite those files being actively written (i.e. not closed due to inactivity). The worst case was a ~250 MB web server container HTTP access log of which filebeat had read only ~4 MB, and it hadn't made any progress in about an hour.

Per @kvch's comment, I added filebeat.registry.flush: 60s to filebeat.yml, and this appears to have resolved the issue for my systems.

For reference, the filebeat.yml in use is very lean:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      add_resource_metadata:
        namespace:
          enabled: false
        deployment: false
        cronjob: false
      templates:
        - condition:
            and:
              - regexp:
                  kubernetes.namespace: "^some-prefix.*"
              - not:
                  equals:
                    kubernetes.labels.some-label: "some-value"
          config:
            - type: container
              paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
              tail_files: true
#filebeat.registry.flush: 60s # adding this fixed throughput
output.elasticsearch:
  index: default
  hosts: ["http://localhost:9200"]
  allow_older_versions: true
setup:
  ilm.enabled: false
  template.enabled: false

jstangroome avatar Jul 19 '23 01:07 jstangroome

https://github.com/elastic/beats/issues/16076#issuecomment-1186743957

I'm also experiencing this in 8.13.1 with only ~17k files created hourly.

clairmont32 avatar May 23 '24 19:05 clairmont32

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

botelastic[bot] avatar May 23 '25 20:05 botelastic[bot]