filebeat (practically) hangs after restart on machine with a lot of logs
Sorry for the somewhat vague problem description. I was working on the problem in panic mode and had no time to preserve details like exact log snippets.
I use filebeat to scrape logs from machines where I have PLENTY of log files. Numbers vary, but a few thousand new files every day is normal and something like 50 thousand happens occasionally (in general: a lot of small apps running, each app opens a new log on each restart and after each midnight, old logs are moved out by name, log names contain the date…). After some initial tuning (harvester limit adapted to system capacity, time limits tuned to abandon inactive files at a reasonable rate, etc. – roughly the kind of settings sketched below), filebeat handles those setups without problems.
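For context, the kind of per-input tuning I mean looks roughly like this (the path and all values here are only illustrative, not my exact settings):

- type: log
  enabled: true
  paths:
    - /some/path/*.log          # illustrative path
  harvester_limit: 1000         # illustrative: cap concurrent harvesters to what the host can handle
  close_inactive: 5m            # illustrative: close files that stop growing
  ignore_older: 24h             # illustrative: don't pick up old rotated-out files
  clean_inactive: 36h           # illustrative: drop registry state for long-inactive files (must exceed ignore_older)
  clean_removed: true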
Well, handled until I reconfigured. Today I split some inputs into a few sections; more or less it was a change from
- type: log
  enabled: true
  paths:
    - /some/path/*.log
    - /other/path/*.log
    - …
  # and other settings
into
- type: log
  enabled: true
  paths:
    - /some/path/*.log
  # and other settings
- type: log
  enabled: true
  paths:
    - /other/path/*.log
  # and other settings
- type: log
  enabled: true
  paths:
    - …remaining …
  # and other settings
where the „other settings” were the same everywhere, except that I added some custom fields in each section (this was the reason for the change) and reduced the harvester limits so that they summed up to the original value.
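For illustration, one of the new sections looked roughly like this (the field name and the harvester_limit value are made up for this example, not my real ones):

- type: log
  enabled: true
  paths:
    - /some/path/*.log
  harvester_limit: 500          # reduced, so the per-section limits sum to the old total
  fields:
    app_group: some_app         # hypothetical custom field – the reason for the split
  # remaining settings identical across sections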
After the restart, filebeat practically stopped working. It either didn't forward anything or handled only one section (depending on the machine; I saw the problem on a few). After I enabled debug logging, it only logged myriads of notes about reading and writing data.json with some statistics („XYZV before, XYZV after”) and later started to add notes about finding or adding the state of individual files (~1 such note per second). On the machine I fixed last, it remained in that state for about 2 hours without noticeable change.
I recovered from that by stopping filebeat, erasing /var/lib/filebeat/registry, and starting filebeat again – then it simply started to work and seems to work properly (of course it republished all the old logs according to my age settings and introduced lag caused by processing the old data).
I suppose there is something wrong in the way state is migrated/interpreted/checked after the input sections change. This is somewhat unexpected; I didn't think that changing the allocation of files to sections would impact state handling.
Tested on filebeat 7.5.1
PS My /var/lib/filebeat/registry/data.json is 3-8 MB, if that is of any interest.
I took a look at the sources, trying to refresh my memory about the log entries (the actual logs unfortunately rotated after I reset). IIRC I saw, in huge numbers, snippets like
… sth about reading registry …
Registrar states cleaned up. Before: 3176, After: 3716, Pending: 0
Writing registry file: …
(and again and again) and later slowly it started to emit also
New state added for /some/specific/log/name/different/every/time
Ooops. This may be a 7.5.1 problem, not a configuration-change problem. I just restarted one of those filebeats without changing anything, and it looks like I have the problem back.
Well, we are after hours, so I may try verifying the debug logs more calmly.
OK, so this is a typical snippet I see repeating ad infinitum:
2020-02-04T20:51:42.595+0100 DEBUG [input] file/states.go:68 New state added for /home/ids_backend_api/APP/log/adb_processor_adm_20200202_021116_18336_pqt-ids-deb8.log
2020-02-04T20:51:42.596+0100 DEBUG [registrar] registrar/registrar.go:346 Registrar states cleaned up. Before: 32448, After: 32448, Pending: 0
2020-02-04T20:51:42.598+0100 DEBUG [registrar] registrar/registrar.go:411 Write registry file: /var/lib/filebeat/registry/filebeat/data.json (32448)
2020-02-04T20:51:42.678+0100 DEBUG [registrar] registrar/registrar.go:404 Registry file updated. 32448 states written.
2020-02-04T20:51:42.678+0100 DEBUG [registrar] registrar/registrar.go:356 Processing 1 events
2020-02-04T20:51:42.678+0100 DEBUG [registrar] registrar/registrar.go:326 Registrar state updates processed. Count: 1
2020-02-04T20:51:42.679+0100 DEBUG [acker] beater/acker.go:64 stateful ack {"count": 1}
2020-02-04T20:51:42.679+0100 DEBUG [publisher] pipeline/client.go:220 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:1855605-65026 Finished:true Fileinfo:<nil> Source:/home/ids_backend_api/APP/log/adb_processor_adm_20200202_021116_18336_pqt-ids-deb8.log Offset:160240 Timestamp:2020-02-04 20:47:25.96242759 +0100 CET TTL:-1ns Type:log Meta:map[] FileStateOS:1855605-65026} TimeSeries:false}
2020-02-04T20:51:42.679+0100 DEBUG [input] file/states.go:68 New state added for /home/ids_backend_api/APP/log/adb_processor_adm_20200203_003539_21527_pqt-ids-deb8.log
2020-02-04T20:51:42.679+0100 DEBUG [registrar] registrar/registrar.go:346 Registrar states cleaned up. Before: 32448, After: 32448, Pending: 0
2020-02-04T20:51:42.681+0100 DEBUG [registrar] registrar/registrar.go:411 Write registry file: /var/lib/filebeat/registry/filebeat/data.json (32448)
2020-02-04T20:51:42.763+0100 DEBUG [registrar] registrar/registrar.go:404 Registry file updated. 32448 states written.
2020-02-04T20:51:42.763+0100 DEBUG [registrar] registrar/registrar.go:356 Processing 1 events
2020-02-04T20:51:42.763+0100 DEBUG [registrar] registrar/registrar.go:326 Registrar state updates processed. Count: 1
2020-02-04T20:51:42.764+0100 DEBUG [registrar] registrar/registrar.go:346 Registrar states cleaned up. Before: 32448, After: 32448, Pending: 0
2020-02-04T20:51:42.767+0100 DEBUG [acker] beater/acker.go:64 stateful ack {"count": 1}
2020-02-04T20:51:42.772+0100 DEBUG [registrar] registrar/registrar.go:411 Write registry file: /var/lib/filebeat/registry/filebeat/data.json (32448)
2020-02-04T20:51:42.772+0100 DEBUG [publisher] pipeline/client.go:220 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:1894543-65026 Finished:true Fileinfo:<nil> Source:/home/ids_backend_api/APP/log/adb_processor_adm_20200203_003539_21527_pqt-ids-deb8.log Offset:161959 Timestamp:2020-02-04 20:47:26.0981662 +0100 CET TTL:-1ns Type:log Meta:map[] FileStateOS:1894543-65026} TimeSeries:false}
The timing here (~0.1 s per such section) is on the optimistic side; later I see it slowing down a bit. Still, even if it kept a 10 files/s rate, with my 70,000 files on this machine I don't expect the restart to finish any time soon.
Nothing seems to be going upstream.
I tried downgrading to 7.4.1 and 7.1.1, and it looks like I still have the problem. Maybe I am just unlucky in having particularly many files today (some app crashed a lot and generated many of them), but IIRC I have happened to restart some 7.* versions without issues (I surely did some upgrades). But that was with the previous configuration.
Going to try bisecting the latter a bit.
OK, I am not sure whether this is the only factor, but it looks like I have particularly many files today (due to crashes of some app), and this is the main reason for the problems – not my config change (which triggered the problem only because I restarted filebeat to apply it) and not the filebeat version.
I accept that it takes time to process them, but it is problematic that filebeat doesn't start handling other files, or even other inputs, in such a case (no harvesters start, at least not for a very noticeable time).
The per-file cost also seems problematic: 0.1-0.2 s for a single file, with files processed linearly, is a lot. This doesn't seem directly related to filesystem cost; the directory uses ext4 dir_index. As filebeat also has high CPU usage in this phase, and as strace shows it repeatedly opening and writing data.json, it seems to me that the slowdown may be caused by a repeated read/parse/update/serialize/write cycle on data.json. With multiple inputs there can also be some contention for locks on data.json or something similar. So my wild bet is that a huge improvement could be obtained by checkpointing data.json from time to time instead of after every single file, which seems to be what happens now.
I'm running into the same issue on 7.4.0 and 7.5.2. I have about 500 thousand files, each about 1 MB. Filebeat can take days to start outputting events to Logstash after a restart. My registry file is about 20 MB.
Some things I've noticed:
- If I delete my registry file, Filebeat will start sending events immediately
- The amount of time it takes for Filebeat to start sending events seems to grow as the size of my registry file increases. For example, when my registry file was only 200 KB, the restart would take a few minutes.
- When looking at usage statistics, the number of open files by the harvester seems to stay at 0 for days (I see this increase to the number of files I'm ingesting once Filebeat is outputting events):
"harvester": {
"open_files": 0,
"running": 0
}
CPU usage by Filebeat is also suspiciously high. I'm not sure what the bottleneck would be; potentially it's the CPU on my machine:
- Elasticsearch/logstash has tons of headroom (even when filebeat is sending events before the restart is issued)
- Not hitting RAM limits on the machine (8 GB is still free)
- Filebeat is using 100% of the CPU on one of the cores of the machine
- Disk IO is minimal (about 11% usage)
Things I've Tried:
- Purging my filebeat registry file (filebeat starts sending events quickly afterwards but then I have to re-send everything)
- Tuning my filebeat inputs. This should only leave about 100 files being active at once.
- type: log
  harvester_limit: 512
  enabled: true
  scan_frequency: 120
  close_inactive: 1m
  clean_inactive: 750h
  clean_removed: true
One more thing to add: my log path has two wildcards in it, mainly because I'm parsing a folder of n subfolders of n text files. So:
paths:
  - /path_to_subfolders/*/*.txt
Unfortunately, the registry file has its limitations in Filebeat. It is just a plain text file with JSON data in it, so every time the state needs to be updated (e.g. events were ACKed or dropped), the complete registry is flushed to disk. The flush consists of the following steps:
- creating a new temporary file
- serializing millions of entries to JSON
- fsyncing the file
- closing the FD
- moving the file
- fsyncing the folder where the registry is located
This is a slow process and it can be subject to backpressure.
There are a few workarounds:
- stop Filebeat and clean the registry manually or by an external script (then restart Filebeat)
- decrease the intervals configured in the clean_* settings to make Filebeat remove entries from the registry
- set registry.flush to a higher interval, so Filebeat flushes the state info less frequently (https://www.elastic.co/guide/en/beats/filebeat/current/configuration-general-options.html#_registry_flush); see the sketch below
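A minimal filebeat.yml sketch of the last two workarounds (the interval values below are examples only, not recommendations):

filebeat.registry.flush: 60s      # flush registry state at most once per minute instead of on every update

filebeat.inputs:
  - type: log
    paths:
      - /some/path/*.log          # example path
    ignore_older: 48h             # example: skip files not updated for 2 days
    clean_inactive: 72h           # example: remove registry entries for files inactive longer than this (must exceed ignore_older)
    clean_removed: true           # remove registry entries for files deleted from disk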
Also, next time please post your question on Discuss. Github issues are reserved for verified bugs.
Well. You have a fairly simple test case: start filebeat, create 50,000 files in the log directory, then restart filebeat. After the restart, filebeat will spend hours at 100% CPU before it actually starts doing anything.
I'd say the current registry design is buggy, at least in cases where it's possible to have many log files. It's likely that switching from a single JSON file to something smarter that doesn't require rewriting the whole registry after every operation (SQLite for example, or even state batched into many files with, say, 100 entries per file) would resolve the issue.
I am going to look at flush settings.
There is an ongoing effort to rewrite the registry. Hopefully, when it is merged, your situation will improve: https://github.com/elastic/beats/pull/12908
It preliminarily seems that filebeat.registry.flush: 60s mitigates the problem, although I have yet to see the behaviour in the „worst” cases.
It seems to me that the current default of 0 is – considering the current costly implementation – wrong, and it would be more sensible to default to something else.
Or, at the very least, I'd suggest that /etc/filebeat/filebeat.yml from the packaged filebeat contain this setting (even if commented out) with some remark about its big performance impact.
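Something along these lines, for example (my wording, just to illustrate the suggestion):

# Interval between registry flushes. The default of 0s rewrites the whole
# registry after every state update, which gets very expensive on hosts
# tracking tens of thousands of files.
#filebeat.registry.flush: 60s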
PS #12908 makes me worried for now („All in all there is a performance degragation due to this change. This becomes most noticable in increased CPU usage.”). I understand this is a temporary step, but maybe during this temporary state the suggestions above make even more sense?
I'm still seeing the same issue after making the tweaks suggested above. I've opened a topic on Discuss to continue the discussion: https://discuss.elastic.co/t/filebeat-extremely-slow-startup-for-large-number-of-files/240746
Thanks
I have the same issue, and it doesn't get fixed until I remove the registry, which is undesirable because it leads to lots of duplications. A small portion of the log file is as follows:
2021-01-31T05:03:36.641+0330 INFO instance/beat.go:455 filebeat start running.
2021-01-31T05:03:36.641+0330 INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2021-01-31T05:03:36.641+0330 DEBUG [service] service/service_windows.go:93 Windows is interactive: true
2021-01-31T05:03:36.641+0330 DEBUG [test] registrar/migrate.go:304 isFile(C:\ProgramData\Elastic\Beats\filebeat\data\registry) -> false
2021-01-31T05:03:36.641+0330 DEBUG [test] registrar/migrate.go:304 isFile() -> false
2021-01-31T05:03:36.641+0330 DEBUG [test] registrar/migrate.go:297 isDir(C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat) -> true
2021-01-31T05:03:36.641+0330 DEBUG [test] registrar/migrate.go:304 isFile(C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat\meta.json) -> true
2021-01-31T05:03:36.642+0330 DEBUG [registrar] registrar/migrate.go:84 Registry type '1' found
2021-01-31T05:03:37.251+0330 INFO memlog/store.go:119 Loading data file of 'C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat' succeeded. Active transaction id=862418635
2021-01-31T05:03:38.325+0330 INFO memlog/store.go:124 Finished loading transaction log file for 'C:\ProgramData\Elastic\Beats\filebeat\data\registry\filebeat'. Active transaction id=862438999
2021-01-31T05:03:38.326+0330 WARN beater/filebeat.go:381 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2021-01-31T05:03:38.643+0330 INFO [registrar] registrar/registrar.go:109 States Loaded from registrar: 16798
2021-01-31T05:03:38.644+0330 INFO [crawler] beater/crawler.go:71 Loading Inputs: 10
2021-01-31T05:03:38.644+0330 DEBUG [registrar] registrar/registrar.go:140 Starting Registrar
2021-01-31T05:03:38.655+0330 WARN [cfgwarn] log/config.go:184 EXPERIMENTAL: scan_sort is used.
2021-01-31T05:03:38.655+0330 DEBUG [input] log/config.go:207 recursive glob enabled
2021-01-31T05:03:38.656+0330 WARN [cfgwarn] log/config.go:184 EXPERIMENTAL: scan_sort is used.
2021-01-31T05:03:38.656+0330 DEBUG [input] log/input.go:169 exclude_files: []. Number of states: 16798
2021-01-31T05:03:38.656+0330 DEBUG [input] file/states.go:68 New state added for C:\Logs\FileService\2021-01-26\13__.txt
2021-01-31T05:03:38.656+0330 DEBUG [acker] beater/acker.go:59 stateful ack {"count": 1}
2021-01-31T05:03:38.656+0330 DEBUG [registrar] registrar/registrar.go:264 Processing 1 events
2021-01-31T05:03:38.657+0330 DEBUG [registrar] registrar/registrar.go:231 Registrar state updates processed. Count: 1
2021-01-31T05:03:38.656+0330 DEBUG [publisher] pipeline/client.go:231 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::4849664-152171-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-26\13__.txt Offset:29948 Timestamp:2021-01-31 04:57:20.5605686 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:4849664-152171-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:38.657+0330 DEBUG [input] file/states.go:68 New state added for C:\Logs\FileService\2021-01-27\18__.txt
2021-01-31T05:03:38.657+0330 DEBUG [acker] beater/acker.go:59 stateful ack {"count": 1}
2021-01-31T05:03:38.657+0330 DEBUG [publisher] pipeline/client.go:231 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::5111808-156889-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-27\18__.txt Offset:18932 Timestamp:2021-01-31 04:57:18.0947681 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:5111808-156889-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:38.657+0330 DEBUG [registrar] registrar/registrar.go:254 Registrar states cleaned up. Before: 16798, After: 16798, Pending: 1
2021-01-31T05:03:38.658+0330 DEBUG [input] file/states.go:68 New state added for C:\Logs\FileService\2021-01-23\15__.txt
2021-01-31T05:03:38.658+0330 DEBUG [acker] beater/acker.go:59 stateful ack {"count": 1}
2021-01-31T05:03:38.666+0330 DEBUG [registrar] registrar/registrar.go:201 Registry file updated. 16798 active states.
2021-01-31T05:03:39.981+0330 DEBUG [registrar] registrar/registrar.go:264 Processing 1 events
2021-01-31T05:03:39.981+0330 DEBUG [registrar] registrar/registrar.go:231 Registrar state updates processed. Count: 1
2021-01-31T05:03:39.981+0330 DEBUG [publisher] pipeline/client.go:231 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::13172736-514469-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-23\15__.txt Offset:18832 Timestamp:2021-01-31 04:57:07.2302427 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:13172736-514469-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:39.981+0330 DEBUG [registrar] registrar/registrar.go:254 Registrar states cleaned up. Before: 16798, After: 16798, Pending: 2
2021-01-31T05:03:39.983+0330 DEBUG [registrar] registrar/registrar.go:201 Registry file updated. 16798 active states.
2021-01-31T05:03:39.983+0330 DEBUG [input] file/states.go:68 New state added for C:\Logs\FileService\2021-01-24\13__.txt
2021-01-31T05:03:39.983+0330 DEBUG [acker] beater/acker.go:59 stateful ack {"count": 1}
2021-01-31T05:03:40.852+0330 DEBUG [registrar] registrar/registrar.go:264 Processing 1 events
2021-01-31T05:03:40.852+0330 DEBUG [registrar] registrar/registrar.go:231 Registrar state updates processed. Count: 1
2021-01-31T05:03:40.853+0330 DEBUG [registrar] registrar/registrar.go:254 Registrar states cleaned up. Before: 16798, After: 16798, Pending: 3
2021-01-31T05:03:40.858+0330 DEBUG [publisher] pipeline/client.go:231 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::1966080-516576-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-24\13__.txt Offset:18732 Timestamp:2021-01-31 04:57:04.8268518 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:1966080-516576-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:40.860+0330 DEBUG [input] file/states.go:68 New state added for C:\Logs\FileService\2021-01-22\20__.txt
2021-01-31T05:03:40.860+0330 DEBUG [acker] beater/acker.go:59 stateful ack {"count": 1}
2021-01-31T05:03:40.861+0330 DEBUG [registrar] registrar/registrar.go:201 Registry file updated. 16798 active states.
2021-01-31T05:03:42.192+0330 DEBUG [registrar] registrar/registrar.go:264 Processing 1 events
2021-01-31T05:03:42.192+0330 DEBUG [registrar] registrar/registrar.go:231 Registrar state updates processed. Count: 1
2021-01-31T05:03:42.193+0330 DEBUG [registrar] registrar/registrar.go:254 Registrar states cleaned up. Before: 16798, After: 16798, Pending: 4
2021-01-31T05:03:42.194+0330 DEBUG [registrar] registrar/registrar.go:201 Registry file updated. 16798 active states.
2021-01-31T05:03:42.424+0330 DEBUG [publisher] pipeline/client.go:231 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::18350080-512223-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-22\20__.txt Offset:21908 Timestamp:2021-01-31 04:56:49.6360456 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:18350080-512223-1524026271 IdentifierName:native} TimeSeries:false}
2021-01-31T05:03:42.430+0330 DEBUG [input] file/states.go:68 New state added for C:\Logs\FileService\2021-01-26\15__.txt
2021-01-31T05:03:42.430+0330 DEBUG [acker] beater/acker.go:59 stateful ack {"count": 1}
2021-01-31T05:03:43.634+0330 DEBUG [registrar] registrar/registrar.go:264 Processing 1 events
2021-01-31T05:03:43.634+0330 DEBUG [registrar] registrar/registrar.go:231 Registrar state updates processed. Count: 1
2021-01-31T05:03:43.634+0330 DEBUG [registrar] registrar/registrar.go:254 Registrar states cleaned up. Before: 16798, After: 16798, Pending: 5
2021-01-31T05:03:43.650+0330 DEBUG [registrar] registrar/registrar.go:201 Registry file updated. 16798 active states.
2021-01-31T05:03:43.691+0330 DEBUG [publisher] pipeline/client.go:231 Pipeline client receives callback 'onFilteredOut' for event: {Timestamp:0001-01-01 00:00:00 +0000 UTC Meta:null Fields:null Private:{Id:native::25886720-152248-1524026271 PrevId: Finished:true Fileinfo:<nil> Source:C:\Logs\FileService\2021-01-26\15__.txt Offset:16864 Timestamp:2021-01-31 04:56:54.2161996 +0330 +0330 TTL:241h0m0s Type:log Meta:map[] FileStateOS:25886720-152248-1524026271 IdentifierName:native} TimeSeries:false}
Hi,
I have filebeat 7.12.1 on Windows Server 2016, which is trying to process 411,336 items matching "D:\Logs\ProjectA-METRICS***.json", with varied *.json file sizes (max 280 MB, min 7.5 KB), and writing to an AWS OpenSearch index.
2022-07-18T09:05:33.324+0530 INFO instance/beat.go:660 Home path: [C:\Program Files\Elastic\Beats\7.12.1\filebeat] Config path: [C:\ProgramData\Elastic\Beats\filebeat] Data path: [C:\ProgramData\Elastic\Beats\filebeat\data] Logs path: [C:\ProgramData\Elastic\Beats\filebeat\logs]
2022-07-18T09:05:33.338+0530 DEBUG [beat] instance/beat.go:712 Beat metadata path: C:\ProgramData\Elastic\Beats\filebeat\data\meta.json
2022-07-18T09:05:33.339+0530 INFO instance/beat.go:668 Beat ID: 5a3fd1ad-71a0-428d-9341-9b93aa45eb4e
2022-07-18T09:05:33.339+0530 DEBUG [seccomp] seccomp/seccomp.go:96 Syscall filtering is only supported on Linux
2022-07-18T09:05:33.339+0530 INFO [beat] instance/beat.go:996 Beat info {"system_info": {"beat": {"path": {"config": "C:\ProgramData\Elastic\Beats\filebeat", "data": "C:\ProgramData\Elastic\Beats\filebeat\data", "home": "C:\Program Files\Elastic\Beats\7.12.1\filebeat", "logs": "C:\ProgramData\Elastic\Beats\filebeat\logs"}, "type": "filebeat", "uuid": "5a3fd1ad-71a0-428d-9341-9b93aa45eb4e"}}}
2022-07-18T09:05:33.340+0530 INFO [beat] instance/beat.go:1005 Build info {"system_info": {"build": {"commit": "651a2ad1225f3d4420a22eba847de385b71f711d", "libbeat": "7.12.1", "time": "2021-04-20T19:58:27.000Z", "version": "7.12.1"}}}
2022-07-18T09:05:33.340+0530 INFO [beat] instance/beat.go:1008 Go runtime info {"system_info": {"go": {"os":"windows","arch":"amd64","max_procs":10,"version":"go1.15.9"}}}
2022-07-18T09:05:33.342+0530 INFO [beat] instance/beat.go:1012 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2022-07-06T00:38:55.48+05:30","name":"INTAWPGTUI051","ip":["10.168.148.50/24","::1/128","127.0.0.1/8"],"kernel_version":"10.0.14393.4467 (rs1_release.210604-1844)","mac":["00:50:56:87:ad:d8"],"os":{"type":"windows","family":"windows","platform":"windows","name":"Windows Server 2016 Standard","version":"10.0","major":10,"minor":0,"patch":0,"build":"14393.4467"},"timezone":"IST","timezone_offset_sec":19800,"id":"a4f380e9-055f-4b4c-bdc9-b1824c2dcabf"}}}
2022-07-18T09:05:33.343+0530 INFO [beat] instance/beat.go:1041 Process info {"system_info": {"process": {"cwd": "C:\Windows\system32", "exe": "C:\Program Files\Elastic\Beats\7.12.1\filebeat\filebeat.exe", "name": "filebeat.exe", "pid": 20412, "ppid": 724, "start_time": "2022-07-18T09:05:33.257+0530"}}}
2022-07-18T09:05:33.343+0530 INFO instance/beat.go:304 Setup Beat: filebeat; Version: 7.12.1
2022-07-18T09:05:33.343+0530 DEBUG [beat] instance/beat.go:330 Initializing output plugins
2022-07-18T09:05:33.343+0530 INFO eslegclient/connection.go:99 elasticsearch url: https://elkprod132-.ap-south-1.es.amazonaws.com:443
2022-07-18T09:05:33.343+0530 DEBUG [publisher] pipeline/consumer.go:148 start pipeline event consumer
I am using the filebeat input section below; I have tried different values and keep getting the panic error again and again.
filebeat.inputs:
  - type: filestream
    harvester_limit: 512
    enabled: true
    close_inactive: 1m
    clean_inactive: 4h
    clean_removed: true
    paths:
      - D:\Logs\XXX_CONTROLLER_XXX_METRICS***.json
    scan_frequency: 120
    json.keys_under_root: true
    json.add_error_key: true
    processors:
      - decode_json_fields:
          fields:
            - message
            - correlationId
            - Exception
          process_array: false
          max_depth: 10
          target: ''
          overwrite_keys: true
=== Error ====
panic: reflect.Select: too many cases (max 65536)

goroutine 130 [running]:
reflect.Select(0xc36750a000, 0x10001, 0x1036d, 0x1, 0x1e373c0, 0xc02d19b0e0, 0x16, 0x9801)
        /usr/local/go/src/reflect/value.go:2181 +0x9a9
github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*Pipeline).runSignalPropagation(0xc000736140)
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/pipeline.go:365 +0x1d4
created by github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*Pipeline).registerSignalPropagation.func1
        /go/src/github.com/elastic/beats/libbeat/publisher/pipeline/pipeline.go:350 +0x85
Reopening, as there still seem to be issues in Filebeat 8.3: https://discuss.elastic.co/t/fillebeat-keeps-restarting-in-kuberneates/309702
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
I experienced a very similar issue with filebeat v8.8.2 making little to no progress reading and shipping log files. Using lsof -o showed that the file descriptor offsets for the ~100 open files were not updating for at least 20 of the files, despite those files being actively written (i.e. not closed due to inactivity). The worst case was a ~250 MB web server container HTTP access log file, of which filebeat had read only ~4 MB and hadn't made any progress in about an hour.
Per @kvch's comment I added filebeat.registry.flush: 60s to the filebeat.yml and this appears to have resolved the issue for my systems.
For reference, the filebeat.yml in use is very lean:
filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      add_resource_metadata:
        namespace:
          enabled: false
        deployment: false
        cronjob: false
      templates:
        - condition:
            and:
              - regexp:
                  kubernetes.namespace: "^some-prefix.*"
              - not:
                  equals:
                    kubernetes.labels.some-label: "some-value"
          config:
            - type: container
              paths:
                - /var/log/containers/*${data.kubernetes.container.id}.log
              tail_files: true

#filebeat.registry.flush: 60s # adding this fixed throughput

output.elasticsearch:
  index: default
  hosts: ["http://localhost:9200"]
  allow_older_versions: true

setup:
  ilm.enabled: false
  template.enabled: false

#https://github.com/elastic/beats/issues/16076#issuecomment-1186743957
I'm also experiencing this in 8.13.1 with only ~17k files created hourly.
Hi! We just realized that we haven't looked into this issue in a while. We're sorry!
We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!