
Fleet stopped working after some time. Seeing tons of "too many open files" errors

Open daweizhang123 opened this issue 3 years ago • 6 comments

Fleet version: 4.12.0

Operating system: Debian GNU/Linux 10

Web browser: N/A


🧑‍💻  Expected behavior

Fleet starts and keeps running.

💥  Actual behavior

Fleet is not able to start.

More info

Sep 12 01:03:35 n121-011-134 fleet[32592]: {"component":"http","err":"error writing result logs: writing log: timestamp: 2022-09-12T01:00:12Z: can't open new logfile: open /var/log/fleet/result.log: too many open files","ip_addr":"10.121.27.20","level":"error","method":"POST","took":"4.526744385s","ts":"2022-09-12T01:00:12.16973567Z","uri":"/api/v1/osquery/log","x_for_ip_addr":"10.121.27.20"}

Seeing tons of errors like this, but I'm unsure whether this is the root cause.

daweizhang123 avatar Sep 12 '22 01:09 daweizhang123

@daweizhang123 https://fleetdm.com/docs/deploying/faq#what-do-i-do-about-too-many-open-files-errors
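For context, "too many open files" means the process has hit its open-file-descriptor limit. Below is a minimal sketch, assuming a Linux host with /proc mounted, for comparing how many descriptors a process currently holds against its soft limit. The function names are hypothetical helpers for illustration and are not part of Fleet or fleetctl.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A limit reported as 'unlimited' is returned as None.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units
            fields = line[len(prefix):].split()

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(fields[0]), to_int(fields[1])
    raise ValueError("no 'Max open files' row found")


def count_open_fds(pid):
    """Count entries in /proc/<pid>/fd (needs the same user or root)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage(pid):
    """Return (open descriptor count, soft limit) for the given PID."""
    with open(f"/proc/{pid}/limits") as f:
        soft, _hard = parse_max_open_files(f.read())
    return count_open_fds(pid), soft
```

Calling `fd_usage(pid)` with the PID of the Fleet process (the number in brackets in the journal lines, for example) shows how close the process is to its limit. A count that keeps climbing toward the soft limit points at descriptor exhaustion rather than a one-off spike.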

smaddock avatar Sep 12 '22 16:09 smaddock

@smaddock Thanks for the advice. I have tried that and now it's able to start.

daweizhang123 avatar Sep 12 '22 17:09 daweizhang123

Hm, sorry, Fleet went down again after the server had been running for a few minutes.

The server just crashed, and I didn't see any errors in the logs.

My guess is that one very resource-intensive scheduled query was running repeatedly. Even after I deleted that scheduled query manually, Fleet was still trying to execute it. Is there a recommended way to clear these cached queries?

daweizhang123 avatar Sep 12 '22 17:09 daweizhang123

Seeing errors in the log like this. Is this the root cause?

Sep 12 18:28:44 n121-008-225 fleet[73464]: {"component":"http","err":"authentication error: find host: timestamp: 2022-09-12T18:23:45Z: context canceled","level":"info","path":"/api/v1/osquery/log","ts":"2022-09-12T18:23:45.819699397Z"}

daweizhang123 avatar Sep 12 '22 18:09 daweizhang123

root@n121-008-225:~# systemctl status fleet
● fleet.service - Fleet
   Loaded: loaded (/etc/systemd/system/fleet.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2022-09-12 18:46:11 UTC; 4min 57s ago
 Main PID: 26381 (fleet)
    Tasks: 377 (limit: 39321)
   Memory: 799.1G
   CGroup: /system.slice/fleet.service

Very strange: Fleet is consuming about 800G of memory. There are no scheduled queries or live queries running in Fleet.
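One way to cross-check that figure: the Memory line in systemctl status comes from cgroup accounting, which, as far as I understand, can include page cache in addition to the process's resident memory. Below is a minimal sketch, assuming Linux with /proc mounted, that reads the resident set size (VmRSS) of a single process for comparison. The helper names are hypothetical.

```python
def parse_vm_rss_kib(status_text):
    """Return VmRSS in KiB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:    123456 kB"
            return int(line.split()[1])
    return None


def rss_gib(pid):
    """Return the resident set size of the given PID in GiB, or None."""
    with open(f"/proc/{pid}/status") as f:
        kib = parse_vm_rss_kib(f.read())
    return None if kib is None else kib / (1024 * 1024)
```

Running `rss_gib(26381)` against the Main PID shown above would give the resident memory of the Fleet process itself. If that number is far below what systemd reports, the difference may be cache attributed to the service's cgroup rather than memory held by Fleet.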

daweizhang123 avatar Sep 12 '22 18:09 daweizhang123

You are running quite an old version of Fleet. Can you please upgrade to 4.19.1 or 4.20.0 and let us know whether the issue persists?

zwass avatar Sep 13 '22 03:09 zwass

@daweizhang123 Were you able to update to the latest version of Fleet, and if so, are you still encountering the issue you originally described?

xpkoala avatar Oct 20 '22 07:10 xpkoala

Closing this for now as we don't have the information we need to continue investigating. @daweizhang123 please comment and reopen if you upgrade and continue to see issues.

zwass avatar Nov 21 '22 20:11 zwass