fleet running in abnormal state

Open eastbook opened this issue 3 years ago • 1 comments

Fleet version: 4.20.0

[image]

Fleet is running in a bad state. As the screenshot above shows, the traffic on our load balancer for Fleet is pretty high.

Fleet server CPU usage:

[image]

Fleet is also consuming a lot of memory:

systemctl status fleet.service
● fleet.service - Fleet
     Loaded: loaded (/etc/systemd/system/fleet.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-09-21 17:42:23 UTC; 5h 20min ago
   Main PID: 3090473 (fleet)
      Tasks: 19 (limit: 4915)
     Memory: 3.9G
        CPU: 11h 21min 13.035s
     CGroup: /system.slice/fleet.service
             └─3090473 /usr/bin/fleet serve --mysql_address=127.0.0.1:3306 --mysql_database=fleet --mysql_username=root --mysql_password=admin --redis_address=127.0.0.1:6379 --redis_password=fleetpass --fil

err in log of fleet

Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/config","ts":"2022-09-21T23:03:08.480797827Z"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/config","ts":"2022-09-21T23:03:08.481011716Z"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.73.56","level":"error","method":"POST","took":"28.666015098s","ts":"2022-09-21T23:03:08.481223072Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.73.56"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: 2022/09/21 23:03:08 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.42.98","level":"error","method":"POST","took":"29.773316374s","ts":"2022-09-21T23:03:08.482628392Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.42.98"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/distributed/read","ts":"2022-09-21T23:03:08.485681384Z"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: 2022/09/21 23:03:08 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.31.61","level":"error","method":"POST","took":"24.81762093s","ts":"2022-09-21T23:03:08.489245127Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.31.61"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: 2022/09/21 23:03:08 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.29.42","level":"error","method":"POST","took":"28.550266401s","ts":"2022-09-21T23:03:08.496053427Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.29.42"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"retrieve policy queries: selecting policies for host: context canceled","ip_addr":"10.121.35.121","level":"error","method":"POST","took":"15.98712566s","ts":"2022-09-21T23:03:08.498432891Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"10.121.35.121"}
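For anyone triaging a similar log, it can help to see which error dominates before digging in. The following is only a rough sketch, assuming the journal output has been saved one entry per line in the format shown above; `tally_errors` is a hypothetical helper written for this illustration, not part of Fleet.

```python
import json
from collections import Counter


def tally_errors(lines):
    """Count error messages in journald-style Fleet log lines.

    Assumes each entry is either a JSON object preceded by a syslog prefix,
    or a plain-text message such as Go's "http: Accept error: ..." line.
    """
    counts = Counter()
    for line in lines:
        start = line.find("{")
        if start == -1:
            # Plain-text entries carry no JSON; only track the fd exhaustion case.
            if "too many open files" in line:
                counts["too many open files"] += 1
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or malformed entry, skip it
        err = entry.get("err")
        if err:
            # Errors are chained with " || "; keep the last segment as the cause.
            counts[err.split(" || ")[-1]] += 1
    return counts
```

Calling `tally_errors(open("fleet.log"))` (the file name is an assumption) and printing `most_common()` on the result gives a quick ranking of the "context canceled" variants against the "too many open files" accept errors.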

But we only have 20k hosts. Please advise.

eastbook avatar Sep 21 '22 23:09 eastbook

I got a suggestion from the Fleet Slack channel that this error is caused by the vulnerabilities database not being set up, but the behavior is still the same after I set it up.

eastbook avatar Sep 21 '22 23:09 eastbook

Hi @eastbook I'm sorry for the delay in getting you a response.

@michalnicp @roperzh Have either of you encountered any such issue when running load test instances? "Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms" stands out to me, but I don't have a good means of attempting to reproduce this issue.

xpkoala avatar Oct 07 '22 17:10 xpkoala

@eastbook for your "too many open files" errors, please see the FAQ: https://fleetdm.com/docs/deploying/faq#what-do-i-do-about-too-many-open-files-errors
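To confirm that the process really is hitting its file descriptor limit, one option on Linux is to compare the number of open descriptors with the limit reported under `/proc`. This is a minimal sketch, not taken from the FAQ; `fd_usage` is a hypothetical helper, and reading another user's process normally requires root.

```python
import os


def fd_usage(pid):
    """Return (soft_limit, hard_limit, open_fds) for a Linux process.

    Limits are ints, or None when the kernel reports "unlimited".
    """
    soft = hard = None
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, units.
                fields = line.split()
                soft = None if fields[3] == "unlimited" else int(fields[3])
                hard = None if fields[4] == "unlimited" else int(fields[4])
                break
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return soft, hard, open_fds
```

For the server in this report, `fd_usage(3090473)` would use the Main PID from the `systemctl status` output. If `open_fds` sits at or near the soft limit, raising the limit is the usual next step; for a systemd-managed service that is commonly done with the `LimitNOFILE=` directive in the unit file, which is general systemd behavior rather than something stated in this thread.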

For the "context canceled" errors, this seems likely related to your DB being overloaded. Can you get CPU utilization metrics from your DB?

zwass avatar Nov 21 '22 20:11 zwass

@eastbook I'm going to close this ticket for now. If you are still encountering issues please feel free to re-open this issue with any new information you can provide.

xpkoala avatar Dec 16 '22 18:12 xpkoala