fleet running in abnormal state
Fleet version: 4.20.0

Fleet is running in a bad state. As the screenshot above shows, traffic on our load balancer for Fleet is quite high, and so is CPU usage on the Fleet server.

Fleet is also consuming a lot of memory:
systemctl status fleet.service
● fleet.service - Fleet
     Loaded: loaded (/etc/systemd/system/fleet.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-09-21 17:42:23 UTC; 5h 20min ago
   Main PID: 3090473 (fleet)
      Tasks: 19 (limit: 4915)
     Memory: 3.9G
        CPU: 11h 21min 13.035s
     CGroup: /system.slice/fleet.service
             └─3090473 /usr/bin/fleet serve --mysql_address=127.0.0.1:3306 --mysql_database=fleet --mysql_username=root --mysql_password=admin --redis_address=127.0.0.1:6379 --redis_password=fleetpass --fil
Errors in the Fleet log:
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/config","ts":"2022-09-21T23:03:08.480797827Z"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/config","ts":"2022-09-21T23:03:08.481011716Z"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.73.56","level":"error","method":"POST","took":"28.666015098s","ts":"2022-09-21T23:03:08.481223072Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.73.56"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: 2022/09/21 23:03:08 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.42.98","level":"error","method":"POST","took":"29.773316374s","ts":"2022-09-21T23:03:08.482628392Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.42.98"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"authentication error: find host: context canceled","level":"info","path":"/api/v1/osquery/distributed/read","ts":"2022-09-21T23:03:08.485681384Z"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: 2022/09/21 23:03:08 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.31.61","level":"error","method":"POST","took":"24.81762093s","ts":"2022-09-21T23:03:08.489245127Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.31.61"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: 2022/09/21 23:03:08 http: Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || error in query ingestion || getting app config: selecting app config: context canceled","ingestion-err":"ingest detail query: selecting app config: context canceled","ip_addr":"10.121.29.42","level":"error","method":"POST","took":"28.550266401s","ts":"2022-09-21T23:03:08.496053427Z","uri":"/api/v1/osquery/distributed/write","x_for_ip_addr":"10.121.29.42"}
Sep 21 23:03:08 n107-019-021 fleet[3090473]: {"component":"http","err":"retrieve policy queries: selecting policies for host: context canceled","ip_addr":"10.121.35.121","level":"error","method":"POST","took":"15.98712566s","ts":"2022-09-21T23:03:08.498432891Z","uri":"/api/v1/osquery/distributed/read","x_for_ip_addr":"10.121.35.121"}
We only have 20k hosts, though. Please advise.
I got a suggestion in the Fleet Slack channel that this error is caused by the vulnerabilities database not being set up, but the behavior is the same after I set it up.
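For triage, it can help to see which errors dominate a log excerpt like the one above before guessing at a cause. The following is a minimal sketch, not part of Fleet: it assumes the journald line shape shown in this issue ("<date> <host> fleet[<pid>]: <payload>"), and the function name `tally_fleet_errors` is hypothetical.

```python
import json
from collections import Counter


def tally_fleet_errors(lines):
    """Count log entries by error message (hypothetical helper, not a Fleet API).

    Assumes each line looks like "<date> <host> fleet[<pid>]: <payload>",
    where <payload> is either a JSON object or a plain Go log line.
    """
    counts = Counter()
    for line in lines:
        # Everything after the first "]: " is the payload written by fleet.
        _, sep, payload = line.partition("]: ")
        if not sep:
            continue  # not a fleet journal line
        payload = payload.strip()
        if payload.startswith("{"):
            try:
                entry = json.loads(payload)
            except json.JSONDecodeError:
                continue  # truncated or wrapped entry
            err = entry.get("err")
            if err:
                # Repeated "error in query ingestion || ..." prefixes are
                # collapsed by keeping only the final segment.
                counts[err.split(" || ")[-1]] += 1
        elif payload:
            # Plain lines start with a "YYYY/MM/DD HH:MM:SS " timestamp.
            counts[payload.split(" ", 2)[-1]] += 1
    return counts
```

Given the log lines saved to a file, something like `tally_fleet_errors(open("fleet.log")).most_common(5)` would list the most frequent messages; the file name here is only an example.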
Hi @eastbook I'm sorry for the delay in getting you a response.
@michalnicp @roperzh Have either of you encountered any such issue when running load test instances?
The line "Accept error: accept tcp [::]:8080: accept4: too many open files; retrying in 5ms" stands out to me, but I don't have a good means of attempting to reproduce this issue.
@eastbook for your "too many open files" errors, please see the FAQ: https://fleetdm.com/docs/deploying/faq#what-do-i-do-about-too-many-open-files-errors
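Before changing any limits, it may be worth confirming how close the running process actually is to its file descriptor limit. The sketch below is Linux-specific and only an illustration: it assumes the standard `/proc/<pid>/limits` and `/proc/<pid>/fd` layout, the helper names are hypothetical, and reading another user's process usually requires root.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the "Max open files" row of /proc/<pid>/limits.

    A value of "unlimited" is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files  <soft>  <hard>  files"
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_max_open_files(f.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft, hard
```

For the process in the `systemctl status` output above, `fd_usage(3090473)` would report the current count next to the limits. If the count sits at the soft limit, that is consistent with the "too many open files" accept errors.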
As for the "context canceled" errors, they are likely related to your DB being overloaded. Can you get CPU utilization metrics from your DB?
@eastbook I'm going to close this ticket for now. If you are still encountering issues please feel free to re-open this issue with any new information you can provide.