[fleetd] Errors writing log files on CentOS

Open ksatter opened this issue 1 year ago • 11 comments

fleetd version: v1.21.0

Operating system: CentOS 7.0, kernel version 3.10.0-1160.108.1.el7.x86_64


💥  Actual behavior

After installing fleetd, osquery logfiles are not updating. In the fleetd logs, the customer sees the following errors:

Feb 12 11:30:21  orbit[7462]: Could not create log file: Permission denied
Feb 12 11:30:21 orbit[7462]: COULD NOT CREATE LOGFILE '20240212-113021.7610'!
Feb 12 11:30:21 orbit[7462]: Could not create log file: Permission denied
Feb 12 11:30:21 orbit[7462]: COULD NOT CREATE LOGFILE '20240212-113021.7610'!

This error is also present when starting Orbit:

Feb 12 11:05:38  systemd[1]: [/usr/lib/systemd/system/orbit.service:5] Unknown lvalue 'StartLimitIntervalSec' in section 'Unit'

These are the permissions shown for the logs:

root@<redacted> ~ # ll /opt/orbit/osquery_log/
total 20
lrwxrwxrwx 1 root root   34 Feb 12 11:32 osqueryd.INFO -> osqueryd.INFO.20240212-113223.8090
-rw-r--r-- 1 root root  977 Feb 12 11:05 osqueryd.INFO.20240212-110545.7045
-rw------- 1 root root  834 Feb 12 11:06 osqueryd.INFO.20240212-110601.7329
-rw-r--r-- 1 root root 1120 Feb 12 11:32 osqueryd.INFO.20240212-110608.7610
-rw-r--r-- 1 root root  701 Feb 12 11:32 osqueryd.INFO.20240212-113212.7812
-rw-r--r-- 1 root root  834 Feb 12 11:32 osqueryd.INFO.20240212-113223.8090
-rw-r----- 1 root root    0 Feb 12 11:05 osqueryd.results.log
root@ ~ # ll /var/log/orbit
total 0
root@  ~ # ll /var/log/osquery
total 0

🧑‍💻  Steps to reproduce

  1. TODO
  2. TODO

🕯️ More info (optional)

Additional information and logs are available, please reach out for more information if needed.

ksatter avatar Feb 12 '24 21:02 ksatter

@ksatter I found this ticket with no assigned team. Does it need more reviews or should we assign?

sharon-fdm avatar Feb 13 '24 14:02 sharon-fdm

@sharon-fdm

I haven’t managed to reproduce this one yet due to issues with VMware.

ksatter avatar Feb 13 '24 14:02 ksatter

I was able to confirm the behavior on CentOS 7.9.2009.

Steps to Reproduce:

  • On a CentOS 7.9.2009 machine
  • Install orbit
  • Open /var/log/messages in an editor of your choosing
  • Search for 'orbit' and find log entries similar to those in the original report
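
For anyone who prefers the command line over an editor, the search step above can be sketched as a small shell helper. The function name find_orbit_log_errors is hypothetical (not part of fleetd or orbit), and the pattern is based only on the two message variants quoted in this issue.

```shell
#!/bin/sh
# Hypothetical helper: list orbit log-creation errors from a syslog file.
# Usage: find_orbit_log_errors [FILE]   (FILE defaults to /var/log/messages)
find_orbit_log_errors() {
    log_file="${1:-/var/log/messages}"
    # Case-insensitive match covering both variants seen in this issue:
    #   "Could not create log file: Permission denied"
    #   "COULD NOT CREATE LOGFILE '...'!"
    grep -iE 'orbit\[[0-9]+\]: could not create log ?file' "$log_file"
}
```

Run it as root on the CentOS host, for example: find_orbit_log_errors /var/log/messages. No output means no matching entries were found.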

xpkoala avatar Feb 16 '24 16:02 xpkoala

2 pts to investigate

sharon-fdm avatar Feb 20 '24 19:02 sharon-fdm

Hi folks!

Some updates from my test on CentOS 7.9.2009 and some questions:

After installing fleetd, osquery logfiles are not updating.

@ksatter Double-checking ⬆️: in my test it logs the COULD NOT CREATE ... errors, but the status logs continue to get updated. The warning also happens only once, during install or process restart.

Feb 12 11:05:38 systemd[1]: [/usr/lib/systemd/system/orbit.service:5] Unknown lvalue 'StartLimitIntervalSec' in section 'Unit'

This is probably ok because the systemd version 219 in CentOS is old and doesn't support the setting in the Unit section (Ubuntu 22.04 uses version 249).
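
To confirm which systemd version a host runs, something like the sketch below may help. Both function names are hypothetical, and the 230 threshold is an assumption based on my reading of the systemd release notes (where StartLimitIntervalSec= appears to have moved into the Unit section); verify it against the systemd NEWS file before relying on it.

```shell
#!/bin/sh
# Print the systemd major version, e.g. "219" on CentOS 7.
# The first line of `systemctl --version` looks like "systemd 219".
systemd_version() {
    systemctl --version | awk 'NR == 1 { print $2 }'
}

# Succeed if the given version is expected to accept StartLimitIntervalSec=
# in the [Unit] section. ASSUMPTION: 230 is the first version that does.
supports_start_limit_interval_sec() {
    [ "$1" -ge 230 ]
}
```

Usage: supports_start_limit_interval_sec "$(systemd_version)" || echo "expect the 'Unknown lvalue' warning". The warning is non-fatal, so the unit still loads on older versions.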


PS: I've also reproduced the issue with orbit 1.16.0 and osquery 5.9.1 so it's not a recent bug and has probably been around for some time.

lucasmrod avatar Feb 26 '24 21:02 lucasmrod

@lucasmrod I can try to test this again, but if I recall correctly, logs would continue to update as long as we didn't clear the folder in between installs.

If starting with a blank slate, no log file was created.

ksatter avatar Feb 26 '24 21:02 ksatter

I can try to test this again, but if I recall correctly, logs would continue to update as long as we didn't clear the folder in between installs. If starting with a blank slate, no log file was created.

Once you've double-checked, let me know whether you were able to reproduce it. In my tests I can see the COULD NOT ... errors, but the INFO status logs continue to be populated (you can refetch the host to generate more INFO log entries).

PS: I've created the following osquery issue to fix the error logs: https://github.com/osquery/osquery/issues/8286.

lucasmrod avatar Mar 01 '24 14:03 lucasmrod

@ksatter ⬆️

lucasmrod avatar Mar 01 '24 14:03 lucasmrod

The fix for this log issue (https://github.com/osquery/osquery/issues/8286) will be released in osquery 5.12.0.

lucasmrod avatar Mar 04 '24 12:03 lucasmrod

@xpkoala @sabrinabuckets

To test/QA this you will have to manually download osquery 5.12.0 from https://github.com/osquery/osquery/releases/tag/5.12.0 and push it to your local TUF repository. Steps below:

  1. Run ./tools/tuf/test/main.sh as usual (documented in https://github.com/fleetdm/fleet/blob/main/tools/tuf/test/README.md)
  2. Download osquery 5.12.0 for linux from https://github.com/osquery/osquery/releases/tag/5.12.0
  3. Push the downloaded osqueryd to the local TUF repository: ./tools/tuf/test/push_target.sh linux osqueryd ./osqueryd-downloaded-from-github 5.12.0
  4. Generate an rpm package again (no Fleet Desktop because it's not supported on CentOS 7): ./build/fleetctl package --type=rpm --fleet-url=... --enroll-secret=... --update-roots=$(./build/fleetctl updates roots --path ./test_tuf) --disable-open-folder --update-interval=10s --update-url=http://host.docker.internal:8081 --enable-scripts
  5. Install rpm package on the CentOS host.
  6. Expected result is that there are no COULD NOT... errors in /var/log/messages.
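
The check in the last step can be scripted as a simple pass/fail, sketched below. The function name check_no_logfile_errors is hypothetical, and the match string is taken from the error lines quoted earlier in this issue.

```shell
#!/bin/sh
# Hypothetical QA helper: returns 0 if FILE contains no
# "COULD NOT CREATE LOGFILE" lines, 1 otherwise.
# Usage: check_no_logfile_errors [FILE]   (FILE defaults to /var/log/messages)
check_no_logfile_errors() {
    log_file="${1:-/var/log/messages}"
    if grep -q 'COULD NOT CREATE LOGFILE' "$log_file"; then
        echo "FAIL: log file creation errors present in $log_file"
        return 1
    fi
    echo "PASS: no log file creation errors in $log_file"
}
```

Note that /var/log/messages keeps entries from earlier installs, so run this on a fresh host or only inspect lines written after the new package was installed.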

lucasmrod avatar Mar 04 '24 18:03 lucasmrod

I couldn't think of a haiku this time. (See fleetdm.com logs for more information.)

fleet-release avatar Mar 12 '24 23:03 fleet-release