osqueryd runs out of memory when a refetch is triggered from "My Device" page

Open roperzh opened this issue 2 years ago • 13 comments

Fleet version: Fleet Desktop v0.0.3

Operating system: macOS 12.4 (21F79), MacBook Pro (13-inch, M1, 2020)


🧑‍💻  Expected behavior

After triggering a re-fetch from the web UI, Fleet Desktop shouldn't restart.

💥  Actual behavior

As seen in the GIF below, as soon as the refetch completes, the system tray icon disappears (and reappears after a few seconds).

[GIF: 2022-07-08 11 30 58]

This seems to be happening because osqueryd runs out of memory, as /var/log/orbit/orbit.stderr.log contains:

W0708 11:46:23.758304 1800450048 watcher.cpp:397] osqueryd worker (33720) stopping: Memory limits exceeded: 262242304
2022-07-08T11:46:25-03:00 ERR unexpected exit error="deregistering extension: Connection not open"
2022-07-08T11:46:26-03:00 INF
2022-07-08T11:46:26-03:00 INF
2022-07-08T11:46:26-03:00 INF
2022-07-08T11:46:27-03:00 INF opening path="/opt/orbit/bin/desktop/macos/edge/Fleet Desktop.app"
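
For anyone reading the watchdog line above, here is a minimal sketch that pulls the worker PID and the reported number out of it and converts that number to MiB. It assumes the number after "Memory limits exceeded:" is a byte count, which the log line itself does not state. The regular expression and the helper name `parse_watchdog_line` are illustrative only and are not part of osquery or Orbit.

```python
import re

# Pattern based only on the message format seen in the log line above.
WATCHDOG_RE = re.compile(
    r"osqueryd worker \((?P<pid>\d+)\) stopping: "
    r"Memory limits exceeded: (?P<bytes>\d+)"
)


def parse_watchdog_line(line):
    """Return (pid, reported_bytes, reported_mib), or None if no match.

    Assumption: the reported number is in bytes.
    """
    match = WATCHDOG_RE.search(line)
    if match is None:
        return None
    reported = int(match.group("bytes"))
    return int(match.group("pid")), reported, reported / (1024 * 1024)


line = (
    "W0708 11:46:23.758304 1800450048 watcher.cpp:397] "
    "osqueryd worker (33720) stopping: Memory limits exceeded: 262242304"
)
print(parse_watchdog_line(line))  # (33720, 262242304, 250.09375)
```

Under that assumption, the worker was stopped at roughly 250 MiB.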

roperzh avatar Jul 08 '22 13:07 roperzh

Wasn't able to reproduce on my end.

Please check orbit logs: /var/log/orbit/orbit.stderr.log.

lucasmrod avatar Jul 08 '22 14:07 lucasmrod

sorry, I created the issue in a rush. I re-fetched again and pasted the relevant portion of the logs in the issue description. Seems like osqueryd runs out of memory (we might have solved this recently)

roperzh avatar Jul 08 '22 14:07 roperzh

W0708 11:46:23.758304 1800450048 watcher.cpp:397] osqueryd worker (33720) stopping: Memory limits exceeded: 262242304

Curious as to which query might be causing that... (Hopefully https://github.com/osquery/osquery/issues/7658 will help :)

lucasmrod avatar Jul 08 '22 15:07 lucasmrod

@roperzh Do you have a reliable way to reproduce this issue? I have not been able to reproduce it locally. Is this something that needs more investigation before fixing?

xpkoala avatar Aug 12 '22 19:08 xpkoala

@xpkoala I just tried again a couple of times and I can consistently reproduce it on my machine by going to https://dogfood.fleetdm.com/device/<token_redacted> and clicking on "Refetch"

image

After a while the tray icon just disappears for a bit (while Fleet Desktop restarts). Here are the details of my machine, but you can query more from Dogfood if interested.

image

roperzh avatar Aug 12 '22 20:08 roperzh

@roperzh the osquery team plans to upgrade osquery to 5.5.1 next week. It will have more logging, which should give us more information. We think the cause is one of the queries being sent. Can you try again once 5.5.1 is on your machine?

Sharvil: Hunch is it's uuid on system_info

zhumo avatar Aug 18 '22 16:08 zhumo

5.5.1 has https://github.com/osquery/osquery/pull/7675, so Fleet logs should provide us more info about which query is the problematic one.

lucasmrod avatar Aug 18 '22 16:08 lucasmrod

@roperzh @lucasmrod can this be retested now that 5.5.1 is out?

zhumo avatar Sep 02 '22 20:09 zhumo

@roperzh Could you retest with fleetctl package --type=pkg [...] --osqueryd-channel=edge? (osqueryd in edge is 5.5.1)

lucasmrod avatar Sep 02 '22 20:09 lucasmrod

will do, right now I'm installing/uninstalling local builds of orbit & friends to test token rotation. I'll add this to my TODO and revisit once my env is stable again.

roperzh avatar Sep 02 '22 20:09 roperzh

W0708 11:46:23.758304 1800450048 watcher.cpp:397] osqueryd worker (33720) stopping: Memory limits exceeded: 262242304

My guess is fleet_detail_query_software_macos, but let's wait for osquery 5.5.1 to tell us which is the culprit query :)

lucasmrod avatar Sep 02 '22 20:09 lucasmrod

@roperzh can you please retest when you have a chance? Thank you!

zwass avatar Sep 20 '22 18:09 zwass

so sorry for the delay folks, so: I ran the cleanup script, generated a new installer, and double checked I have the right version:

~ $ /opt/orbit/bin/osqueryd/macos-app/edge/osquery.app/Contents/MacOS/osqueryd --version
osqueryd version 5.5.1
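
A minimal sketch of the same version check done programmatically, assuming the `--version` output always has the form shown above ("osqueryd version X.Y.Z"). The helper name `parse_osqueryd_version` and the `MIN_VERSION` constant are hypothetical. The sample string is copied from the output above, so the sketch does not invoke the binary.

```python
import re

# Version mentioned in this thread as having the extra logging.
MIN_VERSION = (5, 5, 1)


def parse_osqueryd_version(output):
    """Parse 'osqueryd version X.Y.Z' into a tuple of ints.

    Assumption: the output follows the format shown in this thread.
    """
    match = re.search(r"osqueryd version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"unrecognized version output: {output!r}")
    return tuple(int(part) for part in match.groups())


version = parse_osqueryd_version("osqueryd version 5.5.1")
print(version >= MIN_VERSION)  # True
```

Comparing tuples of integers avoids the string-comparison pitfall where "5.10.0" would sort before "5.5.1".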

Unfortunately, I can't reproduce anymore. Wild guess: did any of the policy queries we use in dogfood change? I will keep the new version running for a few hours and try again.

roperzh avatar Sep 20 '22 19:09 roperzh

following up, I tried again today and still couldn't repro

roperzh avatar Oct 12 '22 12:10 roperzh

IMO we should close, and reopen if we manage to reproduce it again.

lucasmrod avatar Oct 12 '22 12:10 lucasmrod