Vulnerabilities count mismatch for macOS 15.4.1

Open mostlikelee opened this issue 5 months ago • 1 comments

Fleet version: 4.68 (unconfirmed)

Web browser and operating system: Google Chrome (version unknown)


💥  Actual behavior

Mismatch in vulnerability counts for macOS 15.4.1 between dogfood (fleet premium) and fleet free

Image

🧑‍💻  Steps to reproduce

TODO

reference here: https://fleetdm.slack.com/archives/C019WG4GH0A/p1748009270163509

🕯️ More info (optional)

N/A

mostlikelee avatar Jun 13 '25 14:06 mostlikelee

This doesn't appear to be specific to 15.4.1; I have seen similar mismatches across Fleet instances. For example:

Image

Comparing the 15.3 hosts, the main difference I see is virtual machines vs. physical hardware, but if the vulnerabilities are tied to the OS then I'm not sure why hardware would matter. These are all ARM machines, so it's not a case of a vulnerability being counted twice for Intel + ARM.

Image

jmwatts avatar Jun 20 '25 16:06 jmwatts

@mostlikelee based on the conversation in Slack, I've set the 'to fix' to an investigation into what might be going on. I'll send this to estimation.

eugkuo avatar Jun 23 '25 12:06 eugkuo

Planning poker: https://fleetdm.slack.com/archives/C08RXDH5LHZ/p1750682841206779

eugkuo avatar Jun 23 '25 12:06 eugkuo

timebox 2

mostlikelee avatar Jun 24 '25 15:06 mostlikelee

Adding the repro label back onto this as it's been long enough that the mentioned Render instance is no longer online, and that version of macOS is no longer listed in Dogfood.

Can't use QAWolf envs to repro this as vuln processing is all over the map on those. Enrolling a host on Fleet Free's Dogfood would work here; right now that env only has Linux hosts attached. Just need a macOS host on a vulnerable version that exists on both Dogfood and Dogfood-Free, where vuln counts mismatch between them.

iansltx avatar Jul 09 '25 20:07 iansltx

Never mind, got enough info to repro. Moving this back.

iansltx avatar Jul 09 '25 20:07 iansltx

So, we have #27061 making this bug a bit harder to troubleshoot, but it seems like there's a reliability issue somewhere for associating OS vulnerabilities to OS versions. https://dogfood.fleetdm.com/software/vulnerabilities/CVE-2025-31241 for example should apply to macOS 15.2 and 15.1, and that link was made in @jmwatts's local environment, but not in Dogfood. There are a handful of other vulnerabilities where we're missing a link, and it feels like the quantity of vulns missed isn't necessarily consistent between vulns runs.

Confirmed that this is not a Free vs. Premium issue.

iansltx avatar Jul 09 '25 22:07 iansltx

The issue was that my fix for #28368 triggered a regression. To avoid test flakiness, I stopped bumping updated_at in the database when no part of the vulnerability had changed, but that meant the stale cleanup step of vulnerability processing would delete an arbitrary number of OS vulnerabilities that had existed in the previous run. The next vulns run would then insert them again...and rinse and repeat. This phenomenon is more obvious in environments that run the vulns cron every hour, and in which the vulns cron takes a non-negligible amount of time.

This would've broken more obviously had #26404 been in place, as we would've seen dramatic flapping of OS vulns on consecutive runs, without those runs needing to be two hours apart.
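To make the flapping concrete, here is a minimal simulation of the interaction described above. It is a sketch and not Fleet's code: the `os_vulns` table, the `run_vulns` and `simulate` helpers, the CVE IDs, and the two-hour `STALE_AFTER` threshold are all assumptions made for illustration, and each run is treated as instantaneous.

```python
import sqlite3

HOUR = 3600
STALE_AFTER = 2 * HOUR  # assumed cleanup threshold; the real value may differ


def new_db():
    # Hypothetical, simplified stand-in for the OS vulnerabilities table.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE os_vulns ("
        " os_id INTEGER NOT NULL, cve TEXT NOT NULL, updated_at INTEGER NOT NULL,"
        " PRIMARY KEY (os_id, cve))"
    )
    return db


def run_vulns(db, found, now, bump_unchanged):
    """One simulated vulns run: upsert findings, then delete stale rows."""
    for os_id, cve in found:
        exists = db.execute(
            "SELECT 1 FROM os_vulns WHERE os_id = ? AND cve = ?", (os_id, cve)
        ).fetchone()
        if exists is None:
            db.execute("INSERT INTO os_vulns VALUES (?, ?, ?)", (os_id, cve, now))
        elif bump_unchanged:
            # Only the non-regressed behavior refreshes an unchanged row.
            db.execute(
                "UPDATE os_vulns SET updated_at = ? WHERE os_id = ? AND cve = ?",
                (now, os_id, cve),
            )
    # Stale cleanup: drop anything not refreshed recently enough.
    db.execute("DELETE FROM os_vulns WHERE updated_at < ?", (now - STALE_AFTER,))
    return db.execute("SELECT COUNT(*) FROM os_vulns").fetchone()[0]


def simulate(bump_unchanged, runs=5):
    """Row counts after each of several hourly runs with identical findings."""
    db = new_db()
    found = [(1, "CVE-0000-0001"), (1, "CVE-0000-0002")]  # placeholder IDs
    return [run_vulns(db, found, i * HOUR, bump_unchanged) for i in range(runs)]


print(simulate(bump_unchanged=False))  # [2, 2, 2, 0, 2]
print(simulate(bump_unchanged=True))   # [2, 2, 2, 2, 2]
```

In this model the regressed variant loses every row on the fourth hourly run and regains them on the fifth. Real runs take time and delete only part of the set, which is where the "arbitrary number" comes from.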

The fix is to revert to the previous updated_at behavior (bumping updated_at to the current time on each OS vuln insert/on duplicate key update), then revise the return value of the containing function (which only gets used in tests) to correctly indicate when a row was inserted vs. updated (whether or not values changed on that row), and improve tests, including ensuring that updated_at actually gets bumped when necessary.
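As a rough illustration of the contract described above, the sketch below always refreshes updated_at and reports "inserted" vs. "updated" regardless of whether any value changed. The function name `upsert_os_vuln`, the `os_vulns` table, and its columns are hypothetical; the actual fix lives in Fleet's own datastore code and is not reproduced here.

```python
import sqlite3


def new_db():
    # Hypothetical, simplified table; column names are assumptions.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE os_vulns ("
        " os_id INTEGER NOT NULL, cve TEXT NOT NULL,"
        " resolved_in TEXT, updated_at INTEGER NOT NULL,"
        " PRIMARY KEY (os_id, cve))"
    )
    return db


def upsert_os_vuln(db, os_id, cve, resolved_in, now):
    """Insert or refresh one row. Returns True if inserted, False if updated.

    updated_at is always set to `now`, even when nothing else changed,
    so a later stale cleanup keeps rows that were seen in this run.
    """
    cur = db.execute(
        "UPDATE os_vulns SET resolved_in = ?, updated_at = ?"
        " WHERE os_id = ? AND cve = ?",
        (resolved_in, now, os_id, cve),
    )
    if cur.rowcount > 0:
        return False  # row already existed; it was refreshed
    db.execute(
        "INSERT INTO os_vulns (os_id, cve, resolved_in, updated_at)"
        " VALUES (?, ?, ?, ?)",
        (os_id, cve, resolved_in, now),
    )
    return True


db = new_db()
first = upsert_os_vuln(db, 1, "CVE-0000-0001", "15.5", now=100)   # inserted
second = upsert_os_vuln(db, 1, "CVE-0000-0001", "15.5", now=200)  # updated, same values
stamp = db.execute("SELECT updated_at FROM os_vulns").fetchone()[0]
print(first, second, stamp)  # True False 200
```

The second call changes no values, yet it still reports an update and moves updated_at forward, which is the property the tests in the fix are meant to pin down.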

To QA:

  1. Run vulns from a clean DB with a host or two enrolled with OSes with vulns
  2. Check OS vulnerabilities
  3. Simulate time passing by running UPDATE operating_system_vulnerabilities SET updated_at = NOW() - INTERVAL 3 HOUR
  4. Run vulns again
  5. Check OS vulnerabilities

Prior to the fix you'd see ~no vulns in the DB at all at step 5. With the fix you'd see the same number of vulns in steps 2 and 5.
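The five QA steps can also be rehearsed against an in-memory database. This is only a model: the single-column key, the placeholder CVE IDs, the `qa_walkthrough` helper, and the two-hour `STALE_AFTER` cleanup threshold are assumptions, and the real check still needs a Fleet server with enrolled hosts.

```python
import sqlite3

HOUR = 3600
STALE_AFTER = 2 * HOUR  # assumed cleanup threshold, for illustration only


def qa_walkthrough(bump_unchanged, now=10 * HOUR):
    """Return (count at step 2, count at step 5) for the QA steps above."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE operating_system_vulnerabilities ("
        " cve TEXT PRIMARY KEY, updated_at INTEGER NOT NULL)"
    )
    found = ["CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003"]  # placeholders

    def run_vulns():
        for cve in found:
            row = db.execute(
                "SELECT 1 FROM operating_system_vulnerabilities WHERE cve = ?",
                (cve,),
            ).fetchone()
            if row is None:
                db.execute(
                    "INSERT INTO operating_system_vulnerabilities VALUES (?, ?)",
                    (cve, now),
                )
            elif bump_unchanged:
                db.execute(
                    "UPDATE operating_system_vulnerabilities"
                    " SET updated_at = ? WHERE cve = ?",
                    (now, cve),
                )
        db.execute(
            "DELETE FROM operating_system_vulnerabilities WHERE updated_at < ?",
            (now - STALE_AFTER,),
        )

    def count():
        return db.execute(
            "SELECT COUNT(*) FROM operating_system_vulnerabilities"
        ).fetchone()[0]

    run_vulns()            # step 1
    step2 = count()        # step 2
    db.execute(            # step 3: pretend every row is three hours old
        "UPDATE operating_system_vulnerabilities SET updated_at = ?",
        (now - 3 * HOUR,),
    )
    run_vulns()            # step 4
    return step2, count()  # step 5


print(qa_walkthrough(bump_unchanged=False))  # (3, 0)
print(qa_walkthrough(bump_unchanged=True))   # (3, 3)
```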

iansltx avatar Jul 09 '25 23:07 iansltx

QA Notes

  1. Run vulns from a clean DB with a host or two enrolled with OSes with vulns
  2. Check OS vulnerabilities
  3. Simulate time passing by running UPDATE operating_system_vulnerabilities SET updated_at = NOW() - INTERVAL 3 HOUR
  4. Run vulns again
  5. Check OS vulnerabilities
  • [x] Confirm the same number of vulns are shown in steps 2 and 5
select * from operating_system_vulnerabilities;
2302 rows in set (0.03 sec)
select * from operating_system_vulnerabilities;
2302 rows in set (0.02 sec)

Also confirmed that host counts match between different fleet servers for identical macOS versions.

jmwatts avatar Jul 16 '25 15:07 jmwatts

Mac's flaws, once hidden,
In glass city, now counted.
Fleet brings light to truth.

fleet-release avatar Jul 23 '25 22:07 fleet-release