Vulnerabilities count mismatch for macOS 15.4.1
Fleet version: 4.68 (unconfirmed)
Web browser and operating system: Google Chrome (version unknown)
💥 Actual behavior
Mismatch in vulnerability counts for macOS 15.4.1 between dogfood (fleet premium) and fleet free
🧑‍💻 Steps to reproduce
TODO
reference here: https://fleetdm.slack.com/archives/C019WG4GH0A/p1748009270163509
🕯️ More info (optional)
N/A
This doesn't appear to be specific to 15.4.1; I have seen similar things across Fleet instances. For example:
Comparing the 15.3 hosts, the main differences I see are virtual machines vs. physical hardware, but if the vulnerabilities have to do with the OS then I'm not sure why hardware would matter. These are all ARM machines, so it's not a case of it being counted twice for Intel + ARM.
@mostlikelee based on the conversation in slack, I've put the 'to fix' as an investigation as to what might be going on. I'll send this to estimation.
Planning poker: https://fleetdm.slack.com/archives/C08RXDH5LHZ/p1750682841206779
timebox 2
Adding the repro label back onto this as it's been long enough that the mentioned Render instance is no longer online, and that version of macOS is no longer listed in Dogfood.
Can't use QAWolf envs to repro this as vuln processing is all over the map on those. Enrolling a host on Fleet Free's Dogfood would work here; right now that env only has Linux hosts attached. Just need a macOS host on a vulnerable version that exists on both Dogfood and Dogfood-Free, where vuln counts mismatch between them.
Never mind, got enough info to repro. Moving this back.
So, we have #27061 making this bug a bit harder to troubleshoot, but it seems like there's a reliability issue somewhere for associating OS vulnerabilities to OS versions. https://dogfood.fleetdm.com/software/vulnerabilities/CVE-2025-31241 for example should apply to macOS 15.2 and 15.1, and that link was made in @jmwatts's local environment, but not in Dogfood. There are a handful of other vulnerabilities where we're missing a link, and it feels like the quantity of vulns missed isn't necessarily consistent between vulns runs.
Confirmed that this is not a Free vs. Premium issue.
The issue was that my fix for #28368 triggered a regression. To avoid test flakiness I stopped bumping updated_at in the database when no part of the vulnerability had changed, but that meant the stale cleanup step of vulnerability processing would delete an arbitrary number of OS vulnerabilities that had existed in the previous run. The next vulns run would then insert them again, and the cycle would repeat. This is more obvious in environments that run the vulns cron every hour and where the vulns cron takes a non-negligible amount of time.
This would've broken more obviously had #26404 been in place, as we would've seen dramatic flapping on OS vulns on consecutive runs, without those runs needing to be two hours apart.
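To make the flapping easier to picture, here is a minimal Python simulation of the mechanism described above. It is a sketch, not Fleet's code: the function names, the placeholder CVE keys, and the two-hour staleness cutoff are all assumptions for illustration, and it deletes every stale row at once rather than the arbitrary subset seen in practice.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=2)  # assumed cutoff, not taken from Fleet's code


def upsert(table, key, now, bump_when_unchanged):
    """Insert key, or touch an existing row; returns 'inserted' or 'updated'."""
    if key not in table:
        table[key] = now
        return "inserted"
    if bump_when_unchanged:
        table[key] = now  # the pre-regression (and fixed) behavior
    return "updated"


def delete_stale(table, now):
    """Delete rows whose updated_at is older than STALE_AFTER; returns the count."""
    stale = [k for k, ts in table.items() if now - ts > STALE_AFTER]
    for k in stale:
        del table[k]
    return len(stale)


def simulate(bump_when_unchanged, runs=8):
    """Run hourly vulns passes over the same CVE set; return row counts per run."""
    cves = ["CVE-A", "CVE-B", "CVE-C"]  # placeholder identifiers
    table = {}
    now = datetime(2025, 1, 1)
    counts = []
    for _ in range(runs):
        for cve in cves:
            upsert(table, cve, now, bump_when_unchanged)
        delete_stale(table, now)
        counts.append(len(table))
        now += timedelta(hours=1)
    return counts


if __name__ == "__main__":
    print("no bump:", simulate(bump_when_unchanged=False))
    print("bump:   ", simulate(bump_when_unchanged=True))
```

Without the bump, the row count periodically drops to zero and recovers on the following run; with the bump it stays constant across runs.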
The fix is to revert to the previous updated_at behavior (bumping updated_at to the current time on each OS vuln insert/on duplicate key update), then revise the return value of the containing function (which only gets used in tests) to correctly indicate when a row was inserted vs. updated (whether or not values changed on that row), and improve tests, including ensuring that updated_at actually gets bumped when necessary.
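For context on the insert vs. update distinction: MySQL reports a per-row affected-rows value for INSERT ... ON DUPLICATE KEY UPDATE of 1 for a new row, 2 for an existing row that was changed, and 0 for an existing row set to its current values (or 1 in that last case if the client connects with CLIENT_FOUND_ROWS). The helper below is a hypothetical sketch of mapping that value to an outcome under default connection flags; it is not the function touched by the fix, and how Fleet actually derives its return value is not shown in this thread.

```python
def classify_upsert(rows_affected: int) -> str:
    """Map MySQL's affected-rows value for a single-row
    INSERT ... ON DUPLICATE KEY UPDATE to an outcome.

    Assumes default connection flags (no CLIENT_FOUND_ROWS).
    """
    if rows_affected == 1:
        return "inserted"
    if rows_affected == 2:
        return "updated"
    if rows_affected == 0:
        # Row existed and every assigned column already had that value.
        return "unchanged"
    raise ValueError(f"unexpected affected-rows value: {rows_affected}")
```

A caller that only cares about insert vs. update can treat "unchanged" as "updated", since in both cases the row already existed.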
To QA:
- Run vulns from a clean DB with a host or two enrolled with OSes with vulns
- Check OS vulnerabilities
- Simulate time passing by running UPDATE operating_system_vulnerabilities SET updated_at = NOW() - INTERVAL 3 HOUR
- Run vulns again
- Check OS vulnerabilities
Prior to the fix you'd see ~no vulns in the DB at all at step 5. With the fix you'd see the same number of vulns in steps 2 and 5.
QA Notes
- Run vulns from a clean DB with a host or two enrolled with OSes with vulns
- Check OS vulnerabilities
- Simulate time passing by running UPDATE operating_system_vulnerabilities SET updated_at = NOW() - INTERVAL 3 HOUR
- Run vulns again
- Check OS vulnerabilities
- [x] Confirm the same number of vulns are shown in steps 2 and 5
select * from operating_system_vulnerabilities;
2302 rows in set (0.03 sec)
select * from operating_system_vulnerabilities;
2302 rows in set (0.02 sec)
Also confirmed that host counts match between different fleet servers for identical macOS versions.
Mac's flaws, once hidden, In glass city, now counted. Fleet brings light to truth.