fleet
Upcoming activities stuck in queue
Fleet version: 4.65.0
💥 Actual behavior
I tried to install self-service software about a month ago, and it never installed. After that, any software installs or script runs added to the queue haven't moved.
🧑💻 Steps to reproduce
Not entirely sure how to reproduce this, but here's a host it's happening on: https://dogfood.fleetdm.com/hosts/519
🕯️ More info (optional)
N/A
🛠️ To fix
Product designer: @marko-lisica Understand why activities are stuck and resolve it so it doesn't happen anymore.
Hey team! Please add your planning poker estimate with Zenhub @getvictor @ghernandez345 @gillespi314 @mna
I'm escalating to a P1 because from the user's perspective there are items stuck in pending and the workflow is broken.
@marko-lisica @georgekarrv @mna Would y'all please work together to see if we can get a fix in for this next week and included in 4.67.0? Thanks!
I'll see what we can bump out for 3sp but I think we can. Also Cancel is being developed atm so this should at least be less impactful w/ cancel released.
Starting investigation on this, some notes:
- The stuck activity is a VPP app install
- It was enqueued quite some time before the upcoming activities queue was implemented, at 2024-11-12 22:35:08
- The reason why it shows up as "a month ago" is because that's when it got migrated to the new unified queue (on 2025-02-26 21:20:35.545269), but it was stuck since November
- Looking at nano_cert_auth_associations, the timestamp when the VPP app was enqueued has a cert renewal shortly after that date, created on 2024-11-19 22:36:29, and the previous cert entry was expired before that (cert not valid after): 2024-04-02
- The MDM command is now inactive, and the timestamp of its last update matches exactly the timestamp the cert was renewed:
mysql> select * from nano_enrollment_queue where command_uuid = '84e25621-faa6-4af4-9a2a-67455dcbf448';
+--------------------------------------+--------------------------------------+--------+----------+----------------------------+----------------------------+
| id                                   | command_uuid                         | active | priority | created_at                 | updated_at                 |
+--------------------------------------+--------------------------------------+--------+----------+----------------------------+----------------------------+
| 9BDD6D41-07FA-5E69-823F-1ABA5BFC5174 | 84e25621-faa6-4af4-9a2a-67455dcbf448 |      0 |        0 | 2024-11-12 22:35:08.263017 | 2024-11-19 22:36:29.110874 |
+--------------------------------------+--------------------------------------+--------+----------+----------------------------+----------------------------+
All this to say, it looks very much like this VPP app command (MDM) is stuck due to having been created when the cert was expired, and on cert renewal those old commands got deactivated.
cc @georgekarrv
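To make the timeline reasoning above easier to check, here is a minimal sketch that replays it with the timestamps quoted in these notes. The helper names (`enqueued_with_expired_cert`, `same_moment`) and the one-second tolerance are assumptions for illustration only; this is not Fleet code and does not query the real tables.

```python
from datetime import datetime, timedelta

# Timestamps copied from the investigation notes in this thread.
ENQUEUED_AT = datetime.fromisoformat("2024-11-12 22:35:08.263017")
COMMAND_UPDATED_AT = datetime.fromisoformat("2024-11-19 22:36:29.110874")
CERT_RENEWED_AT = datetime.fromisoformat("2024-11-19 22:36:29")
PREVIOUS_CERT_NOT_VALID_AFTER = datetime.fromisoformat("2024-04-02")
MIGRATED_AT = datetime.fromisoformat("2025-02-26 21:20:35.545269")


def enqueued_with_expired_cert(enqueued_at: datetime, not_valid_after: datetime) -> bool:
    """True if the command was enqueued after the previous cert had expired."""
    return enqueued_at > not_valid_after


def same_moment(a: datetime, b: datetime,
                tolerance: timedelta = timedelta(seconds=1)) -> bool:
    """True if two timestamps are within `tolerance` of each other (assumed: 1 second)."""
    return abs(a - b) <= tolerance


if __name__ == "__main__":
    print("enqueued while cert expired:",
          enqueued_with_expired_cert(ENQUEUED_AT, PREVIOUS_CERT_NOT_VALID_AFTER))
    print("deactivated at cert renewal:",
          same_moment(COMMAND_UPDATED_AT, CERT_RENEWED_AT))
    # Time the command had already been stuck before the queue migration.
    print("stuck before migration:", MIGRATED_AT - ENQUEUED_AT)
```

Both checks agree with the conclusion above: the command was created while the previous cert was already expired, and its last update coincides with the renewal.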
A number of Apple hosts got unenrolled from MDM in November 2024 (probably because the SCEP cert was renewed after it had already expired):
mysql> select created_at, activity_type, details->'$.host_display_name' from activities where activity_type = 'mdm_unenrolled' and created_at between '2024-11-01' and '2024-12-01' order by created_at desc limit 5;
+----------------------------+----------------+---------------------------------+
| created_at                 | activity_type  | details->'$.host_display_name'  |
+----------------------------+----------------+---------------------------------+
| 2024-11-29 00:25:20.923310 | mdm_unenrolled | "Lucas’s MacBook Pro"           |
| 2024-11-28 22:55:34.049316 | mdm_unenrolled | "MacBookPro16,2 (C02G90U2ML85)" |
| 2024-11-27 22:05:16.314309 | mdm_unenrolled | "Dale’s MacBook Pro"            |
| 2024-11-20 18:46:22.885064 | mdm_unenrolled | "Harrison’s iPhone"             |
| 2024-11-20 18:44:09.641478 | mdm_unenrolled | "Rachael’s MacBook Pro"         |
+----------------------------+----------------+---------------------------------+
5 rows in set (0.05 sec)
This was some months before the unified queue was implemented, but if that happened now, the VPP app install command would be automatically removed as part of unenrolling from MDM: https://github.com/fleetdm/fleet/pull/26816
And of course, the cert renewal should normally happen before expiration.
So there is nothing to do to fix this issue: this is something that should not happen anymore. To unblock this specific case, the Cancel Upcoming Activity story https://github.com/fleetdm/fleet/issues/25540 will allow cancelling the stuck install, which unblocks the rest of the queue (once it gets deployed to dogfood). /cc @rachaelshaw
I will close this issue.
Queues unjam, tasks flow, In cloud city, systems grow, Fleet's progress in tow.
(Setting milestone back to 4.67.0 as it was closed during this sprint)
Now that we can cancel upcoming activities, I cancelled the install that was stuck, and the rest of the pending actions went through after that 👍
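For anyone reading this later, the blocking behavior seen in this thread can be illustrated with a toy model: a queue processed strictly in order, where one activity that can never complete holds up everything behind it until it is cancelled. This is a sketch under that assumption only; `Activity`, `UpcomingQueue`, and the `can_complete` flag are hypothetical names and do not reflect how Fleet actually implements the upcoming activities queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    # False models a command the host will never acknowledge (hypothetical flag).
    can_complete: bool = True


class UpcomingQueue:
    """Toy in-order queue: the head must finish before anything else runs."""

    def __init__(self, activities):
        self._pending = deque(activities)

    def run(self):
        """Process from the head; stop at the first activity that cannot complete."""
        done = []
        while self._pending and self._pending[0].can_complete:
            done.append(self._pending.popleft().name)
        return done

    def cancel(self, name):
        """Remove the first pending activity with this name; return whether it was found."""
        for activity in self._pending:
            if activity.name == name:
                self._pending.remove(activity)
                return True
        return False

    def pending(self):
        return [a.name for a in self._pending]


if __name__ == "__main__":
    queue = UpcomingQueue([
        Activity("vpp_install", can_complete=False),  # the stuck install
        Activity("script_run"),
        Activity("software_install"),
    ])
    print(queue.run())      # nothing runs while the head is stuck
    queue.cancel("vpp_install")
    print(queue.run())      # the remaining activities go through
```

In this model, cancelling the stuck head is enough to let the remaining activities run, which matches what happened on the dogfood host.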