See macOS hosts that failed DEP profile assignment

Open noahtalerman opened this issue 1 year ago • 8 comments

Goal

User story
As an IT admin,
I want to see macOS hosts that failed automatic enrollment (DEP) profile assignment
so that I don't have to unassign the host from Fleet in Apple Business Manager (ABM), wait 24 hours, and then reassign the host.

Changes

Product

  • [ ] UI changes: Figma link
  • [ ] REST API changes: https://github.com/fleetdm/fleet/pull/16166
  • [ ] Other changes: Increase the retry interval to 1 hr to avoid triggering the Apple API error every time (always fails)
    • Note that during testing, we learned that waiting 24 hours to reassign the profile after an error lets us successfully assign the profile. We haven't tested the 1 hour cooldown.
    • If 1 hour doesn't work (we still see the error), then increase the retry interval to 24 hours.
  • [ ] Outdated documentation changes: Document debugging instructions for the IT admin if they see this error: Use Fleet to run the profiles show -type enrollment command to see which DEP profile is currently applied to the host.

Engineering

  • [ ] Database schema migrations: TODO

Context

  • Fleet currently tries to assign the DEP profile every 30 seconds until the profile is assigned.
  • 30 seconds is too fast because the Apple API endpoint errors and fails to assign the DEP profile. Currently, the IT admin has no way of knowing when this error happens.
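
The cooldown proposed above (1 hour, falling back to 24 hours if the error persists) can be sketched roughly as follows. This is only an illustration, not Fleet's actual implementation: the function name, its parameters, and the constant are all assumptions made up for this example.

```javascript
// Hypothetical sketch; none of these names come from Fleet's codebase.
const ONE_HOUR_MS = 60 * 60 * 1000;

// Returns true when enough time has passed since the last failed
// DEP profile assignment attempt to try again.
// lastFailedAtMs: timestamp of the last failure in ms, or null if none.
// cooldownMs: how long to wait after a failure (defaults to 1 hour).
function shouldRetryAssignment(lastFailedAtMs, nowMs, cooldownMs = ONE_HOUR_MS) {
  if (lastFailedAtMs === null) {
    return true; // no recorded failure, so there is nothing to wait for
  }
  return nowMs - lastFailedAtMs >= cooldownMs;
}
```

If the 1 hour cooldown turns out not to work, the same check could be called with 24 * ONE_HOUR_MS as the third argument.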

QA

Risk assessment

Manual testing steps

  1. Transfer host from another MDM to Fleet in ABM
  2. Check Hosts page > MDM status for that host, note if error appears
  3. Validate copy for error message
  4. Attempt to enroll the host and note profile is not assigned
  5. Wait 1 hr and observe if profile has been assigned, repeat until able to successfully enroll.
  6. Transfer host to another MDM in ABM, delete from the Fleet UI
  7. Repeat all steps
  8. If host received profile assignment without error, repeat until error appears.
  9. At least once when error is present, verify that unassign/reassign in ABM is still a viable workaround.

Testing notes

Confirmation

  1. [ ] Engineer (@____): Added comment to user story confirming successful completion of QA.
  2. [ ] QA (@____): Added comment to user story confirming successful completion of QA.

noahtalerman avatar Dec 05 '23 21:12 noahtalerman

Noah: I wonder if we can crank the cooldown down to like 1 hr and see if it works (instead of 24)

noahtalerman avatar Dec 11 '23 21:12 noahtalerman

Discussed during product design check-in call (recorded) 2024-01-19:

Mike: Let's make sure to interrogate the error and make sure this is the rate limit error. We want to avoid swallowing other errors and introducing a bug in which we surface a bunch of false positives.

Pseudo code for interrogation.

try { doSomething() } catch (err) { if (err.isRateLimitError) { …} else { throw err; } }
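
A slightly fuller version of that pseudo code, as a rough sketch only. It assumes the error object carries some flag that identifies a rate limit error; isRateLimitError is taken from the pseudo code above, while assignProfile and the returned fields are placeholders invented for this example. Whether the Apple API actually exposes enough information to set such a flag is not established in this thread.

```javascript
// Hypothetical sketch: assignProfile and the result shape are placeholders.
function tryAssignProfile(assignProfile, host) {
  try {
    assignProfile(host);
    return { assigned: true, rateLimited: false };
  } catch (err) {
    if (err && err.isRateLimitError === true) {
      // Expected failure: report it so the caller can surface it and retry later.
      return { assigned: false, rateLimited: true };
    }
    // Anything else is unexpected; rethrow instead of swallowing it,
    // so other failures are not misreported as rate limiting.
    throw err;
  }
}
```

The point of the else branch is that only the one recognised error is turned into a status; every other error still propagates to the caller.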

cc @georgekarrv @roperzh

noahtalerman avatar Jan 19 '24 21:01 noahtalerman

@noahtalerman I don't think we have verified if this is a rate limit error! I also don't think we have any way to know from the info that's provided to us.

The only thing we know for sure is that some hosts are failing the profile assignment (we can know which hosts) and we're not surfacing that information anywhere.

roperzh avatar Jan 19 '24 21:01 roperzh

Hey team! Please add your planning poker estimate with Zenhub @gillespi314 @jahzielv @roperzh

georgekarrv avatar Jan 25 '24 19:01 georgekarrv

Context: 2 macOS hosts assigned to my instance via ABM, neither enrolled.

Observations: Error icon is immediately visible next to the MDM status from the moment Fleet picks it up as a Pending ADE host.

Screenshot 2024-03-01 at 1.58.28 PM.png

However, I observed this state on 2 separate hosts. Upon attempting to enroll both, I noted that one in fact had failed to assign ADE profile & proceeded through setup without enrolling. The second enrolled successfully, yet the error remained:

Screenshot 2024-03-01 at 3.13.37 PM.png

It is unclear if there is a timing issue for the error state to eventually clear, however that is moot given that this host should not be displaying an error.

sabrinabuckets avatar Mar 01 '24 20:03 sabrinabuckets

@noahtalerman we need some input from you on this. The issue is this: we are displaying errors on the MDM status for hosts that are already successfully enrolled (On (automatic)) in addition to hosts that are Pending, so it is currently impossible to tell at a glance whether an enrollment profile assignment has failed, essentially defeating the purpose of this ticket.

Based on my conversations with Sarah, the error that we are displaying for enrolled hosts is not technically incorrect, because (as is outlined in #17291) we are constantly re-assigning profiles, so it is entirely possible for an enrolled host to have a failed profile assignment.

The proposed solutions are:

  1. Consider this blocked until 17291 gets merged in, and retest to see if the fix also addresses the errors here.
  2. Hide any profile assignment errors for enrolled hosts/only show errors for Pending hosts
  3. Fix the painfully vague tooltip copy (Opinion: we should do this anyway) to explain what the actual error is, thus reducing the "noise" element and making it somewhat useful.
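
For reference, option 2 amounts to a display rule along these lines. This is a minimal sketch with made-up field names (mdmStatus, depAssignmentFailed); it does not reflect how the Fleet UI or API actually represents hosts.

```javascript
// Hypothetical sketch: the host field names are assumptions for illustration.
// Show the profile assignment error only for hosts still pending enrollment.
function shouldShowAssignmentError(host) {
  return host.mdmStatus === "Pending" && host.depAssignmentFailed === true;
}
```

Under this rule an enrolled host with a failed reassignment would show no error icon, which hides the noise but also hides a real failure.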

I've moved this to Ready, marked it blocked, and added the :product label to ensure that we get these questions answered before releasing.

sabrinabuckets avatar Mar 01 '24 21:03 sabrinabuckets

Consider this blocked until 17291 gets merged in, and retest to see if the fix also addresses the errors here.

@sabrinabuckets and @gillespi314 I think we should go w/ this option. Nice catch BTW

Otherwise we're creating too much noise. The IT admin only cares about the DEP profile failing when they want the DEP profile to change.

Fix the painfully vague tooltip copy

What do y'all think the tooltip should say?

noahtalerman avatar Mar 04 '24 19:03 noahtalerman

@sabrinabuckets I merged a fix for #17291, so I'm moving both issues to awaiting QA

roperzh avatar Mar 06 '24 21:03 roperzh

With #17291 resolved, I am no longer seeing the errors on enrolled hosts.

sabrinabuckets avatar Mar 07 '24 14:03 sabrinabuckets

API docs PR is here: #16166

@Patagonia121, heads up, this customer request was shipped in Fleet 4.47 🔥

noahtalerman avatar Mar 15 '24 16:03 noahtalerman

In Fleet's glass city, Mac hosts that failed now seen, Admins sigh relief.

fleet-release avatar Mar 21 '24 18:03 fleet-release