fleet
Adding/removing Apple (macOS, iOS, iPadOS) profiles in the UI takes 15+ seconds
Fleet version: 4.55.0
Web browser and operating system:
💥 Actual behavior
@PezHub: I was able to test this in my current load test env built off 4.64 with 20K hosts:
- adding profiles took 15-25secs (added 15 total)
- deleting a profile took 15-35sec
- I never saw a timeout.
- CPU and memory utilization did spike during the tests but eventually waned
🧑‍💻 Steps to reproduce
- Add 20k hosts with Apple MDM turned on in a load test environment
- Add at least 10 profiles
- Try to add/remove a profile, observe how the request takes longer than expected or times out
🕯️ More info
From 2024-08-14: principal culprit seems to be:
https://github.com/fleetdm/fleet/blob/16d6757681a1e41f228eec798c4c0f0293b7cf0c/server/datastore/mysql/apple_mdm.go#L1777-L1782
🛠️ To fix
@marko-lisica: Should be tested against 30k hosts (our largest deployment). Test by uploading 15 profiles.
Heads up, this will likely be bigger than 2
Related: https://github.com/fleetdm/fleet/issues/23816
QA Notes:
Completed load testing with slightly improved results but will revisit after the holidays when additional engineers are available to look.
Slack convo here
@dantecatalfamo We may be stepping on each other since I'm trying to fix some issues with my current load test.
I'll let you fix the issue with batch delete. Each batch should run in its own transaction -- running 60k hosts in 1 transaction is a no-go.
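For anyone following along, here is a minimal sketch of the "one transaction per batch" idea, not the actual Fleet code. `forEachBatch` and `deleteProfileForHosts` are hypothetical helpers, and the table name `example_host_profiles` is made up for illustration. The only real APIs used are `BeginTx`, `ExecContext`, `Commit`, and `Rollback` from Go's `database/sql`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// forEachBatch splits ids into chunks of at most batchSize and calls fn
// once per chunk, stopping at the first error.
func forEachBatch(ids []uint, batchSize int, fn func(batch []uint) error) error {
	if batchSize <= 0 {
		return fmt.Errorf("batchSize must be positive, got %d", batchSize)
	}
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		if err := fn(ids[start:end]); err != nil {
			return fmt.Errorf("batch starting at index %d: %w", start, err)
		}
	}
	return nil
}

// deleteProfileForHosts is a hypothetical example: it removes one profile
// from many hosts, committing each batch in its own short transaction
// instead of holding a single transaction open for every host.
// The table name below is an assumption, not Fleet's real schema.
func deleteProfileForHosts(ctx context.Context, db *sql.DB, profileUUID string, hostIDs []uint, batchSize int) error {
	return forEachBatch(hostIDs, batchSize, func(batch []uint) error {
		tx, err := db.BeginTx(ctx, nil)
		if err != nil {
			return err
		}
		// Rollback is a no-op once Commit has succeeded.
		defer tx.Rollback()

		for _, id := range batch {
			if _, err := tx.ExecContext(ctx,
				"DELETE FROM example_host_profiles WHERE profile_uuid = ? AND host_id = ?",
				profileUUID, id,
			); err != nil {
				return err
			}
		}
		return tx.Commit()
	})
}
```

With 60k hosts and a batch size of, say, 1,000, this would run 60 short transactions, so locks are released between batches. The trade-off is that a failure partway through leaves earlier batches committed, so the caller needs to be able to retry safely.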
My loadtest branch: https://github.com/fleetdm/fleet/pull/24338
As discussed in standup, we're going to hold off on changes related to Victor's comment for the time being and will address it in an upcoming sprint, possibly in conjunction with the unified queue work. See related comment.
Note that Dante's PR, which improves performance when adding or removing profiles in the UI, should be included in 4.61.0.
Waiting for new activity queue to be in before revisiting this issue.
- Add a significant number of hosts with MDM turned on (doesn't need to be 30k to see the impact)
- Add at least 10 profiles
- Try to add/remove a profile, observe how the request takes longer than expected or times out
Note that Dante's https://github.com/fleetdm/fleet/pull/23772, which improves performance when adding or removing profiles in the UI, should be included in 4.61.0.
@georgekarrv did we still see long load times or timeouts after @dantecatalfamo's improvement? If no, I think we can close this bug.
If we did, please let us know how long the load times were.
Thanks!
Hi @noahtalerman, I was able to test this in my current load test env built off 4.64 with 20K hosts and I can confirm that I'm seeing significant improvements:
- adding profiles took 15-25secs (added 15 total)
- deleting a profile took 15-35sec
- I never saw a timeout.
- CPU and memory utilization did spike during the tests but eventually waned
I think it's ok to close this ticket.
Update: I'm going to create a new ticket to continue making improvements in this area when additional hosts (more than 20K) and profiles (more than 10) are added and also when moving large amounts of hosts from one team to another with each team having at least 10 unique profiles
Update 2: will actually just keep this ticket open and include additional metrics in the comments below
@PezHub thanks! 15-25 seconds is too long. We want all actions in Fleet to take less than 5 seconds.
I updated the issue description with your findings and moved this one to "Ready to estimate"
cc @marko-lisica
understood and good to know that ~5secs is our target! I'll keep that in mind for all future load tests. thanks!
Additional Metrics when Profiles are added/deleted on a team with 20K hosts
Fleet Service
DB Writer
DB Reader
Hey team! Please add your planning poker estimate with Zenhub @getvictor @ghernandez345 @mna
This should sit behind adding mdm commands to the unified queue so we don't duplicate work
- adding profiles took 15-25secs (added 15 total)
- deleting a profile took 15-35sec
- I never saw a timeout.
- CPU and memory utilization did spike during the tests but eventually waned
Hey @PezHub when you get the chance, can you please run the same tests with 6.1k hosts? I assigned the bug to you.
Is it over 5 seconds?
Hey @noahtalerman, I was able to rerun the tests in the following env:
- Fleet v4.66.0
- Host count = 6.5K total (5,201 mac, 650 Win, 650 Linux)
- Profile count = MDM 15, DDM 3, Windows 3
*note - ddm and windows profiles have always uploaded/deleted quickly, the main objective was to test mdm config profiles
Results are def better than with 20K hosts. The average time to upload a .mobileconfig file is ~5sec and a little bit less to delete them.
Here's a short video showing the UI workflow
I updated the count to 6K macOS hosts just to make sure it matched your original request and I'm still seeing the same results @ ~5-6secs
I kept increasing the host count by 1.5K and it appears that every time we jump by ~1K hosts it adds a second to the load time:
- 7-8K hosts = ~7-8sec
- 9-10K hosts = ~10sec
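As a rough back-of-the-envelope check, the numbers above fit a simple linear model of about one second per 1K hosts. The sketch below only restates that arithmetic. The constant and both function names are assumptions taken from these load test observations, not anything measured inside Fleet.

```go
package main

// secondsPerThousandHosts is an assumption read off the load test results
// in this thread (roughly +1 second of load time per ~1K hosts).
const secondsPerThousandHosts = 1.0

// estimateUploadSeconds returns the approximate time to add or remove an
// MDM profile for the given host count under the linear assumption.
func estimateUploadSeconds(hosts int) float64 {
	return float64(hosts) / 1000.0 * secondsPerThousandHosts
}

// maxHostsWithinTarget returns roughly how many hosts fit under a latency
// target (in seconds) under the same assumption.
func maxHostsWithinTarget(targetSeconds float64) int {
	return int(targetSeconds / secondsPerThousandHosts * 1000.0)
}
```

Under this model the 5-second target mentioned earlier in the thread is only met up to roughly 5K hosts, and 20K hosts would land around 20 seconds, which is in line with the 15-25secs seen in the earlier test.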
Thanks @PezHub! FYI @pintomi1989 bringing this to product office hours to discuss.
@noahtalerman approved by customer-preston to close this out
Apple profiles load slow, Fleet finds path through data flow, Swift as river's current go.