Kiosk Device ItemCache Rebuild Causing Check-In Delays
Prerequisites
- [X] Put an X between the brackets on this line if you have done all of the following:
- Did you perform a search at https://github.com/issues?q=is%3Aissue+user%3ASparkDevNetwork+-repo%3ARock to see if your bug or enhancement is already reported?
- Can you reproduce the problem on a fresh install or the demo site?
- Did you include your Rock version number and client culture setting?
Posting on behalf of North Point Ministries, Inc., by BEMA Software Services.
A Picture Is Worth a Thousand Words
[Screenshot: user experiencing a 30-second 'freeze' while Rock rebuilds the KioskDevice cache]
Description
Check-in comes to a halt for about 30 seconds whenever a room is opened or closed. After installing New Relic tracing, we can see the KioskDevice class rebuilding its list of available group locations, issuing about 1,800 SQL transactions in the process. Given the number of campuses and kiosks and the large number of groups and group locations, this code may not be scaling well, and the problem may not be evident without a large group-location dataset.
Current configuration:
- Fewer than 45 kiosk devices in use; multiple iPads share the same kiosk device configuration.
- Approximately 900 active group locations and group location schedules for the children's ministry area, with additional locations active in other areas on Sundays.
- Infrastructure: web server VM D16s v3 (16 vCPUs, 64 GB memory); Azure SQL Server P11.
The slowness shows up most clearly in the number of requests the Azure VM web server makes to Azure SQL Server. CPU and database utilization spikes are inconclusive.
Our best guess is that the KioskDevice.LoadKioskLocations method causes a lot of processing and data retrieval: https://github.com/SparkDevNetwork/Rock/blob/40082b8e64706c988183463a427844f5160f0b4d/Rock/CheckIn/KioskDevice.cs#L358-L429. The 'smoking gun' was a transaction timeline showing a staff user closing a room, after which the next kiosk phone number search 'froze' Rock for about 30 seconds.
It would be great if this method and the other KioskDevice methods were evaluated for performance with a large number of unique group locations and schedules. The KioskDevice cache is currently cleared whenever anyone opens or closes a room (usually during a busy check-in period), and the rebuild that follows can cause these delays.
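To make the failure mode concrete, here is a minimal Python sketch of the pattern described above, under stated assumptions: a per-device cache that is cleared when a room is opened or closed, so the next kiosk request pays the whole rebuild cost while it waits. All names (`KioskCache`, `build_locations`, `refresh`) are hypothetical and do not reflect Rock's actual KioskDevice code. The `refresh` method shows one generic alternative, building the replacement first and then swapping it in, so readers keep getting the previous list until the new one is ready.

```python
import threading
from typing import Callable, Dict, List


# Hypothetical model of a per-device location cache; NOT Rock's implementation.
class KioskCache:
    def __init__(self, build_locations: Callable[[int], List[str]]):
        # Assumed expensive builder (in the reported system, many SQL queries).
        self._build = build_locations
        self._entries: Dict[int, List[str]] = {}
        self._lock = threading.Lock()
        # Counts rebuilds that a caller had to wait for on a cache miss.
        self.rebuilds_on_request_path = 0

    def get(self, device_id: int) -> List[str]:
        with self._lock:
            cached = self._entries.get(device_id)
        if cached is not None:
            return cached
        # Cache miss: the caller (e.g. a phone number search) blocks for the
        # full rebuild. Concurrent misses each run their own rebuild here.
        self.rebuilds_on_request_path += 1
        fresh = self._build(device_id)
        with self._lock:
            self._entries[device_id] = fresh
        return fresh

    def clear(self) -> None:
        """Clear-on-change: every device's next request pays the rebuild cost."""
        with self._lock:
            self._entries.clear()

    def refresh(self, device_id: int) -> None:
        """Build-then-swap: readers see the old entry until the new one is stored."""
        fresh = self._build(device_id)
        with self._lock:
            self._entries[device_id] = fresh
```

In this sketch `refresh` runs on whichever thread calls it (for example, the one handling the room open/close, or a background worker), so the cost moves off the kiosk's search request. Note that with `clear`, several kiosks searching at the same moment during a rush would each trigger their own rebuild, since `get` does not coordinate concurrent misses.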
Any help here would be greatly appreciated!
Steps to Reproduce
- Create a Rock instance with a large number of unique groups and unique group locations with similar schedules.
- Add them to multiple campuses.
- Create kiosk devices that use one or two root locations.
- Open /checkin and monitor the cache load on the initial search (which builds the cache for that kiosk device).
- Close or open a room, retry the search, and monitor the load.
Expected behavior:
Some slowness (under 5 seconds) is expected on the initial kiosk load, but not 30 seconds of unresponsiveness.
Actual behavior:
When a Rock user opens or closes a room, or when the kiosks are turned on each Sunday morning, the Rock server slows down considerably. During the check-in rush, it becomes unusable for about 30 seconds.
Versions
- Rock Version: [12.8]
- Client Culture Setting: [EN-US]
@sam-crisp I ran some tests on my laptop using SQL Server 2017. I set up 52 devices and 35,964 GroupLocationSchedules across 4 campuses. The average time for the initial load of the Welcome screen after selecting the configuration was 9.8 sec. The average time to load the Welcome screen after closing/opening a location was 10.1 sec. I have made some recommendations for performance improvements based on making more, and better, use of the cache. These have not yet been prioritized.
In the meantime, if you are able to fine-tune the kiosks to serve specific areas or locations, that would give them less data to load. For example, below is the output of one of my tests after closing a room. If I limited the locations to just the Main Campus, the load time would be cut roughly in half. I realize this might not be an option, or might be something you are already doing, but if you can, it will help, possibly a lot.
- Cache clear for Rock.CheckIn.KioskDevice took 3ms
- LoadKioskLocations for device All Campuses Checkin Kiosk and Location Main Campus took 4602ms
- LoadKioskLocations for device All Campuses Checkin Kiosk and Location North Campus took 3048ms
- LoadKioskLocations for device All Campuses Checkin Kiosk and Location West Campus took 11ms
- LoadKioskLocations for device All Campuses Checkin Kiosk and Location East Campus took 642ms
- Create KioskDevice for ID 12 took 8322ms
- Locations Command Close took a total of 10197ms
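A quick tally of the log timings above shows where the time goes. The numbers are copied directly from the log lines; the "Main Campus only" figure assumes, as an approximation, that Main Campus would take the same 4602 ms on its own.

```python
# Timings (ms) copied from the log lines above.
load_kiosk_locations_ms = {
    "Main Campus": 4602,
    "North Campus": 3048,
    "West Campus": 11,
    "East Campus": 642,
}
create_kiosk_device_ms = 8322
locations_close_total_ms = 10197

# Sum of the per-location LoadKioskLocations calls.
per_location_sum = sum(load_kiosk_locations_ms.values())
# Whatever is left of the KioskDevice build after those calls.
other_device_build_ms = create_kiosk_device_ms - per_location_sum
# Share of the device build that Main Campus alone accounts for.
main_only_share = load_kiosk_locations_ms["Main Campus"] / create_kiosk_device_ms
# Share of the whole Close command spent rebuilding the KioskDevice.
device_share_of_close = create_kiosk_device_ms / locations_close_total_ms

print(f"LoadKioskLocations total: {per_location_sum} ms")  # 8303 ms
print(f"Rest of KioskDevice build: {other_device_build_ms} ms")  # 19 ms
print(f"Main Campus only, share of device build: {main_only_share:.0%}")  # 55%
print(f"KioskDevice build, share of Close command: {device_share_of_close:.0%}")  # 82%
```

The four LoadKioskLocations calls account for all but 19 ms of the 8,322 ms KioskDevice build, and that build is about 82% of the 10,197 ms Close command, which is consistent with trimming the kiosk's locations being the most direct workaround.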
@ethan-sparkdevnetwork Any update on this issue and potential fixes? This remains a performance issue at NPM whenever rooms must be closed during check-in. They have already limited the locations assigned to the kiosks.
@bobrufenacht / @sam-crisp We are still aware of this issue (and all currently open issues), but there has been a rather long list of higher-priority items in our queue that continue to take precedence over this one. Although I don't think we're going to re-engineer this part of Rock until a 'Check-in 2.0', we believe there are a few possible optimizations we can apply to reduce the 'cost' or impact of closing a location. Ethan and I discussed this again last week, and we're going to aim for v16.1 (or v16.2) for any quick optimizations.
(We're also working on a new GitHub label to show when an issue has been put into a developer's work queue. That will be the best way to see which bugs are 'next up'.)
[Update: also: 1dcf664]