Fetch and display course information more frequently
Reserves staff were accustomed to immediate updates in SearchWorks display when items were placed on Reserve in Symphony.
With FOLIO, it's my understanding from Cory that the indexer knows pretty promptly that a title/item has been associated with a course but we don't update the display (https://searchworks.stanford.edu/reserves or in the record view) until SearchWorks is deployed. In the first two weeks of the quarter, when staff are creating many FOLIO Courses and communicating with Instructors about Course status, this is too slow.
In an ideal scenario, we'd be updating the SearchWorks display as soon as the title/item is associated with a current course in FOLIO. Cory has indicated that immediate updates aren't possible right now given the amount of info that needs to be updated.
Since immediate isn't possible, updating the SearchWorks display every hour would be the next best thing. If hourly isn't possible, then anything up to daily is workable. But at the very slowest, we should update the display once per day.
It looks like it takes about 10 to 12 seconds to fetch all the course data live from FOLIO. Maybe we could fetch and cache once an hour? We'd need to change our approach, since we currently store a copy of the courses JSON data in a file in the repository that gets deployed with the app.
3.2.0 :004 > puts Benchmark.measure { client.courses }
0.159019 0.009213 0.168232 ( 11.063220)
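A minimal sketch of the "fetch and cache once an hour" idea, in plain Ruby so it runs outside the app. `TtlCache` and `slow_fetch` are hypothetical names, and `slow_fetch` stands in for the `client.courses` call benchmarked above. In the app itself the same pattern would more likely be `Rails.cache.fetch('courses', expires_in: 1.hour) { client.courses }`, where the cache key is also an assumption.

```ruby
# Minimal in-process cache with a time-to-live; a stand-in for Rails.cache.
class TtlCache
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl_seconds = ttl_seconds
    @clock = clock
    @value = nil
    @fetched_at = nil
    @mutex = Mutex.new
  end

  # Returns the cached value; calls the block only when the entry is missing or expired.
  def fetch
    @mutex.synchronize do
      now = @clock.call
      if @fetched_at.nil? || (now - @fetched_at) >= @ttl_seconds
        @value = yield
        @fetched_at = now
      end
      @value
    end
  end
end

# Hypothetical stand-in for the slow FOLIO request.
fetch_count = 0
slow_fetch = lambda do
  fetch_count += 1
  [{ 'id' => 'course-1', 'name' => 'Example course' }]
end

# A fake clock lets the example simulate an hour passing.
fake_now = 0
cache = TtlCache.new(ttl_seconds: 3600, clock: -> { fake_now })

cache.fetch { slow_fetch.call } # first call performs the slow fetch
cache.fetch { slow_fetch.call } # second call is served from the cache
fake_now += 3600                # one hour later
cache.fetch { slow_fetch.call } # entry has expired, so the fetch runs again

puts "slow fetches performed: #{fetch_count}"
```

Note that a cache held in each server's memory expires independently on each server, so on its own it would not keep the load-balanced servers in sync; a shared cache store or a scheduled refresh would be needed for that.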
Whatever we do, we should take care that the data is the same across all the load-balanced servers. I'm not sure if that's simply keeping the cache lifetimes in sync, adding the data to the database, or something else?
It would be really good if we could get this work in by June 17, 2024 when reserves staff start to remove all the Spring Qtr reserve records. At the very latest, we should have it fixed before September 1, 2024 when staff start to create reserve records ahead of the Fall Qtr. (Law school instruction begins Sept. 3rd.)
Comparing with the course reserves app for reference (this doesn't necessitate we follow the same approach):
- Course reserves info is retrieved from the MaIS APIs.
- The fetch_courses rake task retrieves the info from the APIs and writes it out to files in a course content directory under lib. The lib/course_work_content directory is also listed as a linked directory in deploy.rb. The app does not need to be restarted (as far as I can tell) for this content to be available to the app in its updated form.
- schedule.rb schedules the fetch_courses rake task to run at 3:30 AM every day. (The task runs once a day and takes about an hour to run.)
Schedule.rb could be used to kick off a rake task that updates the course information available to the app. Approaches could be one of the following:
- Using the database to create a table to store course information. SearchWorks then reads from the database to get the latest updated course information.
- Using a search index (another Solr collection?) to store, index, and search course reserves information.
- Using the Rails cache to store the courses.json data. [Schedule.rb or a similar mechanism could be used to indicate when updates happen, so they are triggered at the same time across servers.]
- Having the course fetching rake task also store the information in a lib/course_content/courses.json file (or config? unless the app has to be restarted for that) using an approach similar to the course reserves app above (i.e. lib/course_content is designated a shared directory in deploy.rb, the courses.json is written out to this directory, and the app uses this file).
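As a rough illustration of the file-based option, here is a plain-Ruby sketch of a writer that replaces courses.json atomically and a reader that reloads the file only when its modification time changes, so the app would not need a restart. `write_courses`, `CourseFile`, and the course names are hypothetical; the temp directory stands in for a shared directory such as lib/course_content.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Writes the JSON to a temp file and renames it into place, so a reader
# never sees a half-written file. The rename is atomic when both paths
# are on the same filesystem.
def write_courses(path, courses)
  tmp_path = "#{path}.tmp"
  File.write(tmp_path, JSON.generate(courses))
  File.rename(tmp_path, path)
end

# Re-parses the file only when its modification time has changed.
class CourseFile
  def initialize(path)
    @path = path
    @mtime = nil
    @courses = []
  end

  def courses
    mtime = File.mtime(@path)
    if mtime != @mtime
      @courses = JSON.parse(File.read(@path))
      @mtime = mtime
    end
    @courses
  end
end

dir = Dir.mktmpdir
path = File.join(dir, 'courses.json')

write_courses(path, [{ 'name' => 'CHEM 101' }])
reader = CourseFile.new(path)
before = reader.courses.map { |course| course['name'] }

# Simulate the next scheduled run: new data with a later modification time.
write_courses(path, [{ 'name' => 'CHEM 101' }, { 'name' => 'LAW 201' }])
later = Time.now + 3600
File.utime(later, later, path)
after = reader.courses.map { |course| course['name'] }

puts before.inspect
puts after.inspect
FileUtils.remove_entry(dir)
```

Checking the modification time on each read costs one `stat` call, which is cheap compared with re-parsing the JSON on every request.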
In all of these approaches, it no longer makes sense to keep a courses.json in version control (GitHub), and we would have to break course updates out into their own rake task.
P.S. Another option is Redis, which may need to be set up for SearchWorks.
P.P.S. If we write to a central location (like a database) that does not depend on a particular webapp VM, the write to the courses table should take place only once (and not be triggered on each VM). Schedule.rb may be able to specify the server being used so we don't update the courses table multiple times for one rake task. With the alternative approach of writing to a file on each VM, we may be able to stagger the time between when the content is read out of FOLIO and when the file the app reads from is updated (e.g. read out to a temp file on each VM at a particular time, and then five minutes later update the courses.json file on each VM).
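The staggered, per-VM file update could look something like the following sketch. It splits the work into two steps that a scheduler would run a few minutes apart: staging the fetched data in a temp file, then promoting it to the live courses.json. `stage_courses`, `promote_courses`, and the `.staged` suffix are hypothetical names, and the sample data is made up.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Step 1: write freshly fetched course data to a staging file next to the
# live file. The app keeps reading the live file in the meantime.
def stage_courses(live_path, courses)
  File.write("#{live_path}.staged", JSON.generate(courses))
end

# Step 2 (run later, at the same time on every VM): swap the staged file
# into place. Returns false and leaves the live file untouched when the
# staged file is missing or is not valid JSON.
def promote_courses(live_path)
  staged_path = "#{live_path}.staged"
  return false unless File.exist?(staged_path)

  JSON.parse(File.read(staged_path)) # sanity check before going live
  File.rename(staged_path, live_path)
  true
rescue JSON::ParserError
  false
end

dir = Dir.mktmpdir
live_path = File.join(dir, 'courses.json')
File.write(live_path, JSON.generate([{ 'name' => 'old course' }]))

stage_courses(live_path, [{ 'name' => 'new course' }])
during_stagger = JSON.parse(File.read(live_path)) # live data is still the old data

promoted = promote_courses(live_path)
after_promote = JSON.parse(File.read(live_path))
promoted_again = promote_courses(live_path) # nothing staged now

puts during_stagger.inspect
puts after_promote.inspect
FileUtils.remove_entry(dir)
```

Because the slow FOLIO fetch happens in the first step, the second step is only a quick rename, which makes it easier to have every VM switch to the new data at nearly the same moment.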