
Kernelspec Caching

Open kevin-bates opened this issue 2 years ago • 3 comments

This pull request introduces a new configurable feature called KernelSpec Caching. The KernelSpecCache instance supports the same retrieval methods as a KernelSpecManager and contains the configured KernelSpecManager instance. If caching is not enabled (the default), the cache is a direct pass-through to the KernelSpecManager; otherwise it acts as a read-through cache, deferring to the KernelSpecManager on any cache miss. This functionality has proven useful in Enterprise Gateway, where it has existed for a few years. In that implementation, the watchdog package is used to detect when the cache needs updating.
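The pass-through versus read-through behavior can be illustrated with a minimal sketch. The class names ReadThroughSpecCache and CountingManager are hypothetical stand-ins invented for this example, not the classes in this pull request; only get_kernel_spec(name) and cache_enabled come from the description above.

```python
class ReadThroughSpecCache:
    """Hypothetical stand-in for the caching behavior described above."""

    def __init__(self, manager, cache_enabled=False):
        self.manager = manager  # any object exposing get_kernel_spec(name)
        self.cache_enabled = cache_enabled
        self._items = {}

    def get_kernel_spec(self, name):
        if not self.cache_enabled:
            # Pass-through: always defer to the manager.
            return self.manager.get_kernel_spec(name)
        if name not in self._items:
            # Cache miss: read through to the manager and remember the result.
            self._items[name] = self.manager.get_kernel_spec(name)
        return self._items[name]


class CountingManager:
    """Fake manager (illustration only) that counts how often it is consulted."""

    def __init__(self):
        self.calls = 0

    def get_kernel_spec(self, name):
        self.calls += 1
        return {"name": name}
```

With caching enabled, repeated lookups of the same name reach the manager once; with it disabled, every lookup does.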

By introducing kernelspec caching, we can now define events corresponding to the addition, update, and deletion of kernel specifications and get closer to removing the 10-second polling performed by Lab once it has been updated to consume kernelspec events.

Monitors

In addition to being enabled via the cache_enabled configurable, KernelSpecCache supports pluggable monitors that are responsible for detecting out-of-band changes to the cached items and keeping the cache up to date. A kernelspec cache monitor is registered via the entry points group "jupyter_server.kernelspec_monitors" to introduce a layer of decoupling. This pull request includes two monitors:

  • KernelSpecWatchdogMonitor under the entry point name "watchdog-monitor": This monitor uses the watchdog package to monitor changes to directories containing kernelspec definitions. Because this monitor uses the watchdog package, an optional dependency has been added for users wishing to use this monitor: pip install jupyter_server[watchdog-monitor]
  • KernelSpecPollingMonitor under the entry point name "polling-monitor": This monitor periodically polls (via a configurable interval trait) for kernelspec changes and computes an MD5 hash of each entry to detect changes. It updates the cache only when a hash value has changed (or is new) or when it determines a kernelspec has been removed. The interval defaults to 30 seconds.
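The hash comparison performed by the polling monitor can be sketched roughly as follows. This is an illustration under assumptions, not the code in this pull request: the helper names spec_digest and diff_specs are hypothetical, and hashing a sorted JSON encoding of each spec is just one possible way to get a stable digest.

```python
import hashlib
import json


def spec_digest(spec):
    """MD5 of a kernelspec dict, using a stable (sorted-key) JSON encoding."""
    encoded = json.dumps(spec, sort_keys=True).encode("utf-8")
    return hashlib.md5(encoded).hexdigest()


def diff_specs(previous_digests, current_specs):
    """Compare one poll against the last; return (changed, removed, digests).

    "changed" holds names that are new or whose digest differs,
    "removed" holds names that were present last time but are now gone.
    """
    digests = {name: spec_digest(spec) for name, spec in current_specs.items()}
    changed = sorted(
        name
        for name, digest in digests.items()
        if previous_digests.get(name) != digest
    )
    removed = sorted(set(previous_digests) - set(digests))
    return changed, removed, digests
```

A monitor built this way would call diff_specs once per interval, write the changed entries to the cache, drop the removed ones, and keep the returned digests for the next poll.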

KernelSpecPollingMonitor is the default monitor used since it does not introduce new packages.
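For readers unfamiliar with entry points, here is a rough sketch of how monitors registered under the group above could be discovered with the standard library. The helper find_monitor_entry_points is hypothetical and only illustrates the lookup; how this pull request actually resolves monitor_name is not shown here.

```python
from importlib.metadata import entry_points

GROUP = "jupyter_server.kernelspec_monitors"  # group name from the description


def find_monitor_entry_points(group=GROUP):
    """Return {entry point name: EntryPoint} for everything in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}
```

Calling .load() on a returned entry point imports and returns the registered object, so a third-party package can contribute a monitor without jupyter_server importing it directly.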

Other monitors that would be useful are:

  • an event consumer monitor: once we add event production to the cache, such a monitor could receive kernelspec update events from remote Kernel Servers (e.g., Gateway or jupyverse) rather than having to rely on polling.
  • a monitor similar to the "watchdog-monitor", but using watchfiles instead, since we have other needs for watchfiles. At that point, we may be able to make "watchfiles-monitor" the default - assuming we include the watchfiles package.

KernelSpec caching is disabled by default. If we want to enable it by default, we'll need to adjust some tests (probably only kernelspecs and perhaps kernels api tests) to configure a much shorter polling interval, or switch the default to the "watchfiles-monitor" (once implemented) since it sounds like there's a preference for watchfiles over watchdog. (Note: Enterprise Gateway happened to use watchdog a few years ago, which is why the watchdog monitor exists. If we built a "watchfiles-monitor", I have no affinity for "watchdog-monitor" and see no reason to keep it unless we find advantages over watchfiles.)
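As a configuration sketch, enabling the cache with a shorter polling interval might look like the fragment below. The trait names (cache_enabled, monitor_name, interval) are taken from this description; the file name and the chosen interval value are assumptions for illustration.

```python
# jupyter_server_config.py (sketch; trait names come from the description above)
c = get_config()  # noqa: F821 - provided by the Jupyter config loader

c.KernelSpecCache.cache_enabled = True
c.KernelSpecCache.monitor_name = "polling-monitor"
c.KernelSpecPollingMonitor.interval = 1.0  # seconds; the default is 30
```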

Class Hierarchy

KernelSpecCache contains an instance of the KernelSpecManager and an instance of the KernelSpecMonitorBase subclass that corresponds to the monitor_name configurable.

---
title: KernelSpecCache Class Hierarchy
---
classDiagram
    KernelSpecCache --* KernelSpecManager
    KernelSpecCache --* KernelSpecMonitorBase
    KernelSpecMonitorBase <|-- KernelSpecWatchdogMonitor
    KernelSpecMonitorBase <|-- KernelSpecPollingMonitor
    KernelSpecMonitorBase <|-- BYOMonitor
    class KernelSpecCache{
      +str monitor_name
      +bool cache_enabled
      +get_kernel_spec(name)
      +get_all_specs()
      +get_item(name)
      +get_all_items()
      +put_item()
      +put_all_items()
      +remove_item(name)
      +remove_all_items()
    }
    class KernelSpecManager{
      +get_kernel_spec(name)
      +get_all_specs()
    }
    class KernelSpecMonitorBase["KernelSpecMonitorBase (ABC)"]{
      +initialize()*
      +destroy()*
    }
    class KernelSpecPollingMonitor{
      +float interval
    }
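The monitor side of the diagram can be sketched in plain Python. MonitorBase and NoopMonitor are hypothetical stand-ins for KernelSpecMonitorBase and the "BYOMonitor" placeholder; only the abstract initialize() and destroy() methods come from the diagram.

```python
from abc import ABC, abstractmethod


class MonitorBase(ABC):
    """Hypothetical stand-in for KernelSpecMonitorBase."""

    @abstractmethod
    def initialize(self):
        """Start watching for kernelspec changes."""

    @abstractmethod
    def destroy(self):
        """Stop watching and release any resources."""


class NoopMonitor(MonitorBase):
    """Plays the role of BYOMonitor; it only tracks whether it is running."""

    def __init__(self):
        self.running = False

    def initialize(self):
        self.running = True

    def destroy(self):
        self.running = False
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated, which is what the asterisks in the diagram indicate.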

Event Support (Future)

With caching in place, we should be able to fire add and update kernelspec events from KernelSpecCache.put_item() and delete kernelspec events from KernelSpecCache.remove_item(), since the corresponding put_all_items() and remove_all_items() methods simply call the single-item versions.
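A rough sketch of that idea follows, assuming hypothetical method signatures (the diagram does not specify the parameters of put_item() or put_all_items()) and using a plain list of tuples in place of a real event system.

```python
class EventingCache:
    """Hypothetical cache that records the events it would emit."""

    def __init__(self):
        self._items = {}
        self.events = []  # (action, name) tuples standing in for real events

    def put_item(self, name, spec):
        # A name already in the cache is an update; otherwise it is an add.
        action = "update" if name in self._items else "add"
        self._items[name] = spec
        self.events.append((action, name))

    def put_all_items(self, specs):
        # Delegates to the single-item method, so events fire per item.
        for name, spec in specs.items():
            self.put_item(name, spec)

    def remove_item(self, name):
        # Only emit a delete event if something was actually removed.
        if self._items.pop(name, None) is not None:
            self.events.append(("delete", name))

    def remove_all_items(self):
        for name in list(self._items):
            self.remove_item(name)
```

Since the bulk methods delegate to the single-item ones, the event logic lives in exactly two places.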

kevin-bates avatar May 10 '23 21:05 kevin-bates

Hi @blink1073. The kernelspec cache test failures are due to a missing watchdog dependency. I figured placing it into the "test" optional dependencies would address this, but don't really see that optional dependency being used other than for the examples tests. Is there an additional location I should be updating for this to be included in the Python tests? Is this kind of thing typically performed in jupyterlab/maintainer-tools/.github/actions/base-setup@v1?

EDIT: Hmm, I'm also seeing a different issue not seen in dev (sorry about that), so will need to address that as well, but I think my questions are still applicable (although it looks like Windows must include watchdog anyway - or some resolution has occurred there).

kevin-bates avatar May 10 '23 21:05 kevin-bates

TypeError: 'type' object is not subscriptable. We can't use dict[str, str] until Python 3.9 iirc, you'll need to use Dict from typing. When we run hatch run test or hatch run cov it should pick up the test dependencies.
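A small illustration of the typing fix mentioned above; the function name and data are made up for the example. On Python 3.8, writing the same annotations with dict[str, str] raises the TypeError quoted above when the function is defined, while typing.Dict works on all supported versions.

```python
from typing import Dict


def count_by_language(specs: Dict[str, str]) -> Dict[str, int]:
    """Count kernelspec names per language (illustrative helper only)."""
    counts: Dict[str, int] = {}
    for language in specs.values():
        counts[language] = counts.get(language, 0) + 1
    return counts
```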

blink1073 avatar May 10 '23 21:05 blink1073

Thanks Steve. Yeah, that's the issue I saw as well and things are moving along better in my fork. I had originally thought it was the watchdog dependency but, after closer inspection, it was all about the indexing issue. Looks like the docs build is failing due to the new sub folder for the monitors that I'll follow up on. Thanks for your help.

~To help my understanding, does jupyterlab/maintainer-tools/.github/actions/base-setup@v1 essentially issue an install with the optional [test] dependency?~ Crap - I just need to FULLY READ your previous response. :smile:

kevin-bates avatar May 10 '23 21:05 kevin-bates