Kernel subshells (JEP91) implementation

Open ianthomas23 opened this issue 1 year ago • 0 comments

This is the implementation of the kernel subshells JEP (jupyter/enhancement-proposals#91). It follows the latest commit (1f1ad3d), with the addition of a %subshell magic command that is useful for debugging. To try it out, I have a JupyterLab branch that talks to this branch. The easiest way to run it is via https://mybinder.org/v2/gh/ianthomas23/jupyterlab/jep91_demo?urlpath=lab; once the mybinder instance has started, open subshell_demo_notebook.ipynb and follow the instructions therein.

The idea is that this is mergeable as it is now. It is backward compatible in that it does not break any existing use of ipykernel (subject to CI confirmation). There are some ramifications of the protocol additions (outlined below) that will need addressing eventually, but I consider these future work that can go in separate PRs.

Outline of changes

  1. The parent subshell (i.e. the main shell) runs in the main thread.
  2. Each new subshell runs in a separate thread.
  3. There is a new thread that deals with all communication on the shell channel. Previously this was performed in the main thread.
  4. Communication between the shell channel thread and other threads is performed using ZMQ inproc pair sockets, which are essentially shared memory and avoid the use of thread synchronisation primitives.
  5. Incoming shell messages are handled by the shell channel thread, which extracts the subshell_id from each message and passes the message on to the correct subshell.
  6. Subshells are created and deleted via messages sent on the control channel. These are passed to the shell channel thread via inproc pair sockets so that the SubshellManager in the shell channel thread is responsible for subshell lifetimes.
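
A minimal sketch of points 5 and 6, not the code in this PR. It assumes the subshell_id travels in the message header and that a missing subshell_id means the parent subshell. To keep the example free of dependencies, queue.Queue stands in for the ZMQ inproc pair sockets, and the class layout and method names are hypothetical.

```python
import queue
import threading
import uuid


class SubshellManager:
    """Owns subshell lifetimes and routes shell messages (hypothetical sketch).

    All methods are meant to be called from one thread only, playing the
    role of the shell channel thread, so the dicts need no locking.
    """

    def __init__(self):
        # One inbox per subshell; the key None denotes the parent subshell.
        # queue.Queue is a stand-in for a ZMQ inproc pair socket.
        self._inboxes = {None: queue.Queue()}
        self._threads = {}
        self.replies = queue.Queue()  # replies on their way back to the client

    @property
    def parent_inbox(self):
        return self._inboxes[None]

    def _run_subshell(self, subshell_id, inbox):
        # Subshell thread: handle messages until a None sentinel arrives.
        while (msg := inbox.get()) is not None:
            self.replies.put((subshell_id, msg["content"]["code"]))

    def create_subshell(self):
        subshell_id = uuid.uuid4().hex
        inbox = queue.Queue()
        thread = threading.Thread(
            target=self._run_subshell, args=(subshell_id, inbox)
        )
        self._inboxes[subshell_id] = inbox
        self._threads[subshell_id] = thread
        thread.start()
        return subshell_id

    def delete_subshell(self, subshell_id):
        # Tell the subshell thread to stop, then wait for it to exit.
        self._inboxes.pop(subshell_id).put(None)
        self._threads.pop(subshell_id).join()

    def list_subshells(self):
        return [sid for sid in self._inboxes if sid is not None]

    def route(self, msg):
        # Missing subshell_id -> None -> parent subshell inbox.
        subshell_id = msg["header"].get("subshell_id")
        self._inboxes[subshell_id].put(msg)
```

In this sketch only the manager touches the inbox table, which mirrors the idea that a single thread is responsible for subshell lifetimes.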

Example scenario

Here is an example of the communication between threads when a long task is running in the parent subshell (main thread) and, while it is still running, a child subshell is created, used, and deleted.

```mermaid
sequenceDiagram
    participant client as Client
    participant control as Control thread
    participant shell as Shell channel thread
    participant main as Main thread

    client->>+shell: Execute request (main shell)
    shell->>-main: Execute request (inproc)
    activate main

    client->>+control: Create subshell request
    control->>-shell: Create subshell request (inproc)
    activate shell
    create participant subshell as Subshell thread
    shell-->>subshell: Create subshell thread

    shell->>control: Create subshell reply (inproc)
    deactivate shell
    activate control
    control->>-client: Create subshell reply

    client->>+shell: Execute request (subshell)
    shell->>-subshell: Execute request (inproc)
    activate subshell

    subshell->>shell: Execute reply (inproc)
    deactivate subshell
    activate shell
    shell->>-client: Execute reply (subshell)

    client->>+control: Delete subshell request
    control->>-shell: Delete subshell request (inproc)
    activate shell
    destroy subshell
    shell-->>subshell: Delete subshell thread

    shell->>control: Delete subshell reply (inproc)
    deactivate shell
    activate control
    control->>-client: Delete subshell reply

    main->>shell: Execute reply (inproc)
    deactivate main
    activate shell
    shell->>-client: Execute reply (main shell)
```
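
The ordering in this scenario can be replayed with plain threads. This is an illustrative sketch only: the calling thread plays the client, control thread, and shell channel thread at once, a worker thread stands in for the kernel's main thread, queue.Queue replaces the inproc sockets, and a threading.Event stands in for the long-running task. The function name and reply labels are made up for the example.

```python
import queue
import threading


def run_scenario():
    """Replay the scenario's message order; returns replies as the client sees them."""
    to_client = queue.Queue()        # replies arriving at the client
    main_inbox = queue.Queue()       # shell channel thread -> main shell
    subshell_inbox = queue.Queue()   # shell channel thread -> child subshell
    long_task_may_finish = threading.Event()

    def main_shell():
        main_inbox.get()             # execute request (main shell)
        long_task_may_finish.wait()  # stands in for a long-running cell
        to_client.put("execute_reply (main shell)")

    def subshell():
        subshell_inbox.get()         # execute request (subshell)
        to_client.put("execute_reply (subshell)")

    # 1. The long task starts in the parent subshell.
    main_thread = threading.Thread(target=main_shell)
    main_thread.start()
    main_inbox.put("execute_request")

    # 2. A child subshell is created while the parent is still busy.
    subshell_thread = threading.Thread(target=subshell)
    subshell_thread.start()
    to_client.put("create_subshell_reply")

    # 3. Code runs in the child subshell and its reply comes back.
    subshell_inbox.put("execute_request")

    # 4. The child subshell is deleted once its thread has finished.
    subshell_thread.join()
    to_client.put("delete_subshell_reply")

    # 5. Only now does the long task in the parent subshell end.
    long_task_may_finish.set()
    main_thread.join()
    return [to_client.get() for _ in range(4)]
```

The point of the sketch is that the subshell's execute reply reaches the client before the main shell's, even though the main shell's request was sent first.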

Future work

ipykernel

  1. Shell channel thread deserialises ~the whole~ some of the message to get the subshell_id. Ideally it would only deserialise the header. May need changes in Jupyter Client.
  2. Signalling a subshell to stop uses a threading.Event following the existing anyio implementation which requires an extra thread per Event. It would be nice if this could be changed so a subshell is a single thread not two.
  3. Execution count. This should be either a separate count per subshell or a single count for the whole kernel. It needs a decision, and changes in IPython, as the count is currently not atomic.
  4. History. Related to item 3 above.
  5. Concurrent input() calls on more than one subshell run, but their results are not stored correctly.
  6. Debugger use needs investigating.
  7. Busy/idle status needs investigating. Should there, as now, be separate status for each subshell, or the concept of kernel (i.e. any subshell) busy status? This issue is much wider than subshells as it includes status of the control channel, and how Jupyter Server should track status (jupyter-server/jupyter_server#1429).
  8. Use of display hooks for e.g. Matplotlib. Should these be on the parent subshell, or child subshells too?
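
For item 1, here is a rough sketch of what header-only deserialisation could look like. It assumes the standard Jupyter wire format, in which the frames after the `<IDS|MSG>` delimiter are the HMAC signature, header, parent header, metadata, and content, and it assumes the subshell_id is carried in the header. The function name peek_subshell_id is hypothetical, and the sketch deliberately skips signature verification, which a real implementation would still need to perform somewhere.

```python
import json

DELIMITER = b"<IDS|MSG>"


def peek_subshell_id(frames):
    """Return the subshell_id of a raw multipart message, parsing only the header.

    `frames` is the list of bytes frames as received from the socket.
    Returns None when the header carries no subshell_id (parent subshell).
    """
    start = frames.index(DELIMITER)
    # Frame layout after the delimiter: signature, header, parent_header, ...
    header = json.loads(frames[start + 2])
    return header.get("subshell_id")
```

The parent header, metadata, and content frames are never decoded here, so a large content frame costs nothing at routing time.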

JupyterLab

The JupyterLab branch I am using to demo this isn't really intended to be merged. If it were, it would need:

  1. Check kernel_info to see if subshells are supported.
  2. Delete the subshell when its ConsolePanel is closed.
  3. Report subshell IDs in tree view?
  4. Display of subshell busy/idle status.
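
For item 1, the check itself is small. The sketch below is in Python for consistency with the rest of this description, although JupyterLab would do it in TypeScript. It assumes that, as I read the JEP, a kernel advertises support by listing "kernel subshells" in the supported_features field of its kernel_info reply; the function name is hypothetical.

```python
def supports_subshells(kernel_info_content):
    """Return True if a kernel_info reply content advertises subshell support.

    Assumes support is signalled by "kernel subshells" appearing in the
    optional "supported_features" list; a missing list means no support.
    """
    return "kernel subshells" in kernel_info_content.get("supported_features", [])
```

A client would fall back to the existing single-shell behaviour whenever this returns False.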

(Edited for clarity)

ianthomas23, Jun 13 '24 10:06