
First class Jupyter notebook integration

Open • bartlomieju opened this issue 2 years ago • 10 comments

Jupyter notebooks are very popular. They provide a rich, interactive environment for development.

There are numerous kernels that support JavaScript and TypeScript:

  • https://github.com/n-riesco/ijavascript
  • https://github.com/yunabe/tslab
  • https://github.com/winnekes/itypescript

Prompted by a discussion with @apowers313, who's working on a kernel for Deno (https://github.com/apowers313/ideno), I propose we add first-class support for Jupyter in Deno (with a deno jupyter subcommand).

I'd argue that first-class support for Jupyter will open Deno to the whole community of people using Jupyter notebooks, give that community powerful new tools (after all, Deno supports WebGPU out of the box), and could significantly help machine learning applications of Deno.

This proposal is motivated by several things. Firstly, Deno originated from PropelML, an idea similar to Jupyter that never fully materialized. Secondly, @apowers313 will have to integrate with the V8 inspector protocol to provide kernel functionality. Currently Deno doesn't have a programmatic API to interact with the inspector, so integrating over a WebSocket will require quite an effort. Additionally, most of the functionality a kernel has to provide already works in the REPL. In fact, most of the REPL functionality could be reused in the kernel; we would only have to add the APIs for the communication protocol.

@kitsonk was eyeballing an implementation of the kernel in Q2/Q3, but that never materialized due to other, more pressing work. I'll be happy to spearhead the effort, as it seems like a very fun project to work on.

bartlomieju • Dec 07 '21

Roadmap to creating a kernel:

  • [x] create a kernel spec
    • see jupyter kernelspec list for examples and jupyter kernelspec install to install
    • run jupyter notebook -- if installed correctly, it will show up under "New" in the top right corner of the Jupyter web browser
    • Your kernel will be started as a command-line application with the arguments specified in the kernel spec. It won't start until you select "New" in the menu.
  • [x] create zeromq connections
    • the kernel spec will specify a {connection_file} that gets converted to a JSON connection file describing IP / ports to connect to
    • the first messages received will be "kernel_info_request" and "comm_info_request" on the shell zmq Dealer connection (examples of the packets can be found here)
    • you can set Jupyter into debug mode so that jupyter notebook prints the packets it sends / receives:
      • create a config file using jupyter notebook --generate-config
      • set the following options in the config:
        • c.Application.log_level = 'DEBUG'
        • c.JupyterApp.log_level = 'DEBUG'
        • c.NotebookApp.log_level = 'DEBUG'
        • c.Session.debug = True
  • [x] next create the IOPub zmq Publisher to send "busy" and "idle" packets
  • [x] next handle the "execute_request" message, which sends code from the front-end to the kernel
    • when a user selects "restart and run all" in the front end, I think it sends all the Jupyter cells at once, you will have to queue execution requests and run them one at a time
    • each execution request triggers multiple replies, and each reply is expected to embed the original request's header as its parent_header, so you'll have to keep track of which task is currently running
    • you will need to capture stdout and stderr from the kernel and send them back to the front end as stream messages
  • [ ] implement shutdown and restart (the "shutdown_request" message) on the Control zmq Dealer connection
    • note that "shutdown_request" has a "restart" option, which presumes that you are starting up a clean JS environment. I have no idea how this is going to work if the kernel and the execution context are running in the same Deno instance.
    • I also have no idea how you are going to interrupt Deno mid-execution.
  • [x] implement display data to send PNG, SVG, HTML, JSON back to the browser to be rendered in the front end
    • I think these should render automatically for objects that have Symbols on them, similar to Symbol.toStringTag, e.g. Symbol.toPngTag. Maybe TC39-worthy?
  • [ ] implement an interpreter to parse out line magics and cell magics
    • pay special attention to:
      • automagic, which turns off requiring "%" at the front of magics. instead of requiring a magic like "%ls" the user can just type "ls"
      • !cmd command execution
      • magic assignment like "output = %ls"
      • {var} substitution
      • inline documentation and inspection like "?" and "??". I dream of a Symbol.toDocTag on Objects that contains documentation (maybe populated by JSDoc comments) or URLs to documentation, similar to Python's docstrings. Might be a TC39 proposal?
      • input and output caching in In[n] and Out[n] (also _, __, and ___)
    • feel free to steal magics or the interpreter from magicpatch.
    • there should probably be an API to enable users to add their own magics.
  • [ ] implement introspection and completion
  • [ ] maybe implement code completeness, which is only used by command-line Jupyter front ends to determine when to execute code
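For the kernel spec and ZeroMQ steps above, the connection file that Jupyter writes for {connection_file} is plain JSON. A minimal TypeScript sketch of turning it into socket endpoints might look like this; it assumes the standard field names (transport, ip, the five *_port fields, key, signature_scheme), while the helper names parseConnectionInfo and endpoints are hypothetical. The key and signature_scheme fields are what a kernel uses to HMAC-sign its messages, which this sketch does not implement.

```typescript
// Fields follow Jupyter's connection-file format; helper names are hypothetical.
interface ConnectionInfo {
  transport: string; // usually "tcp"
  ip: string;
  shell_port: number;
  iopub_port: number;
  stdin_port: number;
  control_port: number;
  hb_port: number;
  key: string; // HMAC key for signing messages
  signature_scheme: string; // e.g. "hmac-sha256"
}

type Channel = "shell" | "iopub" | "stdin" | "control" | "hb";

const PORT_FIELDS = ["shell_port", "iopub_port", "stdin_port", "control_port", "hb_port"] as const;

// Parse the JSON file and fail early if any socket port is missing.
function parseConnectionInfo(json: string): ConnectionInfo {
  const parsed = JSON.parse(json) as ConnectionInfo;
  for (const field of PORT_FIELDS) {
    if (typeof parsed[field] !== "number") {
      throw new Error(`connection file is missing ${field}`);
    }
  }
  return parsed;
}

// Build a ZeroMQ endpoint string such as "tcp://127.0.0.1:50001" per channel.
function endpoints(info: ConnectionInfo): Record<Channel, string> {
  const url = (port: number): string => `${info.transport}://${info.ip}:${port}`;
  return {
    shell: url(info.shell_port),
    iopub: url(info.iopub_port),
    stdin: url(info.stdin_port),
    control: url(info.control_port),
    hb: url(info.hb_port),
  };
}

const sample =
  '{"transport":"tcp","ip":"127.0.0.1","shell_port":50001,"iopub_port":50002,' +
  '"stdin_port":50003,"control_port":50004,"hb_port":50005,"key":"secret","signature_scheme":"hmac-sha256"}';
console.log(endpoints(parseConnectionInfo(sample)).shell); // tcp://127.0.0.1:50001
```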
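The execute_request bullets above (queueing, parent headers, busy/idle status, stdout capture) can be sketched as a small serial queue. This is an illustration under assumptions, not the IDeno or Deno implementation: the message fields and msg_type names (status, execute_input, stream, error, execute_reply) follow the Jupyter messaging spec, while ExecuteQueue, reply, and the injected send/evaluate callbacks are hypothetical stand-ins for the real ZeroMQ sockets and JS evaluator.

```typescript
// Message shapes follow the Jupyter messaging spec; everything else is hypothetical.
interface Header {
  msg_id: string;
  session: string;
  username: string;
  date: string;
  msg_type: string;
  version: string;
}

interface Message {
  header: Header;
  parent_header: Partial<Header>; // {} for messages not caused by a request
  metadata: Record<string, unknown>;
  content: Record<string, unknown>;
}

const KERNEL_SESSION = "kernel-session"; // a real kernel would use a UUID
let messageCounter = 0;

// Every message sent while handling a request embeds that request's header.
function reply(request: Message, msgType: string, content: Record<string, unknown>): Message {
  messageCounter += 1;
  return {
    header: {
      msg_id: `${KERNEL_SESSION}-${messageCounter}`,
      session: KERNEL_SESSION,
      username: "kernel",
      date: new Date().toISOString(),
      msg_type: msgType,
      version: "5.3",
    },
    parent_header: request.header,
    metadata: {},
    content,
  };
}

type Send = (channel: "shell" | "iopub", msg: Message) => void;
// Runs user code; print() is how captured stdout reaches the front end.
type Evaluate = (code: string, print: (text: string) => void) => Promise<void>;

class ExecuteQueue {
  private tail: Promise<void> = Promise.resolve();
  private executionCount = 0;
  private readonly send: Send;
  private readonly evaluate: Evaluate;

  constructor(send: Send, evaluate: Evaluate) {
    this.send = send;
    this.evaluate = evaluate;
  }

  // "Restart and run all" may deliver many requests at once; chaining each
  // one onto the previous promise runs them one at a time, in arrival order.
  enqueue(request: Message): Promise<void> {
    this.tail = this.tail.then(() => this.run(request));
    return this.tail;
  }

  private async run(request: Message): Promise<void> {
    const count = ++this.executionCount;
    const code = String(request.content.code ?? "");
    this.send("iopub", reply(request, "status", { execution_state: "busy" }));
    this.send("iopub", reply(request, "execute_input", { code, execution_count: count }));
    let result: Record<string, unknown> = { status: "ok", execution_count: count, user_expressions: {} };
    try {
      await this.evaluate(code, (text) => this.send("iopub", reply(request, "stream", { name: "stdout", text })));
    } catch (err) {
      const e = err instanceof Error ? err : new Error(String(err));
      const error = { ename: e.name, evalue: e.message, traceback: [e.stack ?? e.message] };
      this.send("iopub", reply(request, "error", error));
      result = { status: "error", execution_count: count, ...error };
    }
    this.send("shell", reply(request, "execute_reply", result));
    this.send("iopub", reply(request, "status", { execution_state: "idle" }));
  }
}
```

Chaining onto a single promise is one simple way to get one-at-a-time execution; a kernel that must also honor interrupts or restarts would need a cancellable queue instead.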

Sorry, I realize that's a lot... hopefully it's helpful.

apowers313 • Dec 08 '21

A bit of a hurdle for the integration is that the only crate providing async integration with ZeroMQ is currently marked as unstable and not recommended for production use: https://github.com/zeromq/zmq.rs

This crate builds on top of https://github.com/erickt/rust-zmq, which provides sync bindings (which might not be a big deal), but its build process seems to be quite involved.

I will do some more research on this topic before proceeding.

@apowers313 thank you for providing the roadmap, this is very helpful!

bartlomieju • Dec 08 '21

If you get painted into a corner: Jupyter appears to require only a very small subset of ZMQ. It seems to use only NULL security, sends ~4 packets to negotiate the session, and then uses a control/length header for each data chunk. There's a Wireshark plugin for ZMQ if you want to see how it works. (Note: I had to use an older commit to get the plugin to work.)
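To make the "small subset of ZMQ" concrete: ZeroMQ's wire protocol (ZMTP 3.x) opens with a fixed 64-byte greeting that names the security mechanism (NULL here), then exchanges frames that each start with a flags byte followed by a 1-byte or 8-byte length. The sketch below encodes and decodes those frames and builds a NULL greeting; it deliberately omits the READY command handshake and its socket-type metadata, and the function names are hypothetical. On top of this framing, each Jupyter message travels as one multipart ZeroMQ message (routing identities, the "<IDS|MSG>" delimiter, the HMAC signature, then header, parent_header, metadata and content).

```typescript
// ZMTP 3.x frame flag bits:
//   0x01 MORE:    another frame of the same message follows
//   0x02 LONG:    size field is 8 bytes (big-endian) instead of 1
//   0x04 COMMAND: frame carries a command (e.g. READY), not message data
const MORE = 0x01;
const LONG = 0x02;
const COMMAND = 0x04;

function encodeFrame(body: Uint8Array, more: boolean, command = false): Uint8Array {
  const long = body.length > 255; // short frames use a single size octet
  const flags = (more ? MORE : 0) | (long ? LONG : 0) | (command ? COMMAND : 0);
  const headerLen = long ? 9 : 2;
  const out = new Uint8Array(headerLen + body.length);
  out[0] = flags;
  if (long) {
    const view = new DataView(out.buffer);
    view.setUint32(1, Math.floor(body.length / 2 ** 32)); // high 32 bits
    view.setUint32(5, body.length >>> 0); // low 32 bits
  } else {
    out[1] = body.length;
  }
  out.set(body, headerLen);
  return out;
}

interface Frame {
  more: boolean;
  command: boolean;
  body: Uint8Array;
}

// Decode one frame from the start of buf; returns null until a whole frame has arrived.
function decodeFrame(buf: Uint8Array): { frame: Frame; used: number } | null {
  if (buf.length < 2) return null;
  const flags = buf[0];
  const headerLen = flags & LONG ? 9 : 2;
  if (buf.length < headerLen) return null;
  let size: number;
  if (flags & LONG) {
    const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
    size = view.getUint32(1) * 2 ** 32 + view.getUint32(5);
  } else {
    size = buf[1];
  }
  if (buf.length < headerLen + size) return null;
  const frame = {
    more: (flags & MORE) !== 0,
    command: (flags & COMMAND) !== 0,
    body: buf.slice(headerLen, headerLen + size),
  };
  return { frame, used: headerLen + size };
}

// 64-byte greeting: signature (10), version 3.0 (2), mechanism (20), as-server (1), filler (31).
function nullGreeting(asServer: boolean): Uint8Array {
  const g = new Uint8Array(64); // zero-initialized: padding and filler stay 0
  g[0] = 0xff; // signature start; bytes 1..8 are padding
  g[9] = 0x7f; // signature end
  g[10] = 3; // version major
  g[11] = 0; // version minor
  "NULL".split("").forEach((ch, i) => {
    g[12 + i] = ch.charCodeAt(0); // mechanism name, zero-padded to 20 bytes
  });
  g[32] = asServer ? 1 : 0;
  return g;
}
```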

apowers313 • Dec 08 '21

@apowers313 perfect!

bartlomieju • Dec 08 '21

I'd like some feedback on where and how to implement the user-facing Jupyter API for the Deno Jupyter kernel. This would be the API that lets users display charts and images, display object-specific documentation, add new magics, etc.

I think regardless we will want a Deno.core.jupyter interface, which will only be instantiated when Deno is running in Jupyter (useful for feature detection), and that interface will have Deno.core.jupyter.display(mimeType, data) for rendering and saving formatted data in Jupyter. Similarly, it would have Deno.core.jupyter.addmagic(name, fn) for user-implemented functions. This enables users to import modules that will detect Jupyter and implement new functionality (similar to how %matplotlib works in Python's Jupyter today).
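To illustrate the addmagic(name, fn) idea, here is a hypothetical sketch of a magic registry: user code registers a function under a name, and each input line is checked for "%name args" or "result = %name args" before it is evaluated as JavaScript. MagicRegistry, addMagic, and runLine are illustrative names only; cell magics (%%), !cmd, and {var} substitution from the roadmap are left out.

```typescript
// Hypothetical registry for user-defined line magics; not an existing Deno API.
type Magic = (args: string) => unknown;

class MagicRegistry {
  private magics = new Map<string, Magic>();

  addMagic(name: string, fn: Magic): void {
    this.magics.set(name, fn);
  }

  // Matches "%name args" and "target = %name args". Ordinary code such as
  // "x = 5 % 2" does not match, so it is left for the JS evaluator.
  runLine(line: string, scope: Record<string, unknown>): { handled: boolean; value?: unknown } {
    const m = /^\s*(?:([A-Za-z_$][\w$]*)\s*=\s*)?%(\w+)\s*(.*)$/.exec(line);
    if (!m) return { handled: false };
    const [, target, name, args] = m;
    const fn = this.magics.get(name);
    if (!fn) throw new Error(`unknown magic: %${name}`);
    const value = fn(args);
    if (target) scope[target] = value; // magic assignment, e.g. "files = %ls"
    return { handled: true, value };
  }
}

const magics = new MagicRegistry();
magics.addMagic("echo", (args) => args);
const scope: Record<string, unknown> = {};
magics.runLine("greeting = %echo hello", scope);
console.log(scope["greeting"]); // hello
```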

Requested Feedback 1: I'd be interested if anyone objects to Deno.core.jupyter as a direction.

The part where design decisions are needed is an interface / protocol for Objects to automatically convert them to structured data types. For example, if a user returns an object implementing Foo.toPng that function should be called and the returned data should be rendered as a PNG.

Requested Feedback 2: Four options for how to do this:

  1. Foo.toPng() -- Seems antiquated and potentially has namespace conflicts since it isn't Symbol based
  2. Foo[Symbol(Deno.toPng)] -- Deno-wide specific decoding of Objects, similar to Deno.customInspect. This allows the entire Deno ecosystem to benefit from this feature, not just Jupyter and eventually enables whatever comes after Jupyter or other new innovations.
  3. Foo[Symbol(Deno.core.jupyter.toPng)] -- Jupyter only symbols, not nearly as useful but keeps them out of the rest of Deno if people don't think this functionality is going to be broadly useful.
  4. Foo[Symbol(Symbol.toPng)] -- Requires modifying Symbol, similar to toStringTag, but potentially benefits all of JS. Might require a TC39 proposal to ensure that Deno doesn't drift from ECMAScript specs.

Thanks!

apowers313 • Dec 29 '21

I just checked in a proposed API for Jupyter display:

  • display(mimeType, uint8Buf, opts)
  • displayPngFile(path, opts)
  • displayPng(buf, opts)
  • displayFile(path) -- guesses file type based on file extension

I'm trying to decide if it would be more convenient to overload displayPng with all the different types it could support (buf, file path, stream, whatever tomorrow's thing is...) or if it's better to have different function calls for each input type. Any thoughts would be appreciated.
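The extension-based guessing in displayFile(path) could be as simple as a lookup table. The sketch below is a hypothetical helper (guessMimeType is not part of the proposed API) with a deliberately small table; returning undefined leaves it to the caller to reject the file or fall back to another MIME type.

```typescript
// Small illustrative subset of extension-to-MIME mappings.
const MIME_BY_EXTENSION: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  svg: "image/svg+xml",
  html: "text/html",
  htm: "text/html",
  json: "application/json",
  md: "text/markdown",
  txt: "text/plain",
};

function guessMimeType(path: string): string | undefined {
  // Look only at the last path segment so "dir.v2/data" has no extension.
  const name = path.split(/[\\/]/).pop() ?? "";
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile like ".env"
  return MIME_BY_EXTENSION[name.slice(dot + 1).toLowerCase()];
}

console.log(guessMimeType("plots/chart.PNG")); // image/png
```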

apowers313 • Dec 31 '21

> I think regardless we will want a Deno.core.jupyter interface, which will only be instantiated when Deno is running in Jupyter (useful for feature detection), and that interface will have Deno.core.jupyter.display(mimeType, data) for rendering and saving formatted data in Jupyter. Similarly, it would have Deno.core.jupyter.addmagic(name, fn) for user-implemented functions. This enables users to import modules that will detect Jupyter and implement new functionality (similar to how %matplotlib works in Python's Jupyter today).
>
> Requested Feedback 1: I'd be interested if anyone objects to Deno.core.jupyter as a direction.
>
> The part where design decisions are needed is an interface / protocol for Objects to automatically convert them to structured data types. For example, if a user returns an object implementing Foo.toPng that function should be called and the returned data should be rendered as a PNG.

Sounds good to me, but it should be the Deno.jupyter namespace instead of Deno.core.jupyter.

> 1. Foo.toPng() -- Seems antiquated and potentially has namespace conflicts since it isn't Symbol based
> 2. Foo[Symbol(Deno.toPng)] -- Deno-wide specific decoding of Objects, similar to Deno.customInspect. This allows the entire Deno ecosystem to benefit from this feature, not just Jupyter and eventually enables whatever comes after Jupyter or other new innovations.
> 3. Foo[Symbol(Deno.core.jupyter.toPng)] -- Jupyter only symbols, not nearly as useful but keeps them out of the rest of Deno if people don't think this functionality is going to be broadly useful.
> 4. Foo[Symbol(Symbol.toPng)] -- Requires modifying Symbol, similar to toStringTag, but potentially benefits all of JS. Might require a TC39 proposal to ensure that Deno doesn't drift from ECMAScript specs.

In this case I think we should use something like Symbol.for("Deno.jupyter") similar to Symbol.for("Deno.customInspect").
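A hypothetical sketch of the Symbol.for("Deno.jupyter") suggestion: an object opts in to rich display by implementing a method under that registered symbol and returning a MIME bundle (a map from MIME type to data, as in Jupyter's display_data content). Because Symbol.for returns the same symbol for the same key in every module, libraries can implement the hook without importing anything, and unlike a plain toPng() method it cannot collide with unrelated string-named methods. The return shape and the toMimeBundle helper are assumptions for illustration.

```typescript
type MimeBundle = Record<string, unknown>;

// Registered symbol: Symbol.for("Deno.jupyter") yields this same value anywhere.
const jupyterDisplay = Symbol.for("Deno.jupyter");

// Hypothetical kernel-side lookup: use the hook if present, else plain text.
function toMimeBundle(value: unknown): MimeBundle {
  if (typeof value === "object" && value !== null) {
    const hook = (value as Record<symbol, unknown>)[jupyterDisplay];
    if (typeof hook === "function") {
      return hook.call(value) as MimeBundle;
    }
  }
  return { "text/plain": String(value) }; // stand-in for a real inspector
}

// A library object opting in to HTML rendering.
const report = {
  [jupyterDisplay]: (): MimeBundle => ({ "text/html": "<b>Q3 report</b>", "text/plain": "Q3 report" }),
};
console.log(toMimeBundle(report)["text/plain"]); // Q3 report
```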

> I just checked in a proposed API for Jupyter display:
>
>   • display(mimeType, uint8Buf, opts)
>   • displayPngFile(path, opts)
>   • displayPng(buf, opts)
>   • displayFile(path) -- guesses file type based on file extension
>
> I'm trying to decide if it would be more convenient to overload displayPng with all the different types it could support (buf, file path, stream, whatever tomorrow's thing is...) or if it's better to have different function calls for each input type. Any thoughts would be appreciated.

I believe the "overload" approach would be better in this case - we already use this approach in numerous Deno APIs.

Deno.jupyter.display(mimeType: string, buf: Uint8Array, opts);
Deno.jupyter.displayPng(pathOrBuf: string | Uint8Array, opts);
Deno.jupyter.displayFile(path: string);

These seem preferable. What are the opts that could be used for displaying files?
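As one possible answer, here is a hypothetical sketch of the overloaded displayPng(pathOrBuf, opts) signature that builds Jupyter display_data content. It dispatches on typeof at runtime, base64-encodes the PNG bytes (Jupyter carries binary image data as base64 strings), and assumes opts holds width and height, which front ends read from the image's metadata. The readFile parameter is injected so the sketch runs outside Deno; in Deno it could be Deno.readFileSync.

```typescript
// Assumed options: display size hints placed in display_data metadata.
interface DisplayPngOptions {
  width?: number;
  height?: number;
}

interface DisplayData {
  data: Record<string, string>;
  metadata: Record<string, Record<string, number>>;
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Minimal base64 encoder, 3 input bytes -> 4 output chars, "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const n = (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += i + 1 < bytes.length ? B64[(n >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? B64[n & 63] : "=";
  }
  return out;
}

function displayPng(
  pathOrBuf: string | Uint8Array,
  opts: DisplayPngOptions = {},
  readFile: (path: string) => Uint8Array = () => {
    throw new Error("no file reader configured");
  },
): DisplayData {
  // One entry point: a string is treated as a path, anything else as raw bytes.
  const bytes = typeof pathOrBuf === "string" ? readFile(pathOrBuf) : pathOrBuf;
  const meta: Record<string, number> = {};
  if (opts.width !== undefined) meta.width = opts.width;
  if (opts.height !== undefined) meta.height = opts.height;
  return {
    data: { "image/png": toBase64(bytes) },
    metadata: Object.keys(meta).length > 0 ? { "image/png": meta } : {},
  };
}
```

Adding another input type later (a stream, say) would mean widening the union and adding one branch, which matches the "overload" preference above.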

bartlomieju • Jan 02 '22

Are this issue and IDeno still being worked on?

tif-calin • Jan 02 '23

Nope, I stopped working on IDeno in favor of the built-in Jupyter kernel. The built-in kernel stalled out because the ZMQ library we were using had some bugs.

IDeno was mostly functional, happy to pass the baton if anyone wants to pick it up.

apowers313 • Jan 02 '23

Hey @tif-calin, @apowers313! We got the kernel mostly working, but that ZMQ library bug was quite serious and happened very often (it manifested itself every 3-4 connections). If there were a different library we could use, we should be able to revive that PR without much trouble, and it still seems like a great feature for many people.

bartlomieju • Feb 05 '23

I would be very interested in seeing this happening. Any way I can help?

acrodrig • May 27 '23

> I would be very interested in seeing this happening. Any way I can help?

Fix the Rust ZMQ library? :)

apowers313 • May 28 '23

Do you have a specific bug that needs to be fixed? Is it filed somewhere?

acrodrig • May 28 '23

> Do you have a specific bug that needs to be fixed? Is it filed somewhere?

https://github.com/zeromq/zmq.rs/issues/153

apowers313 • Jun 21 '23

Also, for a mostly working non-Rust version of a kernel: https://github.com/apowers313/ideno

apowers313 • Jun 21 '23

Love this. Where has the work coalesced?

Background: I'm a longtime Jupyter, IPython, and ZeroMQ maintainer. I'd love to help steward this work.

rgbkrk • Aug 30 '23

@rgbkrk still stuck on this bug as far as I can tell: https://github.com/zeromq/zmq.rs/issues/153

apowers313 • Aug 30 '23

Hey @rgbkrk, thanks for stopping by. @apowers313 and I had a PR that was quite close to landing (https://github.com/denoland/deno/pull/13122); unfortunately, the bug above made it very flaky (two out of three times you opened a notebook, it resulted in a "Broken pipe" error). Besides that, the PR was more or less ready to land.

We recently discussed this feature with @crowlKats and @dsherret, and we'd like to resurrect the PR. We were thinking of rewriting the parts of zmq.rs that are necessary for the Jupyter kernel, purely in Rust and with Tokio integration in mind. If you have other ideas, I'd be more than happy to hear them!

bartlomieju • Aug 30 '23

Looks like I'm going to have to learn Rust. While it might not be ideal, you might get reliability more quickly by building on top of libzmq, even though that would pale in comparison to native Rust bindings. As far as I can tell, there's a lot left to be tested within zmq.rs.

I'm curious if this jupyter rust kernel ran into the same issues. Have you all checked that out too?

rgbkrk • Aug 30 '23

> I'm curious if this jupyter rust kernel ran into the same issues. Have you all checked that out too?

I did not know about that project. I can certainly check it.

Let me get my PR rebased and reopened so we can discuss over code.

bartlomieju • Aug 31 '23

Opened #20337, which is rebased against main.

bartlomieju • Aug 31 '23

FYI, it looks like the PR above works quite nicely with the notebook integration in VS Code, but all hell breaks loose when I try it with jupyter notebook. I think the PR is quite close to being landable; it probably needs 5-10 hours of work to polish and release.

bartlomieju • Aug 31 '23