jupyter-cache
Review notebook caching and execution packages
A place to discover and list other tools that provide some form of notebook caching, execution, or storage abstraction:
- Scrapbook (metadata tagging for python objects and cell outputs)
- Bookstore (storage layer on S3 for notebooks)
- Zarr (chunked storage interface https://zarr.readthedocs.io/en/stable/)
- tinydb is a well-used, lightweight package with a simple JSON database API. Different storage classes can be used, which can also be wrapped in middleware to customise their behaviour (note the `TinyDB` import, which the original snippet omitted):
>>> from tinydb import TinyDB
>>> from tinydb.storages import JSONStorage
>>> from tinydb.middlewares import CachingMiddleware
>>> db = TinyDB('/path/to/db.json', storage=CachingMiddleware(JSONStorage))
scrapbook contains (in-memory only) classes to represent a collection of notebooks (`Scrapbook`) and a single notebook (`Notebook`).
Of note is that these have methods for returning notebook/cell execution metrics (such as time taken), which they presumably store during notebook execution.
They also provide methods to access 'scraps': outputs stored with name identifiers (see ExecutableBookProject/myst_parser#46)
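As a rough, stdlib-only illustration of the scraps idea — named outputs persisted in cell metadata and recovered later. The function names here are my own, not scrapbook's actual API:

```python
import json

def glue_scrap(cell, name, value):
    """Record a named, JSON-serialisable output in the cell's metadata."""
    cell.setdefault("metadata", {}).setdefault("scraps", {})[name] = value

def read_scraps(notebook):
    """Collect all named scraps from an nbformat-style notebook dict."""
    scraps = {}
    for cell in notebook.get("cells", []):
        scraps.update(cell.get("metadata", {}).get("scraps", {}))
    return scraps

nb = {"cells": [{"cell_type": "code", "source": "x = 1 + 1", "metadata": {}}]}
glue_scrap(nb["cells"][0], "result", 2)
print(json.dumps(read_scraps(nb)))  # {"result": 2}
```

Because scraps ride along in the notebook's own JSON, they survive round-trips through storage layers like the ones listed above.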
This is the link to the caching currently implemented by @mmcky and @AakashGfude: https://github.com/QuantEcon/sphinxcontrib-jupyter/blob/b5d9b2e77fdc571c4c718e67847020625d096d6d/sphinxcontrib/jupyter/builders/jupyter_code.py#L119
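For comparison, here is a minimal sketch of the general approach such builders take — hash the code content and skip re-execution when it is unchanged. This is my own stdlib-only illustration, not the sphinxcontrib-jupyter implementation:

```python
import hashlib

def source_hash(cells):
    """Hash the concatenated source of a notebook's code cells."""
    joined = "\n".join(cells).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()

cache = {}  # source hash -> executed outputs

def execute(cells, runner):
    """Run the notebook only if its source hash is not already cached."""
    key = source_hash(cells)
    if key not in cache:
        cache[key] = [runner(cell) for cell in cells]
    return cache[key]

outputs = execute(["1 + 1", "2 * 3"], runner=eval)
print(outputs)  # [2, 6]
```

A second call with identical cells returns the cached outputs without invoking the runner again; any edit to a cell changes the hash and forces a fresh run.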
- rossant/ipycache (last commit 2016) and SmartDataInnovationLab/ipython-cache (last commit 2018) are both examples of cell-level magics that pickle the outputs of cells for later use.
- mkery/Verdant (last commit Oct 24, 2019) is a JupyterLab extension that automatically records the 'history' of Jupyter notebook cells and stores them in a .ipyhistory JSON file. Note, the code is all written in TypeScript.
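The core pickling idea behind magics like ipycache can be sketched with the stdlib alone (the function name here is illustrative, not the actual ipycache API):

```python
import os
import pickle
import tempfile

def cached_eval(path, expression):
    """Evaluate an expression, pickling its result; reuse the pickle on later calls."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    result = eval(expression)  # stands in for executing a notebook cell
    with open(path, "wb") as f:
        pickle.dump(result, f)
    return result

cache_file = os.path.join(tempfile.mkdtemp(), "cell0.pkl")
first = cached_eval(cache_file, "sum(range(10))")   # computed
second = cached_eval(cache_file, "sum(range(10))")  # loaded from disk
print(first, second)  # 45 45
```

The obvious limitation, which those projects share, is that the pickle is keyed by file path rather than by cell content, so stale results must be invalidated manually.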
Another thought I had is to look at git itself, e.g. via GitPython. I could conceive of the cache being its own small repository: when you add a new notebook or update one, you 'stage' it; then on execution you get all the 'staged' notebooks, run them, and commit back the final notebooks.
I think this is the kind of thing that some more bespoke notebook UIs do. E.g., I believe that Gigantum.IO (a proprietary cloud interface for notebooks) commits notebooks to a git repository on-the-fly, and then gives you the option to go back in history if needed. I don't believe they do any execution caching, just content caching.
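A stdlib-only sketch of that stage/execute/commit flow, with plain dicts standing in for a real repository (in practice GitPython's `Repo.index.add` and `Repo.index.commit` would play these roles):

```python
import hashlib

class NotebookCache:
    """Toy git-like cache: stage notebook sources, execute them, then 'commit' results."""

    def __init__(self):
        self.staged = {}   # name -> source
        self.commits = []  # list of {name: (source_hash, outputs)} snapshots

    def stage(self, name, source):
        """Register a new or updated notebook for execution."""
        self.staged[name] = source

    def execute_and_commit(self, runner):
        """Run all staged notebooks and record the results as a commit."""
        snapshot = {}
        for name, source in self.staged.items():
            digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
            snapshot[name] = (digest, runner(source))
        self.commits.append(snapshot)
        self.staged.clear()
        return snapshot

cache = NotebookCache()
cache.stage("intro.ipynb", "1 + 1")
snap = cache.execute_and_commit(runner=eval)
print(snap["intro.ipynb"][1])  # 2
```

With a real git backend you would additionally get history and diffing of the committed notebooks for free.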
Thank you for creating this helpful resource!
As I am on the search myself, here is another pointer (which I still need to explore):
dask.cache and cachey
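As I understand it, cachey's distinguishing idea is weighing what to keep by how expensive an entry is to recompute, rather than pure recency. A much-simplified stdlib sketch of that idea (not cachey's actual API):

```python
class CostAwareCache:
    """Keep at most `capacity` entries, evicting the one cheapest to recompute."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}   # key -> value
        self.costs = {}  # key -> recompute cost (e.g. seconds)

    def put(self, key, value, cost):
        if len(self.data) >= self.capacity and key not in self.data:
            cheapest = min(self.costs, key=self.costs.get)
            if cost <= self.costs[cheapest]:
                return  # not worth caching
            del self.data[cheapest], self.costs[cheapest]
        self.data[key] = value
        self.costs[key] = cost

    def get(self, key, default=None):
        return self.data.get(key, default)

c = CostAwareCache(capacity=2)
c.put("cheap", 1, cost=0.1)
c.put("pricey", 2, cost=10.0)
c.put("medium", 3, cost=1.0)  # evicts "cheap", the cheapest to recompute
print(c.get("pricey"), c.get("cheap"))  # 2 None
```

For notebook execution this seems like a good fit, since cells vary enormously in how long they take to re-run.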