jupyterlite-sphinx
Size reduction approaches
Currently, we generate notebooks for the TryExamples directive, and the JupyterLite, NotebookLite, and Voici directives can take a path to a notebook.
While we can disable the JupyterLite source maps to reduce build size (a technique used downstream in https://github.com/scikit-learn/scikit-learn/pull/26246, https://github.com/numpy/numpy/pull/26745, and https://github.com/sympy/sympy/pull/27419), I've opened this issue as an open-ended item to see if jupyterlite-sphinx as a companion to JupyterLite can reduce its footprint by reducing the size of the notebooks that are eventually copied into the JupyterLite folder.
- Use https://github.com/arve0/ipynbcompress? It has not been maintained for the last two years; maybe we can fork it or take over maintenance via PEP 541?
- Use `optipng` (if available) or projects with Python bindings to it, or `pillow`, as optional dependencies to reduce the size of images in the docs: https://sphinx-gallery.github.io/stable/gen_modules/sphinx_gallery.utils.optipng.html
- Use `nbstripout`, if enabled via a global config option, to clear all outputs (except the `jupyterlite_sphinx_strip` tag) and kernel metadata from all notebooks (maybe not for the `TryExamples` notebooks, but this would be useful for long-form notebooks; the `scikit-learn` docs via Sphinx-Gallery already seem to do this: https://sphinx-gallery.github.io/stable/auto_examples/plot_9_multi_image_separate.html).
  - We don't currently do this for Markdown notebooks that were added in #221. Since they don't contain the outputs of the cells in their contents, they don't have any outputs upon conversion to IPyNB either.
  - However, conventional IPyNB files can indeed contain outputs, which we can explore stripping.
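For the output-stripping idea above, here is a minimal sketch of what clearing outputs and kernel metadata could look like. It works on the plain notebook JSON with the standard library only. It is not how `nbstripout` is implemented and it is not an existing jupyterlite-sphinx option. The function name `strip_notebook` is hypothetical, and whether JupyterLite still picks the right kernel once `kernelspec` is gone would need checking.

```python
import json


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict without outputs or kernel metadata.

    Hypothetical helper for illustration; assumes the nbformat 4 layout
    (top-level "cells" and "metadata" keys).
    """
    # Round-trip through JSON as a cheap deep copy, so the input is untouched.
    stripped = json.loads(json.dumps(nb))

    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored results and images
            cell["execution_count"] = None  # reset the In[n] counter

    # Kernel metadata is regenerated when the notebook is opened and run.
    metadata = stripped.get("metadata", {})
    metadata.pop("kernelspec", None)
    metadata.pop("language_info", None)
    return stripped
```

A build step could load each `.ipynb` with `json.loads`, pass it through this helper, and write the result back before the files are copied into the JupyterLite folder.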
Reductions in sizes will be helpful for:
- projects deploying documentation via GitHub Pages or on other static webpage hosts
- reducing bandwidth usage for readers of said documentation
Okay, so, as an experiment, I tried reducing the size of the notebooks generated by the TryExamples directive. My thinking was that those notebooks far outnumber the ones connected to the NotebookLite/JupyterLite directive(s), since they can quickly run into the thousands depending on how many docstrings exist in the entire documentation source for a package. Sadly, the results are not helpful. :(
By removing the outputs from the UUID-based notebooks from numpy/numpy#26745, the size of 1496 total notebooks was reduced from 4.5 MiB to 2.5 MiB, and a similar test for SymPy with 2519 notebooks/examples revealed a reduction from 6.8 MiB to 3.9 MiB. Measured against NumPy's total docs size of ~138 MiB (without JupyterLite's source maps), that 2 MiB saving is a paltry improvement of roughly 1.4%, which is not really worth incorporating. Even with global docstring examples enabled for Matplotlib, which also uses Sphinx-Gallery and has a lot of images in its notebook outputs, I didn't see much of a reduction (about 20 MiB, bringing the total down to 557 MiB).
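To make the NumPy numbers easier to sanity-check, here is the arithmetic as a short script. It only uses the figures quoted above, and the variable names are just for illustration. The saving looks large relative to the notebooks themselves but comes out at roughly 1.4% of the whole docs build.

```python
# Figures quoted above for the NumPy docs build, all in MiB.
notebooks_before = 4.5
notebooks_after = 2.5
total_docs_size = 138.0  # approximate, without JupyterLite's source maps

saved = notebooks_before - notebooks_after

# The same saving, expressed against two different baselines.
share_of_notebooks = 100 * saved / notebooks_before
share_of_total = 100 * saved / total_docs_size

print(f"saved: {saved:.1f} MiB")
print(f"relative to the notebooks alone: {share_of_notebooks:.0f}%")
print(f"relative to the total docs size: {share_of_total:.2f}%")
```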
I'll leave this issue open in case there's something I am missing in this aim to reduce the build size from jupyterlite-sphinx's side that anyone else can point out. Otherwise, we can close and try to find optimisation options in JupyterLite itself.
It is also likely that the blobs in git (and on the wire) are actually gzipped, so the actual gains are lower.
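A rough way to check the compression point is to compare raw and gzipped sizes before and after stripping outputs. The sketch below uses a made-up notebook with repetitive text output, so the numbers it prints are not measurements from any real docs build. Notebooks whose outputs are base64-encoded images compress far less well, so the gap would be smaller there.

```python
import gzip
import json


def raw_and_gzip_size(nb: dict) -> tuple[int, int]:
    """Return (raw bytes, gzip-compressed bytes) for a notebook dict."""
    raw = json.dumps(nb).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# Synthetic notebook: 20 code cells, each with a repetitive text output.
output_text = "".join(f"row {i}: value\n" for i in range(200))
full = {
    "cells": [
        {
            "cell_type": "code",
            "source": f"print_table({n})",
            "metadata": {},
            "execution_count": n,
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": output_text}
            ],
        }
        for n in range(1, 21)
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# The same notebook with every output cleared.
stripped = {
    **full,
    "cells": [
        {**cell, "outputs": [], "execution_count": None} for cell in full["cells"]
    ],
}

raw_full, gz_full = raw_and_gzip_size(full)
raw_stripped, gz_stripped = raw_and_gzip_size(stripped)

raw_saving = raw_full - raw_stripped
gz_saving = gz_full - gz_stripped
print(f"saving on disk:    {raw_saving} bytes")
print(f"saving after gzip: {gz_saving} bytes")
```

For text-heavy outputs like these, the saving after gzip is much smaller than the saving on disk, which is the effect described above.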