Investigate further startup time improvements
Problem
pyodide_kernel requires many packages to be fully downloaded (if not already cached) and installed before giving the user the ability to do interactive computing.
Proposed Solution
Investigate additional packages which could be avoided, either by shim or patch.
Additional context
With jupyterlite/jupyterlite#913, we'd be down to 724 modules.
>>> import sys >>> sorted(sys.modules) ['IPython', 'IPython.core', 'IPython.core.alias', 'IPython.core.application', 'IPython.core.async_helpers', 'IPython.core.autocall', 'IPython.core.builtin_trap', 'IPython.core.compilerop', 'IPython.core.completer', 'IPython.core.completerlib', 'IPython.core.crashhandler', 'IPython.core.debugger', 'IPython.core.display', 'IPython.core.display_functions', 'IPython.core.display_trap', 'IPython.core.displayhook', 'IPython.core.displaypub', 'IPython.core.error', 'IPython.core.events', 'IPython.core.excolors', 'IPython.core.extensions', 'IPython.core.formatters', 'IPython.core.getipython', 'IPython.core.history', 'IPython.core.hooks', 'IPython.core.inputtransformer2', 'IPython.core.interactiveshell', 'IPython.core.latex_symbols', 'IPython.core.logger', 'IPython.core.macro', 'IPython.core.magic', 'IPython.core.magic_arguments', 'IPython.core.magics', 'IPython.core.magics.auto', 'IPython.core.magics.basic', 'IPython.core.magics.code', 'IPython.core.magics.config', 'IPython.core.magics.display', 'IPython.core.magics.execution', 'IPython.core.magics.extension', 'IPython.core.magics.history', 'IPython.core.magics.logging', 'IPython.core.magics.namespace', 'IPython.core.magics.osm', 'IPython.core.magics.packaging', 'IPython.core.magics.pylab', 'IPython.core.magics.script', 'IPython.core.oinspect', 'IPython.core.page', 'IPython.core.payload', 'IPython.core.prefilter', 'IPython.core.profiledir', 'IPython.core.pylabtools', 'IPython.core.release', 'IPython.core.shellapp', 'IPython.core.splitinput', 'IPython.core.ultratb', 'IPython.core.usage', 'IPython.display', 'IPython.extensions', 'IPython.extensions.storemagic', 'IPython.lib', 'IPython.lib.clipboard', 'IPython.lib.display', 'IPython.lib.pretty', 'IPython.paths', 'IPython.terminal', 'IPython.terminal.debugger', 'IPython.terminal.embed', 'IPython.terminal.interactiveshell', 'IPython.terminal.ipapp', 'IPython.terminal.magics', 'IPython.terminal.prompts', 'IPython.terminal.pt_inputhooks', 
'IPython.terminal.ptutils', 'IPython.terminal.shortcuts', 'IPython.testing', 'IPython.testing.skipdoctest', 'IPython.utils', 'IPython.utils.PyColorize', 'IPython.utils._process_common', 'IPython.utils._process_posix', 'IPython.utils._sysinfo', 'IPython.utils.capture', 'IPython.utils.colorable', 'IPython.utils.coloransi', 'IPython.utils.contexts', 'IPython.utils.data', 'IPython.utils.decorators', 'IPython.utils.dir2', 'IPython.utils.docs', 'IPython.utils.encoding', 'IPython.utils.frame', 'IPython.utils.generics', 'IPython.utils.importstring', 'IPython.utils.io', 'IPython.utils.ipstruct', 'IPython.utils.module_paths', 'IPython.utils.openpy', 'IPython.utils.path', 'IPython.utils.process', 'IPython.utils.py3compat', 'IPython.utils.sentinel', 'IPython.utils.strdispatch', 'IPython.utils.sysinfo', 'IPython.utils.syspathcontext', 'IPython.utils.terminal', 'IPython.utils.text', 'IPython.utils.timing', 'IPython.utils.tokenutil', 'IPython.utils.wildcard', '__future__', '__main__', '_abc', '_ast', '_bisect', '_blake2', '_bz2', '_codecs', '_collections', '_collections_abc', '_compat_pickle', '_compression', '_contextvars', '_csv', '_datetime', '_decimal', '_frozen_importlib', '_frozen_importlib_external', '_functools', '_heapq', '_imp', '_io', '_json', '_locale', '_lsprof', '_md5', '_operator', '_pickle', '_posixsubprocess', '_pyodide', '_pyodide._base', '_pyodide._core_docs', '_pyodide._importhook', '_pyodide.docstring', '_pyodide_core', '_queue', '_random', '_sha1', '_sha256', '_sha3', '_sha512', '_signal', '_sitebuiltins', '_socket', '_sqlite3', '_sre', '_stat', '_string', '_strptime', '_struct', '_sysconfigdata__emscripten_wasm32-emscripten', '_thread', '_warnings', '_weakref', '_weakrefset', 'abc', 'argparse', 'array', 'ast', 'asttokens', 'asttokens.asttokens', 'asttokens.line_numbers', 'asttokens.util', 'asyncio', 'asyncio.base_events', 'asyncio.base_futures', 'asyncio.base_subprocess', 'asyncio.base_tasks', 'asyncio.constants', 'asyncio.coroutines', 'asyncio.events', 
'asyncio.exceptions', 'asyncio.format_helpers', 'asyncio.futures', 'asyncio.locks', 'asyncio.log', 'asyncio.mixins', 'asyncio.protocols', 'asyncio.queues', 'asyncio.runners', 'asyncio.selector_events', 'asyncio.sslproto', 'asyncio.staggered', 'asyncio.streams', 'asyncio.subprocess', 'asyncio.tasks', 'asyncio.threads', 'asyncio.transports', 'asyncio.trsock', 'asyncio.unix_events', 'atexit', 'backcall', 'backcall.backcall', 'base64', 'bdb', 'binascii', 'bisect', 'builtins', 'bz2', 'cProfile', 'calendar', 'cmd', 'code', 'codecs', 'codeop', 'collections', 'collections.abc', 'colorsys', 'concurrent', 'concurrent.futures', 'concurrent.futures._base', 'concurrent.futures.thread', 'contextlib', 'contextvars', 'copy', 'copyreg', 'csv', 'dataclasses', 'datetime', 'decimal', 'decorator', 'difflib', 'dis', 'email', 'email._encoded_words', 'email._parseaddr', 'email._policybase', 'email.base64mime', 'email.charset', 'email.encoders', 'email.errors', 'email.feedparser', 'email.header', 'email.iterators', 'email.message', 'email.parser', 'email.quoprimime', 'email.utils', 'encodings', 'encodings.aliases', 'encodings.cp437', 'encodings.utf_8', 'enum', 'errno', 'executing', 'executing.executing', 'executing.version', 'fcntl', 'filecmp', 'fnmatch', 'fractions', 'functools', 'gc', 'genericpath', 'getopt', 'getpass', 'gettext', 'glob', 'gzip', 'hashlib', 'heapq', 'html', 'html.entities', 'http', 'http.client', 'importlib', 'importlib._abc', 'importlib._bootstrap', 'importlib._bootstrap_external', 'importlib.abc', 'importlib.machinery', 'importlib.metadata', 'importlib.metadata._adapters', 'importlib.metadata._collections', 'importlib.metadata._functools', 'importlib.metadata._itertools', 'importlib.metadata._meta', 'importlib.metadata._text', 'importlib.util', 'inspect', 'io', 'ipykernel', 'ipykernel.comm', 'ipykernel.jsonutil', 'itertools', 'jedi', 'jedi._compatibility', 'jedi.api', 'jedi.api.classes', 'jedi.api.completion', 'jedi.api.completion_cache', 'jedi.api.environment', 
'jedi.api.errors', 'jedi.api.exceptions', 'jedi.api.file_name', 'jedi.api.helpers', 'jedi.api.interpreter', 'jedi.api.keywords', 'jedi.api.project', 'jedi.api.refactoring', 'jedi.api.refactoring.extract', 'jedi.api.strings', 'jedi.cache', 'jedi.common', 'jedi.debug', 'jedi.file_io', 'jedi.inference', 'jedi.inference.analysis', 'jedi.inference.arguments', 'jedi.inference.base_value', 'jedi.inference.cache', 'jedi.inference.compiled', 'jedi.inference.compiled.access', 'jedi.inference.compiled.getattr_static', 'jedi.inference.compiled.mixed', 'jedi.inference.compiled.subprocess', 'jedi.inference.compiled.subprocess.functions', 'jedi.inference.compiled.value', 'jedi.inference.context', 'jedi.inference.docstring_utils', 'jedi.inference.docstrings', 'jedi.inference.filters', 'jedi.inference.flow_analysis', 'jedi.inference.gradual', 'jedi.inference.gradual.annotation', 'jedi.inference.gradual.base', 'jedi.inference.gradual.conversion', 'jedi.inference.gradual.generics', 'jedi.inference.gradual.stub_value', 'jedi.inference.gradual.type_var', 'jedi.inference.gradual.typeshed', 'jedi.inference.gradual.typing', 'jedi.inference.gradual.utils', 'jedi.inference.helpers', 'jedi.inference.imports', 'jedi.inference.lazy_value', 'jedi.inference.names', 'jedi.inference.param', 'jedi.inference.parser_cache', 'jedi.inference.recursion', 'jedi.inference.references', 'jedi.inference.signature', 'jedi.inference.syntax_tree', 'jedi.inference.sys_path', 'jedi.inference.utils', 'jedi.inference.value', 'jedi.inference.value.decorator', 'jedi.inference.value.dynamic_arrays', 'jedi.inference.value.function', 'jedi.inference.value.instance', 'jedi.inference.value.iterable', 'jedi.inference.value.klass', 'jedi.inference.value.module', 'jedi.parser_utils', 'jedi.plugins', 'jedi.plugins.django', 'jedi.plugins.flask', 'jedi.plugins.pytest', 'jedi.plugins.registry', 'jedi.plugins.stdlib', 'jedi.settings', 'js', 'json', 'json.decoder', 'json.encoder', 'json.scanner', 'keyword', 'linecache', 'locale', 
'logging', 'logging.config', 'logging.handlers', 'marshal', 'math', 'micropip', 'micropip._compat', 'micropip._compat_in_pyodide', 'micropip._micropip', 'micropip.externals', 'micropip.externals.pip', 'micropip.externals.pip._internal', 'micropip.externals.pip._internal.utils', 'micropip.externals.pip._internal.utils.pkg_resources', 'micropip.externals.pip._internal.utils.wheel', 'micropip.externals.pip._vendor', 'micropip.externals.pip._vendor.pkg_resources', 'micropip.package', 'mimetypes', 'ntpath', 'numbers', 'opcode', 'operator', 'os', 'os.path', 'packaging', 'packaging.__about__', 'packaging._manylinux', 'packaging._musllinux', 'packaging._structures', 'packaging.markers', 'packaging.requirements', 'packaging.specifiers', 'packaging.tags', 'packaging.utils', 'packaging.version', 'parso', 'parso._compatibility', 'parso.cache', 'parso.file_io', 'parso.grammar', 'parso.normalizer', 'parso.parser', 'parso.pgen2', 'parso.pgen2.generator', 'parso.pgen2.grammar_parser', 'parso.python', 'parso.python.diff', 'parso.python.errors', 'parso.python.parser', 'parso.python.pep8', 'parso.python.prefix', 'parso.python.token', 'parso.python.tokenize', 'parso.python.tree', 'parso.tree', 'parso.utils', 'pathlib', 'pdb', 'pexpect', 'pickle', 'pickleshare', 'piplite', 'piplite.piplite', 'pkgutil', 'platform', 'posix', 'posixpath', 'pprint', 'profile', 'prompt_toolkit', 'prompt_toolkit.application', 'prompt_toolkit.application.application', 'prompt_toolkit.application.current', 'prompt_toolkit.application.dummy', 'prompt_toolkit.application.run_in_terminal', 'prompt_toolkit.auto_suggest', 'prompt_toolkit.buffer', 'prompt_toolkit.cache', 'prompt_toolkit.clipboard', 'prompt_toolkit.clipboard.base', 'prompt_toolkit.clipboard.in_memory', 'prompt_toolkit.completion', 'prompt_toolkit.completion.base', 'prompt_toolkit.completion.deduplicate', 'prompt_toolkit.completion.filesystem', 'prompt_toolkit.completion.fuzzy_completer', 'prompt_toolkit.completion.nested', 
'prompt_toolkit.completion.word_completer', 'prompt_toolkit.cursor_shapes', 'prompt_toolkit.data_structures', 'prompt_toolkit.document', 'prompt_toolkit.enums', 'prompt_toolkit.eventloop', 'prompt_toolkit.eventloop.async_context_manager', 'prompt_toolkit.eventloop.async_generator', 'prompt_toolkit.eventloop.inputhook', 'prompt_toolkit.eventloop.utils', 'prompt_toolkit.filters', 'prompt_toolkit.filters.app', 'prompt_toolkit.filters.base', 'prompt_toolkit.filters.cli', 'prompt_toolkit.filters.utils', 'prompt_toolkit.formatted_text', 'prompt_toolkit.formatted_text.ansi', 'prompt_toolkit.formatted_text.base', 'prompt_toolkit.formatted_text.html', 'prompt_toolkit.formatted_text.pygments', 'prompt_toolkit.formatted_text.utils', 'prompt_toolkit.history', 'prompt_toolkit.input', 'prompt_toolkit.input.ansi_escape_sequences', 'prompt_toolkit.input.base', 'prompt_toolkit.input.defaults', 'prompt_toolkit.input.typeahead', 'prompt_toolkit.input.vt100_parser', 'prompt_toolkit.key_binding', 'prompt_toolkit.key_binding.bindings', 'prompt_toolkit.key_binding.bindings.auto_suggest', 'prompt_toolkit.key_binding.bindings.basic', 'prompt_toolkit.key_binding.bindings.completion', 'prompt_toolkit.key_binding.bindings.cpr', 'prompt_toolkit.key_binding.bindings.emacs', 'prompt_toolkit.key_binding.bindings.focus', 'prompt_toolkit.key_binding.bindings.mouse', 'prompt_toolkit.key_binding.bindings.named_commands', 'prompt_toolkit.key_binding.bindings.open_in_editor', 'prompt_toolkit.key_binding.bindings.page_navigation', 'prompt_toolkit.key_binding.bindings.scroll', 'prompt_toolkit.key_binding.bindings.vi', 'prompt_toolkit.key_binding.defaults', 'prompt_toolkit.key_binding.digraphs', 'prompt_toolkit.key_binding.emacs_state', 'prompt_toolkit.key_binding.key_bindings', 'prompt_toolkit.key_binding.key_processor', 'prompt_toolkit.key_binding.vi_state', 'prompt_toolkit.keys', 'prompt_toolkit.layout', 'prompt_toolkit.layout.containers', 'prompt_toolkit.layout.controls', 
'prompt_toolkit.layout.dimension', 'prompt_toolkit.layout.dummy', 'prompt_toolkit.layout.layout', 'prompt_toolkit.layout.margins', 'prompt_toolkit.layout.menus', 'prompt_toolkit.layout.mouse_handlers', 'prompt_toolkit.layout.processors', 'prompt_toolkit.layout.screen', 'prompt_toolkit.layout.scrollable_pane', 'prompt_toolkit.layout.utils', 'prompt_toolkit.lexers', 'prompt_toolkit.lexers.base', 'prompt_toolkit.lexers.pygments', 'prompt_toolkit.mouse_events', 'prompt_toolkit.output', 'prompt_toolkit.output.base', 'prompt_toolkit.output.color_depth', 'prompt_toolkit.output.defaults', 'prompt_toolkit.output.flush_stdout', 'prompt_toolkit.output.plain_text', 'prompt_toolkit.output.vt100', 'prompt_toolkit.patch_stdout', 'prompt_toolkit.renderer', 'prompt_toolkit.search', 'prompt_toolkit.selection', 'prompt_toolkit.shortcuts', 'prompt_toolkit.shortcuts.dialogs', 'prompt_toolkit.shortcuts.progress_bar', 'prompt_toolkit.shortcuts.progress_bar.base', 'prompt_toolkit.shortcuts.progress_bar.formatters', 'prompt_toolkit.shortcuts.prompt', 'prompt_toolkit.shortcuts.utils', 'prompt_toolkit.styles', 'prompt_toolkit.styles.base', 'prompt_toolkit.styles.defaults', 'prompt_toolkit.styles.named_colors', 'prompt_toolkit.styles.pygments', 'prompt_toolkit.styles.style', 'prompt_toolkit.styles.style_transformation', 'prompt_toolkit.utils', 'prompt_toolkit.validation', 'prompt_toolkit.widgets', 'prompt_toolkit.widgets.base', 'prompt_toolkit.widgets.dialogs', 'prompt_toolkit.widgets.menus', 'prompt_toolkit.widgets.toolbars', 'pstats', 'pure_eval', 'pure_eval.core', 'pure_eval.my_getattr_static', 'pure_eval.utils', 'pure_eval.version', 'pydoc', 'pydoc_data', 'pydoc_data.topics', 'pygments', 'pygments.console', 'pygments.filter', 'pygments.filters', 'pygments.formatter', 'pygments.formatters', 'pygments.formatters._mapping', 'pygments.formatters.html', 'pygments.formatters.terminal256', 'pygments.lexer', 'pygments.lexers', 'pygments.lexers._mapping', 'pygments.lexers.python', 
'pygments.modeline', 'pygments.plugin', 'pygments.regexopt', 'pygments.style', 'pygments.styles', 'pygments.token', 'pygments.unistring', 'pygments.util', 'pyodide', 'pyodide._core', 'pyodide._package_loader', 'pyodide._state', 'pyodide.code', 'pyodide.ffi', 'pyodide.http', 'pyodide.webloop', 'pyodide_js', 'pyodide_js._api', 'pyolite', 'pyolite.display', 'pyolite.interpreter', 'pyolite.kernel', 'pyolite.litetransform', 'pyolite.mocks', 'pyolite.patches', 'pyparsing', 'pyparsing.actions', 'pyparsing.common', 'pyparsing.core', 'pyparsing.exceptions', 'pyparsing.helpers', 'pyparsing.results', 'pyparsing.testing', 'pyparsing.unicode', 'pyparsing.util', 'queue', 'quopri', 'random', 're', 'reprlib', 'resource', 'runpy', 'select', 'selectors', 'shlex', 'shutil', 'signal', 'site', 'six', 'six.moves', 'socket', 'socketserver', 'sqlite3', 'sqlite3.dbapi2', 'sre_compile', 'sre_constants', 'sre_parse', 'stack_data', 'stack_data.core', 'stack_data.formatting', 'stack_data.serializing', 'stack_data.utils', 'stack_data.version', 'stat', 'string', 'struct', 'subprocess', 'sys', 'sysconfig', 'tarfile', 'tempfile', 'termios', 'textwrap', 'threading', 'time', 'timeit', 'token', 'tokenize', 'tornado', 'tornado.gen', 'traceback', 'traitlets', 'traitlets._version', 'traitlets.config', 'traitlets.config.application', 'traitlets.config.configurable', 'traitlets.config.loader', 'traitlets.traitlets', 'traitlets.utils', 'traitlets.utils.bunch', 'traitlets.utils.decorators', 'traitlets.utils.descriptions', 'traitlets.utils.getargspec', 'traitlets.utils.importstring', 'traitlets.utils.nested_update', 'traitlets.utils.sentinel', 'traitlets.utils.text', 'types', 'typing', 'typing.io', 'typing.re', 'unicodedata', 'unittest', 'unittest.case', 'unittest.loader', 'unittest.main', 'unittest.mock', 'unittest.result', 'unittest.runner', 'unittest.signals', 'unittest.suite', 'unittest.util', 'urllib', 'urllib.error', 'urllib.parse', 'urllib.request', 'urllib.response', 'uu', 'uuid', 'warnings', 
'wcwidth', 'wcwidth.table_wide', 'wcwidth.table_zero', 'wcwidth.unicode_versions', 'wcwidth.wcwidth', 'weakref', 'xml', 'xml.dom', 'xml.dom.NodeFilter', 'xml.dom.domreg', 'xml.dom.minicompat', 'xml.dom.minidom', 'xml.dom.xmlbuilder', 'zipfile', 'zipimport', 'zlib']
A standout is the set of 97 jedi and parso modules: at present, jedi integration is turned off, presumably because it is so slow as to appear broken. At any rate, we're still paying 1.7 MB on the wire to download these two packages.
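A rough way to get per-package counts like the one above is to group module names by their top-level package. This is only a sketch: `count_by_top_level` is a hypothetical helper, and the sample list stands in for the real `sys.modules` of a running kernel.

```python
from collections import Counter


def count_by_top_level(module_names):
    """Count loaded modules per top-level package name."""
    # "jedi.api.classes" -> "jedi"; a bare name such as "json" stays as-is.
    return Counter(name.partition(".")[0] for name in module_names)


# Small stand-in for sorted(sys.modules) in a live kernel.
sample = ["jedi", "jedi.api", "parso", "parso.python.tree", "json"]
counts = count_by_top_level(sample)
```

In a live kernel, `count_by_top_level(sys.modules).most_common(10)` would list the heaviest packages first.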
As jedi is part of the pyodide standard distribution, it may be rather hacky to work around it getting installed, due to issues like jupyterlite/jupyterlite#904, but it may be worth it for doing 10% less work for... nothing.
> As jedi is part of the pyodide standard distribution, it may be rather hacky to work around it getting installed, due to issues like jupyterlite/jupyterlite#904, but it may be worth it for doing 10% less work for... nothing.
I saw a comment like this in jupyterlite/jupyterlite#911 as well, but unless I'm misunderstanding how pyolite and pyodide interact, I don't think a package being part of the 'pyodide standard distribution' means that it gets installed automatically. The only reason things like import matplotlib or import jedi work in the pyodide online console is that, similarly to jupyterlite, it parses the imports and automatically installs standard packages that are being imported:
https://github.com/pyodide/pyodide/blob/97cd5bdc1cf62f4e5e44a305a7682d92b556a1e0/src/py/pyodide/console.py#L478
> it parses the imports and automatically installs standard packages that are being imported:
Right, iirc even code like the following would trigger the install?
if False:
    import matplotlib
Right: unless it's made optional upstream in IPython (unlikely), the approach we'd need today:
- create a stub package that satisfies the import jedi in IPython
  - or not, as it's already conditional and ready to fail
- patch pyodide's knowledge of its packages really early
  - as this is inside the worker, our fetch hack wouldn't work
This would theoretically be something we could hoist up to lite plugin settings, e.g.
"jupyter-config-data": {
  "litePluginSettings": {
    "@jupyterlite/pyolite-kernel-extension:kernel": {
      "disablePyPIFallback": true,
      "repodata": {
        "jedi": null
      }
    }
  }
}
This would then allow piplite to answer the request with a pre-packaged wheel.
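One possible reading of the "repodata" setting sketched above is that a null entry removes a package from what pyodide knows about, while any other value replaces the entry. The following is a minimal illustration of that reading only: `apply_repodata_overrides` is a hypothetical helper, and the top-level "packages" key is an assumption about the repodata layout rather than something confirmed in this thread.

```python
import copy


def apply_repodata_overrides(repodata, overrides):
    """Return a patched copy of repodata; None removes a package entry."""
    patched = copy.deepcopy(repodata)  # leave the caller's data untouched
    packages = patched.setdefault("packages", {})  # assumed layout
    for name, entry in overrides.items():
        if entry is None:
            packages.pop(name, None)  # forget the package entirely
        else:
            packages[name] = entry  # replace or add an entry
    return patched


# Toy data: only the keys matter for this illustration.
repodata = {"packages": {"jedi": {"imports": ["jedi"]}, "numpy": {"imports": ["numpy"]}}}
patched = apply_repodata_overrides(repodata, {"jedi": None})
```

With jedi gone from the patched index, a later lookup for it would fall through to whatever else can provide a wheel, which is where a pre-packaged shim could come in.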
> Right, iirc even code like the following would trigger the install?
> if False: import matplotlib
Yes, I think it purely statically looks for import statements and installs everything it finds that's in the standard distribution:
https://github.com/pyodide/pyodide/blob/761b6320ba6a795630c4d44df9f253fd0cd0c562/src/py/_pyodide/_base.py#L514
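To illustrate why unreachable code still triggers an install, here is a minimal sketch of a purely static import scan built on Python's ast module. It is a stand-in for the function linked above, not its actual code, and `find_imported_names` is a hypothetical name.

```python
import ast


def find_imported_names(source: str) -> list[str]:
    """Collect top-level package names from every import statement in source."""
    names = set()
    # ast.walk visits every node, so imports inside dead branches are found too.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.partition(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # Skip relative imports (level > 0); they never name a distribution.
            names.add(node.module.partition(".")[0])
    return sorted(names)


code = "if False:\n    import matplotlib\n"
found = find_imported_names(code)
```

Because the scan never evaluates the `if False:` condition, matplotlib is reported even though the import can never run.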
> - patch pyodide's knowledge of its packages really early
>   - as this is inside the worker, our fetch hack wouldn't work
I think the only reason you install pyodide's version is because you run code through loadPackagesFromImports before running it, no? So you could just remove any package you don't want to install from pyodide from the list to import, like I did here to fix jupyterlite/jupyterlite#904:
https://github.com/jobovy/jupyterlite/compare/quiet-prerun...quiet-prerun-purge-galpy
Right, but 99% of the time, the user wants that feature.
> Right, but 99% of the time, the user wants that feature.
Yes, that's why I just remove galpy from the list in my case and still automatically install everything else. Here you could similarly remove jedi from the list of packages to ever install from pyodide and install your shim instead?
Right, except this is for the kernel itself, so there are some timing issues. We further allow folk to use a custom pyodide, and don't/won't ship all 200 MB.
Typically if we do some hack like this, we try to make it useful for downstreams as well (this is why we have piplite in the first place, to decouple the names of packages from the typescript code... almost).
While I was thinking this over:
- pyodide's repodata.json has a critical advantage vs its conda namesake and the Warehouse JSON API, namely that it includes the imports supported by the distribution (e.g. pillow provides PIL)
- we already do "wheel stuff" with as-downloaded wheels with pkginfo to generate one JSON file
- we could just as well generate a pyodide-compatible repodata.json, and (basically) extract the names inside the packages
- this would mean that pre-indexed wheels would not need to be pip installed, they would get picked up by the auto-import mechanism, without sacrificing debuggability when things fail
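The "extract the names inside the packages" step could look roughly like the sketch below, which lists the importable top-level names in a wheel using only the standard library. The helper names and the entry fields ("name", "version", "file_name", "imports") are assumptions loosely modelled on a pyodide-style index, and the scan deliberately ignores compiled extension modules and namespace packages.

```python
import io
import zipfile
from pathlib import PurePosixPath


def top_level_imports(wheel) -> list[str]:
    """Return importable top-level names found in a wheel (path or file object)."""
    names = set()
    with zipfile.ZipFile(wheel) as archive:
        for member in archive.namelist():
            parts = PurePosixPath(member).parts
            # Skip metadata and data directories, which are not importable.
            if not parts or parts[0].endswith((".dist-info", ".data")):
                continue
            if len(parts) == 1 and parts[0].endswith(".py"):
                names.add(parts[0][: -len(".py")])  # single-file module
            elif len(parts) == 2 and parts[1] == "__init__.py":
                names.add(parts[0])  # regular package
    return sorted(names)


def repodata_entry(name, version, file_name, wheel) -> dict:
    """Build one index entry; the field names are assumptions, not a spec."""
    return {
        "name": name,
        "version": version,
        "file_name": file_name,
        "imports": top_level_imports(wheel),
    }


# Build a tiny in-memory wheel so the example needs no files on disk.
demo_wheel = io.BytesIO()
with zipfile.ZipFile(demo_wheel, "w") as zf:
    zf.writestr("PIL/__init__.py", "")
    zf.writestr("PIL/Image.py", "")
    zf.writestr("pillow-0.0.0.dist-info/METADATA", "Name: pillow\n")

entry = repodata_entry("pillow", "0.0.0", "pillow-0.0.0-py3-none-any.whl", demo_wheel)
```

The resulting entry records that the pillow distribution provides the PIL import, which is the mapping an auto-import mechanism would need.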
This wouldn't change the public API at all, and notebooks would still have to import whatever they need, but it is probably enough of a carrot to look into how we might solve the jedi patch while also encouraging site owners to provide well-known versions of the wheels their content will use.