mystmd
Executing notebooks in specific order
Proposal
Currently, myst build --execute runs executable content asynchronously. When notebooks depend on one another, forcing them to run in sequence requires hacky workarounds, such as making one notebook wait for a specific file created by another.
Simultaneous execution can also cause problems, mostly performance bottlenecks, when a project includes several resource-intensive notebooks.
Would it be possible to introduce a feature in myst.yml to specify the execution order of notebooks? Incorporating memory flushing between notebook executions could greatly enhance efficiency and resource management.
Thank you!
I would be curious whether you have a sketch of what some configuration might look like to achieve this?
It could be a part of the TOC such as:
toc:
  - file: index.md
  - title: Example chapter
    children:
      - file: intro.md
      - file: notebook1.ipynb
        execution_order: 0
      - file: notebook2.ipynb
        execution_order: 1
      - file: notebook3.ipynb
        execution_order: 0
where notebooks sharing the same order value (an integer from 0 to 99) would be executed simultaneously, and lower values would run first. Or:
toc:
  - file: index.md
  - title: Example chapter
    children:
      - file: intro.md
      - file: notebook1.ipynb
      - file: notebook2.ipynb
        depends_on: notebook1.ipynb
      - file: notebook3.ipynb
Maybe depends_on better implies that an error in notebook1 will prevent notebook2 from being executed. This would call for a bit more inference; maybe libraries like bee-queue could be of use?
The TOC is the first place that comes to mind, but the resources field could be an alternative.
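To make the depends_on idea a bit more concrete, here is a minimal Python sketch of the scheduling logic it implies: resolve an order in which every dependency runs first, and skip any notebook whose dependency failed or was itself skipped. This is only an illustration, not how mystmd works; `plan_execution`, `run_with_dependencies`, and the `run_notebook` callback are hypothetical names, and depends_on is the key proposed above, not an existing option.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_execution(depends_on):
    """Return notebooks in an order where every dependency comes first.

    `depends_on` maps each notebook to the list of notebooks it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


def run_with_dependencies(depends_on, run_notebook):
    """Run notebooks in dependency order, skipping any whose dependency did not succeed.

    `run_notebook` is a hypothetical callback returning True on success.
    """
    status = {}
    for notebook in plan_execution(depends_on):
        dependencies = depends_on.get(notebook, [])
        if any(status[dep] != "ok" for dep in dependencies):
            # A dependency failed or was skipped, so do not execute this one.
            status[notebook] = "skipped"
            continue
        status[notebook] = "ok" if run_notebook(notebook) else "failed"
    return status


# Mirrors the TOC above: only notebook2 declares a dependency.
dependencies = {
    "notebook1.ipynb": [],
    "notebook2.ipynb": ["notebook1.ipynb"],
    "notebook3.ipynb": [],
}
```

With this map, a failure in notebook1.ipynb marks notebook2.ipynb as skipped while notebook3.ipynb still runs. The sketch executes one notebook at a time; notebooks with no dependency between them could be dispatched together instead.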
Just giving a heartfelt thumbs up to the first approach, quoted below, although there is a clear use for both proposals.
It could be a part of the TOC such as:
toc:
  - file: index.md
  - title: Example chapter
    children:
      - file: intro.md
      - file: notebook1.ipynb
        execution_order: 0
      - file: notebook2.ipynb
        execution_order: 1
      - file: notebook3.ipynb
        execution_order: 0
To give a concrete example supporting the first case: sometimes users (e.g. me right now) need different "execution groups" even when there is no dependency among notebooks. For instance, notebooks may use different and possibly incompatible packages, so that asynchronous execution causes nondeterministic failures (#2055).
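As a rough illustration of the "execution groups" reading of execution_order, the Python sketch below batches TOC entries by their order value, runs the batches one after another, and runs the notebooks inside a batch concurrently. It is an assumption-laden sketch rather than a description of mystmd: `execution_groups`, `run_groups`, and the `run_notebook` coroutine are hypothetical names, and entries without an execution_order key are simply ignored here.

```python
import asyncio
from itertools import groupby


def execution_groups(entries):
    """Batch TOC entries by `execution_order`, lowest value first."""
    # Assumption: only entries that carry the proposed key take part in ordering.
    keyed = [entry for entry in entries if "execution_order" in entry]
    # sorted() is stable, so TOC order is preserved inside each group.
    ordered = sorted(keyed, key=lambda entry: entry["execution_order"])
    return [
        [entry["file"] for entry in group]
        for _, group in groupby(ordered, key=lambda entry: entry["execution_order"])
    ]


async def run_groups(entries, run_notebook):
    """Run groups sequentially; notebooks within one group run concurrently.

    `run_notebook` is a hypothetical coroutine function that executes one file.
    """
    for group in execution_groups(entries):
        await asyncio.gather(*(run_notebook(file) for file in group))


# Flattened version of the quoted TOC children.
toc_entries = [
    {"file": "intro.md"},
    {"file": "notebook1.ipynb", "execution_order": 0},
    {"file": "notebook2.ipynb", "execution_order": 1},
    {"file": "notebook3.ipynb", "execution_order": 0},
]
```

Here notebook1.ipynb and notebook3.ipynb would start together, and notebook2.ipynb would start only after both have finished. Grouping alone does not isolate environments, so notebooks with incompatible packages would still need to be placed in different groups.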