
Executing notebooks in specific order

Open agahkarakuzu opened this issue 10 months ago • 2 comments

Proposal

Currently, myst build --execute runs executable content asynchronously. When notebooks depend on one another, forcing them to run in sequence requires hacky workarounds, such as making one notebook wait for a specific file created by another.

Simultaneous execution can also cause problems, mostly performance bottlenecks, when a project includes several resource-intensive notebooks.

Would it be possible to introduce a feature in myst.yml to specify the execution order of notebooks? Incorporating memory flushing between notebook executions could greatly enhance efficiency and resource management.

Thank you!

agahkarakuzu avatar Jan 21 '25 19:01 agahkarakuzu

Do you have a sketch of what the configuration might look like to achieve this? I would be curious to see it.

rowanc1 avatar Jan 23 '25 22:01 rowanc1

It could be a part of the TOC such as:

toc:
  - file: index.md
  - title: Example chapter
    children:
      - file: intro.md
      - file: notebook1.ipynb
        execution_order: 0
      - file: notebook2.ipynb
        execution_order: 1
      - file: notebook3.ipynb
        execution_order: 0

where notebooks sharing the same order (a value between 0 and 99) would be executed simultaneously. Or:

toc:
  - file: index.md
  - title: Example chapter
    children:
      - file: intro.md
      - file: notebook1.ipynb
      - file: notebook2.ipynb
        depends_on: notebook1.ipynb
      - file: notebook3.ipynb

Maybe depends_on better implies that an error in notebook1 will prevent notebook2 from being executed. This would call for a bit more inference to work out the execution order. Maybe libraries like bee-queue could be of use?
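As a rough illustration of the inference a depends_on field would need, here is a minimal Python sketch. It is not how mystmd works today: the depends_on field is only a proposal, and the `execution_plan` and `run_all` helpers and the `run` callback are hypothetical names made up for this example. It uses the standard-library `graphlib` module (Python 3.9+) to order notebooks so that dependencies run first, and it skips any notebook whose dependency failed or was itself skipped.

```python
from graphlib import TopologicalSorter


def execution_plan(depends_on):
    """Order notebooks so each one comes after the notebooks it depends on.

    `depends_on` maps a notebook to the list of notebooks it needs
    (a hypothetical in-memory form of the proposed depends_on field).
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


def run_all(depends_on, run):
    """Run notebooks in dependency order and report a status for each.

    `run` is a hypothetical callback that executes one notebook and
    returns True on success. A notebook is skipped when any of its
    dependencies did not finish successfully.
    """
    status = {}
    for notebook in execution_plan(depends_on):
        deps = depends_on.get(notebook, ())
        if any(status[dep] != "ok" for dep in deps):
            status[notebook] = "skipped"
        else:
            status[notebook] = "ok" if run(notebook) else "failed"
    return status


# Mirrors the TOC example: notebook2 depends on notebook1.
deps = {
    "notebook1.ipynb": [],
    "notebook2.ipynb": ["notebook1.ipynb"],
    "notebook3.ipynb": [],
}

# Simulate a failure in notebook1 to show that notebook2 is not executed.
status = run_all(deps, run=lambda nb: nb != "notebook1.ipynb")
```

This version runs notebooks one at a time. Independent notebooks (here notebook1 and notebook3) could still be executed simultaneously, which is where a job queue might come in.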

The TOC is the first place that comes to mind, but the resources field could be an alternative.

agahkarakuzu avatar Feb 05 '25 15:02 agahkarakuzu

Just giving a heartfelt thumbs up to the first approach, quoted below, although there is a clear use for both proposals.

It could be a part of the TOC such as:

toc:
  - file: index.md
  - title: Example chapter
    children:
      - file: intro.md
      - file: notebook1.ipynb
        execution_order: 0
      - file: notebook2.ipynb
        execution_order: 1
      - file: notebook3.ipynb
        execution_order: 0

To give a concrete example supporting the first approach: users (e.g. me right now) sometimes need different "execution groups" even when there is no dependency among notebooks. For instance, notebooks may use different and possibly incompatible packages, in which case asynchronous execution causes nondeterministic failures (#2055).
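To make the "execution groups" idea concrete, here is a minimal Python sketch of how the proposed execution_order values could be turned into groups. It is only an illustration under assumptions: execution_order is not an existing mystmd option, `group_by_execution_order` is a hypothetical helper, and skipping entries without an execution_order is a choice made for this example rather than anything decided in this thread.

```python
from collections import defaultdict


def group_by_execution_order(entries):
    """Return batches of files, lowest execution_order first.

    `entries` is a list of dicts shaped like the TOC children in the
    quoted example. Entries without an `execution_order` key (such as
    intro.md) are left out; that is an assumption of this sketch.
    """
    batches = defaultdict(list)
    for entry in entries:
        if "execution_order" in entry:
            batches[entry["execution_order"]].append(entry["file"])
    # Sort by the order value so that group 0 runs before group 1, etc.
    return [batches[order] for order in sorted(batches)]


toc_children = [
    {"file": "intro.md"},
    {"file": "notebook1.ipynb", "execution_order": 0},
    {"file": "notebook2.ipynb", "execution_order": 1},
    {"file": "notebook3.ipynb", "execution_order": 0},
]

batches = group_by_execution_order(toc_children)
# batches == [["notebook1.ipynb", "notebook3.ipynb"], ["notebook2.ipynb"]]
```

A runner would then execute the batches one after another, with the notebooks inside a batch allowed to run simultaneously, so notebooks with incompatible packages can be kept apart by giving them different order values.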

mmmarinho avatar May 23 '25 09:05 mmmarinho