MyST-NB Support for code-cell to read code from a file

Is your feature request related to a problem? Please describe.

Within a large lecture series it can be a pretty common pattern to re-use code in different documents. In the past we have used literalinclude to bring in code from a file. As sphinx parses code eventually in the literal_block method it works pretty well.

The issue with myst_nb/jupyter-book is this no longer works when building in this context as the literalinclude is not brought in until sphinx is called. This means the code won't get executed at the jupyter_cache level as the literalinclude is contained in the markdown block when converting using jupytext. The code will be visible in the end HTML but not executed prior to building with sphinx.

Describe the solution you'd like

A solution to this would be to add :file: option to code-cell directives for including code that is meant to be executed.

```{code-cell} python3
:file: _code/code.py
```

Typically file type attributes are usually arguments in sphinx but I think it would be better in this case to retain the simplicity of optional language argument.

The more typical sphinx approach would be

```{code-cell} _code/code.py
```

Describe alternatives you've considered

None -- changing the behaviour of literalinclude to be parsed as code is not a viable solution given out jupytext works.

Additional context

This is currently an issue for migrating the python.quantecon.org series as literalincludes are used pretty extensively in 6 lectures. We could copy the code across these lectures but would prefer not to do that to assist with maintaining the code and consistency across lectures that use the same code.

Oct 30 '20 01:10 mmcky

One issue with the proposal of adding :file: as an option (rather than an argument) is that it becomes harder to support the usual options associated with, for example selecting lines etc. from the file, as the relationship is a bit less clear

https://www.sphinx-doc.org/en/1.5/markup/code.html#directive-literalinclude

Oct 30 '20 01:10 mmcky

I think a feature like this would certainly be useful - though I suspect it might get some pushback due to the fact that it would break the "notebook <--> source file" parity that we have thus far enforced fairly heavily. So I think it's something we'd need to discuss w/ the team

Oct 31 '20 01:10 choldgraf

Yeah -- not sure how to fully solve the "notebook <--> source file" issue.

One possibility is to keep that in tact (which I think is very important to keep two way roundtrips) and add filters/transform concepts to jupytext so jupytext could:

Convert notebook myst <--> ipynb
Apply filters / transforms to support things like or parsing code-cell with a file reference to bring in files

Alternatively perhaps it would be better to add pre-transforms to myst_nb so jupytext remains the same as a two-way conversion tool and we apply filters and transforms prior to calling jupytext.

Nov 02 '20 00:11 mmcky

Another alternative would be to enable myst_parser to interpret code-cell with a file reference (or introduce code-file directive) and read the contents of the file directly on parsing.

Nov 02 '20 00:11 mmcky

@chrisjsewell @choldgraf what do you think of a myst_nb loopback / update option that could update source notebooks once the myst nodes have been parsed by Sphinx and the context is understood for issues like this (and the exercise execution issue).

Perhaps it could work like:

Set option mystnb_update_jupytercache = True
delay usual jupyter-cache run to execute source files
Add a transform to generate a set of source notebooks to unpack any nested code-cell contexts (literalinclude, code-cell read from file feature, exercise execution). Updated notebooks wouldn't update the original sources but some intermediate temp object (such as <sourcedir>/.jupyter-cache/sphinx).
run jupyter-cache -> sphinx -> output pipeline.

That initial run could be pretty useful and would be a very flexible approach. It would also seem relatively easy to debug independently of the usual build pipeline and wouldn't interfere with jupytext.

Note: the loop would only focus on execution. We wouldn't build building notebooks from sphinx.ast

Dec 11 '20 03:12 mmcky

To put it simply; if the file can not be (fully) run by vanilla Jupyter then it is not a Jupyter Notebook and should not have the .ipynb extension. We already have functionality to "pre-convert" custom files extensions to notebooks (https://myst-nb.readthedocs.io/en/latest/examples/custom-formats.html#custom-notebook-formats), this new format should use that

Dec 11 '20 03:12 chrisjsewell

To put it simply; if the file can not be (fully) run by vanilla Jupyter then it is not a Jupyter Notebook and should not have the .ipynb extension.

Perhaps we can bring in code from files using %load in a code-cell and leverage an ipython magic.

```{code-cell} ipython
%load <path>/filename.py
```

I tried this but it doesn't seem that the notebooks are executed in the same <path> as the content.

Dec 11 '20 04:12 mmcky

I tried this but it doesn't seem that the notebooks are executed in the same as the content.

It definitely is, execution_in_temp which is set to False by default was added ages ago (https://github.com/executablebooks/MyST-NB/commit/6346d39babd30aa75277f567d5dd7e20ad6d23fc)

The CWD is the same as the folder of the notebook

Dec 11 '20 04:12 chrisjsewell

@chrisjsewell what happens though when you're using md as the source. Don't they get converted and saved in _build/jupyter_execute?

Ah I think I know what is happening. The %load will alter the input code cell (not the output of the code-cell) -- and I suspect jupyter-cache isn't setup for that kind of change right? The cell also needs to run twice (1 x to run magic and 1 x to execute code)

Dec 11 '20 05:12 mmcky

Maybe we can leverage nbconvert preprocessors/filters to enable cell level transformations before executing the notebook -- through jupyter-cache. If I understand the stack correctly -- myst_nb handles the conversion to notebook via jupytext before calling jupyter-cache to execute. Is that right @chrisjsewell ?

If that is true -- then perhaps we can add a preprocessor level -- using nbconvert in myst_nb to achieve this.

Jan 04 '21 08:01 mmcky

hmmm - I don't believe that we depend on nbconvert currently, do we? I feel like that'd be a heavy-ish dependency to add given that it also duplicates much of the functionality of jupytext, though I could be wrong there.

@mmcky as a start couldn't you add a step to your build process that would:

Grab some text in a .py file (or whatever)
Search all detected files for cells with a particular tag or other piece of metadata
Update the content of those cells with your shared text
Then move forward with building the site

Then updating the shared content would be a matter of:

Updating the shared file
Running a build
Committing the updated content files to the repo

Jan 04 '21 10:01 choldgraf

hmmm - I don't believe that we depend on nbconvert currently, do we?

jupyter-sphinx drags it in. There it is used to write the scripts corresponding to the notebooks (determining extension from the kernel, comment syntax, etc). I don't think it's something you want to implement on your own. And also neither does jupytext.

Jan 04 '21 10:01 akhmerov

hmmm - I don't believe that we depend on nbconvert currently, do we?

Oops sorry I meant nbclient -- I think that is now used for execution in jupyter_cache so was thinking if the pre-execution filters concept from nbconvert may work to apply some cell level transformations (i.e. bringing in the contents of a file) prior to calling jupyter-cache but after jupytext conversion to notebook format:

build ipynb notebook with jupytext
use nbconvert to apply some cell level transformations to add support for things like (read code from file, ipython magics such as %load)
trigger remaining build using jupyter-cache for execution etc ....

@choldgraf are you suggesting I use a custom script to implement a copy/paste type of thing to update the intermediate notebooks? I may do that in the short term -- the complicating factor is that the projects are building from md sources so the ipynb layer is generated by myst_nb :-)

Jan 04 '21 10:01 mmcky

@mmcky yeah that's what I was suggesting - perhaps it could help inform future development on the MyST-NB side? It doesn't seem like it would be too much work to me but I think it'd benefit from experimentation and prototyping to crystallize what potential solutions would look like

Jan 04 '21 10:01 choldgraf

@choldgraf I looked at how this could be supported during the process of converting md files to ipynb and submitted #286 as a prototype for discussion on how to best address this feature.

Jan 05 '21 03:01 mmcky

Hi, I saw a PR was merged so I don't know if this issue is still open for discussion.

If so, I'd be interested to know whether it's also possible to only include certain lines from a file like in Sphinx?

Apr 30 '21 11:04 schmoelder

@schmoelder not at this stage -- as the intention is to include code that executes (via code-cell) so you usually need to bring the whole file in (unless your thinking of brining in a function from a file etc. -- but wouldn't be recommended). If you are just displaying code then you can of course still use the sphinx directives in jupyter-book such as:

```{code-block} ruby
:lineno-start: 10

Some more Ruby code, with line numbering starting at 10.
```

Apr 30 '21 11:04 mmcky

Thanks for the quick reply, that's unfortunate. Actually we really were thinking of bringing functionality from other files in since we would prefer not to maintain many copies of the same code required for setting up simulations in our online tutorials/workshops. But we'll figure something out! :)

Just out of curiosity, why do you consider it a bad idea? And what would be missing to implement it?

Best wishes and thanks for MyST-NB in general!

Apr 30 '21 12:04 schmoelder

thanks @schmoelder my main concern is that code-cell is for bringing in code that needs to be executed from a file. Therefore if you bring in lines from a file -- and then you edit the file -- there is no explicit link between the line numbers and the code that resides in files which will then almost certainly break execution of your book. Also if you edit the code you will need to then find all instances of the specific lines that you have specified in your book

we would prefer not to maintain many copies of the same code required for setting up simulations

This was our use case as well for this feature.

Would either of these approaches work you:

refactor the shared code into a single file that you import? [this is the approach we adopted]
setup a python package (or do you want the setup code to be visible in the book)?

If you have a good use case more than happy to review a PR. This is the current implementation

Apr 30 '21 23:04 mmcky

MyST-NB MyST-NB copied to clipboard

Support for code-cell to read code from a file

MyST-NB
MyST-NB copied to clipboard