MyST-NB
Add inline short-hand for `glue:any` role
Description
In RMarkdown, they have a short-hand for inserting the values of R code inline into the document: `r somevariable`. Right now, we'd accomplish the same thing with {glue:}`somekey` (after glueing it into the notebook).
I wonder if it would be helpful to think up a similar short-hand for variable insertion with MyST-NB. Some random ideas:
- `{{ somekey }}`
- `j somekey` (j for jupyter)
- `g somekey` (g for glue)
- {g}`somekey`
Benefit
This is an extremely commonly-requested feature in the Jupyter ecosystem, so it seems there is a large community of people that want this, particularly for scientific writing. For example, see these posts on SO, shared by @matthew-brett:
- https://stackoverflow.com/questions/21808642/inline-python-in-markdown-with-ipython-notebook
- https://stackoverflow.com/questions/21364102/expand-variables-in-markdown-cells-of-ipython-notebook
- https://stackoverflow.com/questions/18878083/can-i-use-variables-on-an-ipython-notebook-markup-cell
and this long-standing IPython issue where it is discussed:
https://github.com/ipython/ipython/issues/2958
Implementation
I think we can split this into three different questions, and each could be tackled separately:
- Given the current Glue infrastructure, define a shorthand for `glue:any`. I think this could be resolved relatively quickly using MyST substitutions, as described here
- Allow for substitutions that don't require a `glue` function to be called first. This would require collecting variables when the notebooks are run somehow, and would probably be a bigger amount of work.
- More broadly, how to substitute variables at run-time from within the kernel. This would be a much more significant re-write of how the execution logic works, and would also break from how Jupyter does execution.
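A rough sketch of what the first option might look like in a Sphinx `conf.py`. `myst_enable_extensions` and `myst_substitutions` are MyST-Parser configuration values; the `glue_shorthand` helper and the key names are made up for illustration, and whether a glue role nested inside a substitution resolves correctly in every context is an assumption that would need testing.

```python
# conf.py (sketch): expose already-glued keys through MyST substitutions.

def glue_shorthand(keys):
    """Map each glue key to the role text that pastes it (hypothetical helper)."""
    return {key: "{glue:}`" + key + "`" for key in keys}

# Enable the optional substitution syntax, i.e. {{ name }} in markdown.
myst_enable_extensions = ["substitution"]

# Keys that the notebooks are assumed to have glued already.
myst_substitutions = glue_shorthand(["somekey", "a"])
```

With this in place an author would write `{{ somekey }}` instead of the full role, but the explicit `glue()` call in the notebook would still be needed.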
cc @stefanv who mentioned this earlier
Would this be using glue keys or actually variables? The former has a disadvantage compared to R because of referring to a different data abstraction. If the latter, would expressions be also OK?
Yes - I was going to say the same as @akhmerov - the R version really hits the transparency sweet-spot:
```{r}
# Some calculation
a <- 1
```
The value of `a` is `r a`.
This is so transparent that you can leave this markup in the student's notebook with the reasonable hope that the student will immediately see what is going on.
This isn't as true of the Glue syntax:
```{python}
from myst_nb import glue
# Some calculation
a = 1
glue('a', a)
```
The value of `a` is {glue:}`a`.
We need to first Glue and then paste, which requires explanation for someone who can see the markup.
Is it practical to make something similar to the inline r version, that has access by default to notebook variables, without explicit Gluing, and can evaluate code?
I agree that it would be much simpler if we found a way to let people insert variables into their documents that both:
- Wasn't language-specific (`glue()` is a python function)
- Didn't require extra code in the code cells that wasn't related to running analyses etc
I'm trying to wrap my head around how we'd technically be able to do this. Just spitballing some ideas here:
- We run the notebook top-to-bottom as a part of the Sphinx parsing process. At that point we also have the content of the markdown cells.
- As we encounter glue syntax in the markdown, we assume that it points to a variable. Perhaps we could parse the markdown for any "glue" directives or roles, and if one is found, call `display()` on that variable and store the result for injection into the markdown later (e.g. we can update the content of the role/directive so it points to a programmatically-generated key name).
- All of this would apply only to same-page documents. I think we still need an explicit "store this key/val" step for pasting between pages (though perhaps we could get away with using notebook metadata for this? e.g. you could have `glue_variables: ["list", "of", "variable", "names"]` and when the notebook was run, we'd call `display` on the state of those variables after executing the notebook and store the results in the notebook).
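To make the metadata idea a bit more concrete, here is a minimal sketch that uses plain dicts in place of a real notebook and `exec` in place of a kernel. The `glue_variables` metadata key is the hypothetical one from the bullet above, `collect_glue_variables` is an invented name, and `repr()` stands in for `display()`.

```python
def collect_glue_variables(notebook):
    """Run code cells in order, then snapshot the variables named in metadata."""
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            exec(cell["source"], namespace)
    wanted = notebook["metadata"].get("glue_variables", [])
    # repr() stands in for display(); a real version would store MIME bundles.
    return {name: repr(namespace[name]) for name in wanted if name in namespace}

notebook = {
    "metadata": {"glue_variables": ["a", "b"]},  # hypothetical metadata key
    "cells": [
        {"cell_type": "code", "source": "a = 1"},
        {"cell_type": "markdown", "source": "Some text."},
        {"cell_type": "code", "source": "b = a + 1\na = 10"},
    ],
}

glued = collect_glue_variables(notebook)
print(glued)  # {'a': '10', 'b': '2'}
```

Note that `a` is stored as `10`, its value after the whole notebook has run, not the value it had when the markdown cell was reached.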
Running the notebook top-to-bottom before substituting inline expressions would use the latest available values of the variables that were mutated, once again deviating from the r abstraction (and the notebook abstraction itself).
Is it correct that the main design limitation here is the need to produce a jupyter notebook that has the same execution outcome? (otherwise inline executable code would be sufficient, it seems)
note that in my above comment we don't have to run the notebook top-to-bottom first, if we are able to run it cell-by-cell and inspect the markdown in between as we do so. However I think doing this would require a fairly large change in how we execute notebooks since (to my knowledge) no other jupyter infrastructure supports this
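For illustration only, a toy version of the cell-by-cell idea: code cells are executed in order and each markdown cell is filled in with whatever the variables hold at that point. The `{{ expr }}` placeholder syntax, the tuple-based cell format and `render_cell_by_cell` are all assumptions for this sketch, and nothing here goes through a Jupyter kernel.

```python
import re

INLINE = re.compile(r"\{\{\s*(.+?)\s*\}\}")  # hypothetical {{ expr }} syntax

def render_cell_by_cell(cells):
    """Execute code cells in order; fill markdown placeholders with the state at that point."""
    namespace = {}
    rendered = []
    for kind, source in cells:
        if kind == "code":
            exec(source, namespace)
        else:
            # Evaluate each placeholder against the namespace as it is right now.
            rendered.append(
                INLINE.sub(lambda m: str(eval(m.group(1), namespace)), source)
            )
    return rendered

cells = [
    ("code", "a = 1"),
    ("markdown", "The value of a is {{ a }}."),
    ("code", "a = 2"),
    ("markdown", "Now a is {{ a }}."),
]
result = render_cell_by_cell(cells)
print(result)  # ['The value of a is 1.', 'Now a is 2.']
```

Running everything first and substituting afterwards would instead put `2` in both sentences, which is the mutation problem raised earlier in the thread.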
I must admit I started typing and wiped everything several times because I wasn't sure about responsibilities and guarantees of each project. Let me see if I got it right.
- MyST-NB is built to use jupyter-cache for execution of notebooks?
- Is the output of execution another notebook?
- What guarantees should that other notebooks satisfy? This, I think is the most interesting part. Broadly speaking "the notebook should be similar to just being directly run", but is that done now? Is that a design goal?
- Exactly how compatible does MyST-NB aim to be with jupyter? Rendering MyST is clearly not something Jupyter is able to do right now, so 100% compatibility doesn't seem to be a goal.
I'm going to assume that the answers are "yes", "yes", "mostly", and "mostly". If that is the case, I imagine a reasonable compromise would be to treat inline executable code as code for the purposes of what gets passed to the jupyter-cache, and store the outputs as markdown cell attachments.
This has a drawback that MyST-NB would potentially produce a different execution result than Jupyter if someone `glue`s in a mutation in their markdown code. On the other hand, hopefully most authors would be reasonable enough to not do this.
The notebook also doesn't have to implement all the MyST features. If I saw markup such as
the value of N is =N=
in a notebook, I'd know what it means, and I'd presume it is meant for some publishing tool to render. It feels restrictive to tie the format to what the notebook can currently render, instead of thinking about what authors would want ideally.
It feels restrictive to tie the format to what the notebook can currently render, instead of thinking about what authors would want ideally.
I would point out this is the direct opposite of what a lot of people/authors have been requesting. They want the notebook to basically fully render in the notebook.
if we are able to run it cell-by-cell and inspect the markdown in between as we do so
No we don't. That goes against the whole design philosophy of jupyter-cache, it doesn't even store the markdown. The whole point of it is that notebooks only need to be re-executed when code changes, not markdown, so you are not having to constantly re-execute the notebook, when you are only changing the text. I'm surprised @choldgraf and @akhmerov don't remember this, since we had quite lengthy conversations about it lol 😉
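A toy illustration of that principle (this is not jupyter-cache's actual implementation, and `code_fingerprint` is an invented name): if the cache key is computed from code cells only, editing the text leaves the key unchanged and no re-execution is triggered.

```python
import hashlib
import json

def code_fingerprint(notebook):
    """Hash only the code cells, so markdown edits leave the fingerprint unchanged."""
    code = [c["source"] for c in notebook["cells"] if c["cell_type"] == "code"]
    return hashlib.sha256(json.dumps(code).encode("utf-8")).hexdigest()

original = {"cells": [
    {"cell_type": "markdown", "source": "Intro"},
    {"cell_type": "code", "source": "a = 1"},
]}
text_edit = {"cells": [
    {"cell_type": "markdown", "source": "A longer intro"},
    {"cell_type": "code", "source": "a = 1"},
]}
code_edit = {"cells": [
    {"cell_type": "markdown", "source": "Intro"},
    {"cell_type": "code", "source": "a = 2"},
]}

print(code_fingerprint(original) == code_fingerprint(text_edit))  # True
print(code_fingerprint(original) == code_fingerprint(code_edit))  # False
```

Inline expressions that live in markdown would have to feed into this key somehow, otherwise changing one would not invalidate the cached result.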
Is it practical to make something similar to the inline r version, that has access by default to notebook variables, without explicit Gluing, and can evaluate code?
Unfortunately no, this is just not going to happen; at least in the near term. The only reason that RMarkdown can do this is that they have built bespoke execution engines. Also, as discussed in their documentation, https://bookdown.org/yihui/rmarkdown/language-engines.html, this feature is only available for `r`, `python` and `julia` languages.
It feels restrictive to tie the format to what the notebook can currently render, instead of thinking about what authors would want ideally.
I would point out this is the direct opposite of what a lot of people/authors have been requesting. They want the notebook to basically fully render in the notebook.
Sure, but the question is whether the notebook should be able to do it already, or whether the notebook could learn to do it in the future.
I think "in the future" are the operative words there lol. I would push for changes in those packages first. Then we can re-assess when/if these render capabilities are available in the notebook.
No we don't. That goes against the whole design philosophy of jupyter-cache, it doesn't even store the markdown. The whole point of it is that notebooks only need to be re-executed when code changes, not markdown, so you are not having to constantly re-execute the notebook, when you are only changing the text. I'm surprised @choldgraf and @akhmerov don't remember this, since we had quite lengthy conversations about it lol 😉
I remember! (Although vaguely, since it happened in what feels like one of the previous epochs). Still, there's no requirement that jupyter-cache gets as input from MyST-NB the same notebook that MyST-NB sees. It's up to MyST-NB to inject a code cell per inline executable code role into what it sends to jupyter-cache.
it happened in what feels like one of the previous epochs
Before the apocalypse lol
to inject a code cell per inline executable code role
How do you structure a notebook to have inline executables? e.g. what if there is an inline executable in the middle of a list
- abc `r x` efg
How does this translate to a notebook?
I'm not saying it can't be done, but it would require an entire re-write of the current code; just to incorporate (at least at this stage) a "nice to have" feature.
I think there are several notebooks in question here. The code example you showed would be inside a markdown cell in the initial notebook, then `x` would be in a code cell for jupyter-cache.
For example I imagine MyST-NB could follow these steps:
- parse the notebook
- insert a code cell with contents `x` before the markdown cell when sending the notebook to jupyter-cache
- take the jupyter-cache evaluation result
- extract the outputs of the cells preceding the markdown cell in question
- add them as attachments to the markdown cell
- convert everything to sphinx AST.
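Roughly, the steps above could be sketched like this. Everything here is a stand-in: cells are plain dicts, `fake_execute` replaces the jupyter-cache round trip, the `inline` flag and the list-valued `attachments` field are simplifications (real notebook attachments map filenames to MIME bundles), and the final conversion to Sphinx AST is left out.

```python
import re

INLINE = re.compile(r"`r ([^`]+)`")  # inline syntax borrowed from the `r x` example

def inject_inline_cells(cells):
    """Insert one tagged code cell per inline expression before its markdown cell."""
    out = []
    for cell in cells:
        if cell["cell_type"] == "markdown":
            for expr in INLINE.findall(cell["source"]):
                out.append({"cell_type": "code", "source": expr, "inline": True})
        out.append(cell)
    return out

def fake_execute(cells):
    """Stand-in for jupyter-cache: run code cells and record a text output."""
    namespace = {}
    for cell in cells:
        if cell["cell_type"] != "code":
            continue
        if cell.get("inline"):
            cell["output"] = str(eval(cell["source"], namespace))
        else:
            exec(cell["source"], namespace)
    return cells

def attach_outputs(cells):
    """Move outputs of injected cells onto the markdown cell that follows them."""
    out, pending = [], []
    for cell in cells:
        if cell.get("inline"):
            pending.append(cell["output"])
        elif cell["cell_type"] == "markdown":
            out.append({**cell, "attachments": pending})
            pending = []
        else:
            out.append(cell)
    return out

notebook = [
    {"cell_type": "code", "source": "x = 41 + 1"},
    {"cell_type": "markdown", "source": "- abc `r x` efg"},
]
result = attach_outputs(fake_execute(inject_inline_cells(notebook)))
print(result[1]["attachments"])  # ['42']
```

The injected cells are dropped again at the end, so the resulting notebook has the same cells as the input plus the attachments on the markdown cell.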
Just a quick note here - I think we should set the context for this conversation as "wouldn't it be great if", rather than "let's implement this now". Sorry if I didn't make it clear before, but I agree w/ @chrisjsewell that this would require a lot of re-writing for how execution happens. I just want the conversation to be expansive and creative, but this is very much a long-term kind of conversation.
For example I imagine MyST-NB could follow these steps:
- parse the notebook
Already at step (1) this is a divergence from what myst-nb currently does: the notebook doesn't get parsed until after it has been retrieved from jupyter-cache.
This has been a requested feature in Jupyter notebooks for many years - here's a recent thread that refers to previous discussions, probably here and here. It's also a popular question on SO:
Just a note that there was actually a "classic notebook" extension that did this: https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/6af8e5e84e4746476c5b476b7e38f63d7abb2064/src/jupyter_contrib_nbextensions/nbextensions/python-markdown
I agree wholeheartedly that this would be an awesome feature within Jupyter, but we should be realistic that right now we don't have the connections to the JupyterLab world, nor the developer resources, to actually implement this. It's something we can advocate for and try to nudge in a direction, but would be non-trivial to figure out.
I had some thoughts on how we could use notebook-level metadata to let users define which variables they want to "glue" into the notebook - took it to a different issue though, so check it out here for discussion: https://github.com/executablebooks/MyST-NB/issues/188
Just a quick update here. I think there could be an easy step forward to make an iterative improvement, even though it wouldn't solve the whole problem.
Since MyST now supports markdown substitutions as an optional extension, we could piggy-back to support in-line variables with `{{ myvar }}`.
Here is where we update the "glue variable dictionary":
https://github.com/executablebooks/MyST-NB/blob/master/myst_nb/parser.py#L86-L89
Around there, we could check for whether the `substitutions` extension is loaded, and if so, could write a function that also updates that environment variable. I believe that the thing we'd need to update would be `self.config.myst_substitutions`. Here's where that config is referenced when rendering a substitution:
https://github.com/executablebooks/MyST-Parser/blob/master/myst_parser/docutils_renderer.py#L1075-L1079
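A minimal sketch of what such a function might look like. `SimpleNamespace` stands in for the Sphinx config object, `sync_glue_to_substitutions` is an invented name, and the shape of `glue_data` (key mapped to a MIME bundle with a `text/plain` entry) is an assumption about how the glue dictionary is stored, not something taken from the parser code.

```python
from types import SimpleNamespace

def sync_glue_to_substitutions(config, glue_data):
    """Copy plain-text glue outputs into myst_substitutions if the extension is on."""
    if "substitution" not in config.myst_enable_extensions:
        return False
    for key, bundle in glue_data.items():
        text = bundle.get("text/plain")
        # Do not overwrite substitutions the author already set in conf.py.
        if text is not None and key not in config.myst_substitutions:
            config.myst_substitutions[key] = text
    return True

# Stand-in for the Sphinx config; the values are illustrative only.
config = SimpleNamespace(
    myst_enable_extensions=["substitution"],
    myst_substitutions={"author": "Jane"},
)
glue_data = {"myvar": {"text/plain": "1"}, "author": {"text/plain": "ignored"}}

sync_glue_to_substitutions(config, glue_data)
print(config.myst_substitutions)  # {'author': 'Jane', 'myvar': '1'}
```

Only plain-text outputs are handled here; figures or HTML outputs would presumably still need the glue roles.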