MyST-NB Consider an alternative implementation of glue

While I like the simplicity of glue's implementation, I can see several tradeoffs:

The auxiliary code (nb_glue.glue(variable)) is markup rather than something aimed at the readers of the materials, yet it would be exposed in thebelab.
The implementation, although rather lightweight, is language-specific and has to be reimplemented for every kernel.
Producing an undisplayable mime type by adding a prefix seems to be a bit of a hack.

An alternative implementation of the same functionality would be a mechanism for labeling specific cells via cell metadata, plus a role for inserting the output of a cell by reference. Cell labels would have other uses; see also the discussion in #64. Naturally, the two implementations are not mutually exclusive.
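To make the idea concrete, here is a minimal sketch of the lookup such a role would need. It works on the plain notebook JSON structure; the `label` metadata key and the function `find_outputs_by_label` are hypothetical names chosen for illustration, not part of MyST-NB or nbformat.

```python
# Sketch only: the "label" key and this helper are assumptions for illustration.

def find_outputs_by_label(notebook, label, key="label"):
    """Return the outputs of the single code cell whose metadata[key] equals label."""
    matches = [
        cell
        for cell in notebook["cells"]
        if cell.get("cell_type") == "code"
        and cell.get("metadata", {}).get(key) == label
    ]
    if not matches:
        raise KeyError(f"no code cell labeled {label!r}")
    if len(matches) > 1:
        raise ValueError(f"label {label!r} is used by {len(matches)} cells")
    return matches[0].get("outputs", [])


# A tiny notebook in nbformat-v4-like JSON, written by hand for the example.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "Intro"},
        {
            "cell_type": "code",
            "metadata": {"label": "mean-plot"},
            "source": "plot(data)",
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {"text/plain": "<Figure>"},
                    "metadata": {},
                }
            ],
        },
    ]
}

outputs = find_outputs_by_label(notebook, "mean-plot")
```

Because nothing here runs inside the kernel, the same lookup would work for notebooks in any language.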

Possible caveats:

Specifying cell labels would be easier outside of the notebook environment; within jupyterlab or the classic notebook it would be more cumbersome.
The granularity of what output is glued is reduced, so that if a cell has multiple outputs, the author wouldn't be able to choose as easily the one they want to glue. However since the author has freedom in defining the cells, this seems to not be a big problem.
Within the notebook environment cell metadata is duplicated when the cell is copied, potentially leading to duplicate labels. This could be addressed by emitting a sphinx warning or an error.
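The duplicate-label check from the last caveat could look roughly like the sketch below. The `label` metadata key and both function names are hypothetical. The standard-library `warnings` module stands in for whatever logging a Sphinx extension would actually use.

```python
import warnings
from collections import Counter


def duplicate_labels(notebook, key="label"):
    """Return the sorted list of labels carried by more than one cell."""
    labels = (cell.get("metadata", {}).get(key) for cell in notebook["cells"])
    counts = Counter(label for label in labels if label is not None)
    return sorted(label for label, count in counts.items() if count > 1)


def warn_on_duplicates(notebook, key="label"):
    """Emit one warning per duplicated label and return the duplicates."""
    duplicates = duplicate_labels(notebook, key)
    for label in duplicates:
        warnings.warn(f"cell label {label!r} is used more than once")
    return duplicates


# The second "fig" cell mimics a cell that was copied together with its metadata.
nb = {
    "cells": [
        {"cell_type": "code", "metadata": {"label": "fig"}, "outputs": []},
        {"cell_type": "code", "metadata": {"label": "fig"}, "outputs": []},
        {"cell_type": "code", "metadata": {"label": "tab"}, "outputs": []},
        {"cell_type": "markdown", "metadata": {}},
    ]
}

found = duplicate_labels(nb)
```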

Mar 22 '20 14:03 akhmerov

That's a good point re: re-implementing per kernel, and I also agree it can be confusing to have "code that is doing markup stuff" embedded with the analysis code.

From an implementation standpoint, I think it would be pretty straightforward to, e.g., use a glue_ tag prefix for something like this. Basically if a cell had a glue_<ID> tag, then at parse time, <ID> would be added to the output metadata similar to what we do now.
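A rough sketch of that parse-time step, operating on plain notebook JSON. The `glue_name` output-metadata key and the function name are placeholders I made up for the example, not what MyST-NB actually writes. Only `display_data` and `execute_result` outputs are touched, since stream outputs carry no metadata in nbformat v4.

```python
GLUE_TAG_PREFIX = "glue_"


def apply_glue_tags(notebook, metadata_key="glue_name"):
    """Copy the <ID> of a glue_<ID> cell tag into each rich output's metadata."""
    for cell in notebook["cells"]:
        if cell.get("cell_type") != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        ids = [
            tag[len(GLUE_TAG_PREFIX):]
            for tag in tags
            if tag.startswith(GLUE_TAG_PREFIX) and len(tag) > len(GLUE_TAG_PREFIX)
        ]
        if not ids:
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") in ("display_data", "execute_result"):
                # If a cell has several glue_ tags, this sketch uses the first one.
                output.setdefault("metadata", {})[metadata_key] = ids[0]
    return notebook


nb = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["hide-input", "glue_mymean"]},
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": "3.14"},
                    "metadata": {},
                    "execution_count": 1,
                },
                {"output_type": "stream", "name": "stdout", "text": "done\n"},
            ],
        }
    ]
}

apply_glue_tags(nb)
```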

Some downsides I could see with that:

As @akhmerov mentions, I think that it would become confusing for multiple outputs, which we'd ideally want tagged by name rather than by sequential order. In many cases (e.g. plots) there will be only one output per cell, but consider this example from the docs, which I think is quite common.

In that case, you calculate three statistics in a single cell (the mean and the two bounds of a confidence interval). You have 3 numbers that you want to insert into your document later. Moreover, you may not actually want to display those numbers at all, just store them.

Would the user need to explicitly call print() on each variable? That seems a bit unintuitive.
Would the outputs be stored sequentially in the order they were printed? If so, how would they refer to the output number in the directive/role? (e.g. we want people to be able to write markup like {glue}`mymean`, not {glue}`my_statistics_cell`, which would require them to remember whether they printed the lower or the upper bound of the confidence interval first.)
One might say "just put each variable in a different cell and print it, assigning different tags to each cell", but I think that would be a pretty big ask for users and would break up the natural flow of their analysis

So a final thought re: mutually exclusive, we could implement a simplified version of "gluing" that uses tags, and makes the assumption that there is one output per cell. In this case, we could call it a lightweight solution, and if you want something fancier then you need to use a language-specific library (like the current "glue" library)

ps: on the point of it being strange to use a prefix in the display outputs to keep track of this, one reason for doing so is to standardize with what scrapbook does. glue basically re-implements a much more lightweight version of the scrapbook gluing mechanism.

Mar 22 '20 15:03 choldgraf

> being strange to use a prefix in the display outputs to keep track of this

I thought that it's the output metadata that is used to glue, while forming an unrecognizable mimetype by adding a prefix (behavior with display=False) is used to hide the output.
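To illustrate the hiding half of this, here is a toy sketch of how prefixing the keys of a mimebundle leaves a frontend with nothing it knows how to render. The prefix string, the list of known types and both function names are assumptions made up for the example, not the values glue or scrapbook actually use.

```python
HIDDEN_PREFIX = "application/x-hidden/"  # hypothetical prefix, for illustration only


def hide_with_prefix(mimebundle, prefix=HIDDEN_PREFIX):
    """Return a copy of the mimebundle with every mime type prefixed."""
    return {prefix + mime: value for mime, value in mimebundle.items()}


def renderable_types(mimebundle, known=("text/plain", "text/html", "image/png")):
    """Mime types a toy frontend that only knows `known` would display."""
    return [mime for mime in mimebundle if mime in known]


bundle = {"text/plain": "3.14", "text/html": "<b>3.14</b>"}
hidden = hide_with_prefix(bundle)

visible_before = renderable_types(bundle)
visible_after = renderable_types(hidden)

# Stripping the prefix again recovers the original bundle when it is needed.
restored = {mime[len(HIDDEN_PREFIX):]: value for mime, value in hidden.items()}
```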

Looking at the spec, using tags for labels seems to be unconventional ("tags" sounds like something reusable and not unique). At the same time, the spec already defines an official name metadata field for a cell. For simplicity, the execution mechanism of ebp could assign default numerical names to all cells.
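Assigning default names could be as simple as the sketch below. It fills the name field of every cell that lacks one and leaves author-chosen names untouched. The `cell-{index}` template and the function name are hypothetical choices, not an existing convention.

```python
def assign_default_names(notebook, template="cell-{index}"):
    """Give every unnamed cell a default name based on its position."""
    used = {cell.get("metadata", {}).get("name") for cell in notebook["cells"]}
    used.discard(None)
    for index, cell in enumerate(notebook["cells"]):
        metadata = cell.setdefault("metadata", {})
        if "name" in metadata:
            continue  # keep names the author set explicitly
        name = template.format(index=index)
        while name in used:
            name += "_"  # avoid clashing with an existing name
        metadata["name"] = name
        used.add(name)
    return notebook


nb = {
    "cells": [
        {"cell_type": "code", "metadata": {}},
        {"cell_type": "code", "metadata": {"name": "intro-plot"}},
        {"cell_type": "markdown"},
    ]
}

assign_default_names(nb)
names = [cell["metadata"]["name"] for cell in nb["cells"]]
```

Position-based names change when cells are inserted or reordered, so references to them would only stay stable once the names are written back to the notebook.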

Mar 22 '20 16:03 akhmerov

Good point re: name, I agree that’s a better metadata field to use, though I don’t know of any UI that makes it easy to name cells which is why I was defaulting to tags. I agree that text-based notebooks could easily support this though!

Mar 22 '20 16:03 choldgraf

Indeed, a cell name has to be added via the raw metadata editor. Is incorporating cell name in the metadata editor perhaps a worthy proposal to make to the jupyterlab team?

Keeping in mind the different kinds of users, I imagine the following usage patterns:

Beginning user (dumping a notebook collection to make an EBP): will probably not use glue.
Intermediate (notebooks automatically executed and version controlled): can rely on automatically assigned cell names.
Advanced (going beyond markdown syntax): likely uses the raw format, can specify the name on their own.

Mar 22 '20 16:03 akhmerov