codebraid
Working with multiple source files w/ .cb.nb
I recently ran into an issue when working with multiple source files that are then combined into one larger document, e.g. multiple files representing book chapters. If the files are set up to run individually with the notebook executor (i.e. `.cb.nb`), then execution will fail silently when trying to execute and combine the files into a single document.
Minimal reproducing example
Say you have two source files, `ch1.md` and `ch2.md`, that you want to execute+compile into `book.pdf`:
Contents of `ch1.md`:
# Ch. 1 - Uniform distribution
A histogram of uniformly-distributed random numbers.
```{.python .cb.nb jupyter_kernel=python3}
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
plt.hist(rng.uniform(size=1000))
```
Contents of `ch2.md`:
# Ch 2. - Normal Distribution
A histogram of normally-distributed random numbers.
```{.python .cb.nb jupyter_kernel=python3}
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
plt.hist(rng.standard_normal(size=1000))
```
Executing/converting the files individually works as expected:
$ codebraid pandoc --from markdown --to pdf ch1.md --standalone -o book.pdf
However, if you try to compile both documents into a single book, neither document is executed, and no warning or error is given on the command line:
$ codebraid pandoc --from markdown --to pdf ch1.md ch2.md --standalone -o book.pdf
In the latter case, if you look at the output `book.pdf`, you will find an error printed:
SOURCE ERROR in "ch2.md" near line 6:
Some options are only valid for the first code chunk in a session: "jupyter_kernel"
IMO it would be helpful to the user if this error were raised at the command line rather than (or in addition to being) embedded in the output document. In my actual use-case with much larger chapters, it was a very long time before I noticed this in the output book.
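Until errors are reported on the command line, one possible stopgap is to scan a text render of the document for the error label. This is a minimal sketch, assuming the embedded message always contains the literal `SOURCE ERROR` shown above. Other Codebraid error types may use different labels, and `find_embedded_errors` is a hypothetical helper, not part of Codebraid:

```python
def find_embedded_errors(rendered_text, label="SOURCE ERROR"):
    """Return (line_number, line) pairs for lines containing the error label.

    Assumption: errors are embedded in the output with the literal label
    "SOURCE ERROR", as in the message quoted in this thread.
    """
    return [
        (number, line.strip())
        for number, line in enumerate(rendered_text.splitlines(), start=1)
        if label in line
    ]


# Sample text mimicking the error found in book.pdf.
sample = (
    "# Ch 2. - Normal Distribution\n"
    'SOURCE ERROR in "ch2.md" near line 6:\n'
    "Some options are only valid for the first code chunk in a session: "
    '"jupyter_kernel"\n'
)
errors = find_embedded_errors(sample)
for number, line in errors:
    print(f"line {number}: {line}")
```

This would need a text-based output (Markdown or plain text rather than PDF) to scan, and it assumes the error is embedded the same way in that format.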
The error in `book.pdf` seems to suggest that the problem lies with the "special" metadata `jupyter_kernel`, which is only supposed to be supplied in the first code cell. This suggests that an author would have to modify source file metadata if they wanted to switch between building individual chapters and the entire book. I hadn't noticed this mentioned in the docs before - if it's not there, then it would be an improvement if this behavior were documented.
Perhaps this can be avoided if `.cb.run` is used instead of `.cb.nb`? Is there a preferred way of using codebraid to have flexible outputs w/ multiple source files?
I need to clarify the documentation on this. By default, when you pass Pandoc multiple files, it treats them all as one. Codebraid does the same thing, so the code from multiple files is treated as all being from one file, and thus all being in one session. Hence the error about first code cell config in the wrong place.
Pandoc has a `--file-scope` option that treats multiple files as individuals, and then merges the results after parsing. This should cause Codebraid to do the same thing. The test files work with `--file-scope`. Of course, that means that you can't have shared Markdown between files (things like footnote definitions, etc.). I have an existing way to enable the effects of `--file-scope` for Codebraid even when it is disabled for Pandoc, but just haven't made it available to users yet...let me know if you need that.
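For the two-chapter example above, this amounts to adding `--file-scope` to the failing command. Below is a sketch of a small build helper that uses one code path for single chapters and for the whole book. `build_command` and `build` are hypothetical names; the argument list is the one shown earlier in this thread plus Pandoc's `--file-scope` flag:

```python
import subprocess


def build_command(sources, output, file_scope=True):
    """Assemble the codebraid invocation used in this thread."""
    command = ["codebraid", "pandoc", "--from", "markdown", "--to", "pdf"]
    if file_scope and len(sources) > 1:
        # Ask Pandoc to parse each file separately; per the discussion above,
        # Codebraid should then treat each file on its own as well.
        command.append("--file-scope")
    command += [*sources, "--standalone", "-o", output]
    return command


def build(sources, output):
    """Run the build; requires codebraid and pandoc on PATH.

    Per this thread, the exit code may not reflect errors embedded in the
    output document (see #24), so the result still needs to be inspected.
    """
    return subprocess.run(build_command(sources, output), check=False).returncode


# Show the command for the whole book without running it.
book_command = build_command(["ch1.md", "ch2.md"], "book.pdf")
print(" ".join(book_command))
```

The chapters can keep `jupyter_kernel` in their first code chunk, so the same sources should build both individually and as a book, at the cost of not sharing Markdown definitions across files.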
In terms of better errors: There's #24 for adding exit codes, and I'm referencing that here to remind myself to look into more extensive error messages on the command line as well.
> Pandoc has a `--file-scope` option
Thanks, I wasn't aware of this option.
> I have an existing way to enable the effects of `--file-scope` for Codebraid even when it is disabled for Pandoc, but just haven't made it available to users yet...let me know if you need that.
I'm not sure yet if it's necessary - at this stage it seems there's enough flexibility to put together a sensible workflow without this feature, but I'll keep it in mind as I continue experimenting with multiple files.
> In terms of better errors: There's #24 for adding exit codes, and I'm referencing that here to remind myself to look into more extensive error messages on the command line as well.
:+1: