best practice for integrating pandas-generated tables into manubot docs
[feel free to close if this is too open-ended]
I like to have a Jupyter notebook that accompanies every paper I write, and I am always trying to better automate syncing between artefacts generated by the notebook and the paper; in particular data tables.
I like how manubot allows me to use markdown tables, as these are easy to auto-generate. It seems I should read up on the pandoc docs on the different table options (I usually just use the basic flavor of markdown).
It looks like there are possibilities to do things like auto-include tables that are generated outside:
- #461
But I feel I'm not enough of a manubot wizard to understand what's happening here. The `{{` syntax looks like jinja templates are being applied? Where do the variables come from? Apologies if I am missing a guide somewhere...
Open-ended issues are welcome and helpful for other users. This is also a fairly custom use case, so we don't have a guide anywhere. I can try to help you set up something that works for your project.
Looking back at the example in #461, you're right that jinja templates are being used. That example manuscript is a very complex project that runs scripts in GitHub Actions and stores a lot of data in JSON files on a separate branch, including these Markdown tables. Check out line 28 of owiddata/owiddata-stats.json of the pull request that added the support for those tables to see an example: https://github.com/greenelab/covid19-review/pull/1104/files#diff-2978568b038ee194710db4ab79813d6dcd7e6647dda2b1c71cfe38558dfddd7c That JSON file and all the variables within are then made accessible to jinja by modifying the Manubot build script and setting the --template-variables-path argument: https://github.com/greenelab/covid19-review/blob/e60f9dbb029ae8708655e748a202b8574454b14a/build/build.sh#L47
Do you have your Jupyter notebook in the same repository as your Manubot manuscripts? If so, you should be able to set up a workflow that roughly:
- has the notebook export dataframes as Markdown tables and save them in a JSON file, as suggested in https://github.com/manubot/rootstock/issues/461#issuecomment-1085891149
- provides that JSON file to `manubot process` in the build script using `--template-variables-path`
The first step would be to get it working once. Then we could think about how to automate syncing by exporting the Markdown tables from the notebook on every manuscript build, a schedule, every commit, etc.
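As a rough illustration of the first step, the notebook could collect its Markdown tables into a dictionary and dump that to JSON. This is only a sketch: the key `results_table`, the file name `notebook-tables.json`, and the helper `write_template_variables` are all made up for the example, and the table string is hard-coded so the snippet runs without pandas.

```python
import json
import tempfile
from pathlib import Path


def write_template_variables(tables: dict[str, str], path: Path) -> None:
    """Write Markdown table strings to a JSON file, one key per table."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tables, indent=2), encoding="utf-8")


# In a notebook this string would typically come from the dataframe
# (for example via df.to_markdown()); it is hard-coded here.
results_table = "\n".join(
    [
        "| model | accuracy |",
        "|:------|---------:|",
        "| A     |     0.91 |",
        "| B     |     0.87 |",
    ]
)

# Hypothetical location: a temp directory stands in for wherever the
# build script can read the file from in a real repository.
output_path = Path(tempfile.mkdtemp()) / "notebook-tables.json"
write_template_variables({"results_table": results_table}, output_path)
```

With a file like this passed through `--template-variables-path`, the manuscript would presumably reference the table as `{{ results_table }}`; the variable name simply mirrors the JSON key chosen above.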
I ended up writing my own dataframe-to-markdown converter (unfortunately pandas `to_markdown` doesn't support styling, for things like highlighting the max value in a column). My notebook exports this to the ./content/ folder.
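The converter itself isn't shown in this thread, so the following is only a guess at what such a helper might look like: a hypothetical `to_markdown_bold_max` function that renders a pipe table and bolds the largest value in selected columns. It takes plain lists so it runs without pandas.

```python
def to_markdown_bold_max(columns, rows, highlight):
    """Render rows as a pipe table, bolding the max of each column in `highlight`."""
    col_index = {name: i for i, name in enumerate(columns)}
    # Largest value per highlighted column, keyed by column position.
    max_by_col = {
        col_index[name]: max(row[col_index[name]] for row in rows)
        for name in highlight
    }

    def fmt(i, value):
        text = str(value)
        if i in max_by_col and value == max_by_col[i]:
            return f"**{text}**"
        return text

    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "|".join(" --- " for _ in columns) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(fmt(i, v) for i, v in enumerate(row)) + " |")
    return "\n".join(lines)


# With pandas, the inputs could be list(df.columns) and df.values.tolist().
columns = ["model", "accuracy", "f1"]
rows = [["A", 0.91, 0.80], ["B", 0.87, 0.85]]
table = to_markdown_bold_max(columns, rows, highlight=["accuracy", "f1"])
print(table)
```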
I feel I should just be able to
{% include 'my_table.md' %}
but this always results in:
jinja2.exceptions.TemplateNotFound: my_table.md
I will try putting the markdown in the json and rendering this, but it feels a little contorted...
Using the jinja include would be more elegant. I'm going to reopen this so we can consider whether we should support that in the future.
I'm not familiar with include problems in jinja2. After a quick Stack Overflow search, it looks like the general solution to `TemplateNotFound` is to use a `FileSystemLoader` so that jinja2 can see other "templates" (files). If that's correct, it would require changing how the Manubot Python package calls jinja2: https://github.com/manubot/manubot/blob/f62dd4cfdebf67f99f63c9b2e64edeaa591eeb69/manubot/process/util.py#L313
That would be great!
I seem to recall doing something similar in the past in a different project: create the loader, pass the loader to the environment, and then load directly from the folder: https://github.com/linkml/linkml/blob/main/linkml/generators/docgen.py#L303-L306
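For reference, here is a small standalone jinja2 sketch of that pattern. It does not touch Manubot's code; the temp directory stands in for `./content/`, and the choice of `jinja2.BaseLoader` for the failing case is an assumption made to reproduce the error, not a statement about how Manubot is configured.

```python
import tempfile
from pathlib import Path

import jinja2

# Stand-in for the manuscript's content directory.
content_dir = Path(tempfile.mkdtemp())
(content_dir / "my_table.md").write_text(
    "| a | b |\n|---|---|\n| 1 | 2 |\n", encoding="utf-8"
)

manuscript_text = "Results:\n\n{% include 'my_table.md' %}\n"

# An environment whose loader cannot locate files: the include fails at render time.
bare_env = jinja2.Environment(loader=jinja2.BaseLoader())
try:
    bare_env.from_string(manuscript_text).render()
    include_failed = False
except jinja2.exceptions.TemplateNotFound:
    include_failed = True

# Same template text, but the environment can now search content_dir.
file_env = jinja2.Environment(loader=jinja2.FileSystemLoader(str(content_dir)))
rendered = file_env.from_string(manuscript_text).render()
print(rendered)
```

The only difference between the two environments is the loader, which is why the fix would have to live where the environment is constructed.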
> Using the jinja `include` would be more elegant
Hmm yeah, a way to insert entire text files, either from a local path or URL, would be a great solution here. So the questions are:
- do we use jinja include for this?
- if so, do we apply `jinja2.FileSystemLoader` by default with a default `searchpath` directory in a repo
- or do we let the `manubot process` command take a list of paths/urls that then get loaded and passed to something like `jinja2.DictLoader`
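To make the second option concrete, here is a rough sketch of reading an explicit list of files up front and handing them to `jinja2.DictLoader`. The helper `build_include_environment` and the decision to key each file by its bare file name are assumptions for illustration; URLs are left out to keep the example offline.

```python
import tempfile
from pathlib import Path

import jinja2


def build_include_environment(paths):
    """Read each file eagerly and expose it to jinja2 under its file name."""
    sources = {Path(p).name: Path(p).read_text(encoding="utf-8") for p in paths}
    return jinja2.Environment(loader=jinja2.DictLoader(sources))


# A temp file stands in for a table exported by the notebook.
table_path = Path(tempfile.mkdtemp()) / "my_table.md"
table_path.write_text("| a | b |\n|---|---|\n| 1 | 2 |\n", encoding="utf-8")

env = build_include_environment([table_path])
rendered_from_dict = env.from_string("{% include 'my_table.md' %}").render()
print(rendered_from_dict)
```

Compared with a default search path, this keeps the set of includable files explicit, at the cost of one more argument to maintain in the build script.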