best practice for integrating pandas-generated tables into manubot docs

Open cmungall opened this issue 2 years ago • 5 comments

[feel free to close if this is too open-ended]

I like to have a Jupyter notebook that accompanies every paper I write, and I am always trying to better automate syncing between artefacts generated by the notebook and the paper; in particular data tables.

I like how manubot allows me to use markdown tables as these are easy to auto-generate. It seems I should read up on the pandoc docs on the different table options (I usually just use the basic flavor of markdown)

It looks like there are possibilities to do things like auto-include tables that are generated outside the manuscript:

  • #461

But I feel I'm not enough of a manubot wizard to understand what's happening here. The {{ syntax looks like jinja templates are being applied? Where do the variables come from? Apologies if I am missing a guide somewhere...

cmungall avatar May 25 '23 21:05 cmungall

Open ended issues are welcomed and helpful for other users. This is also a fairly custom use case, so we don't have a guide anywhere. I can try to help you set up something that works for your project.

Looking back at the example in #461, you're right that jinja templates are being used. That example manuscript is a very complex project that runs scripts in GitHub Actions and stores a lot of data in JSON files on a separate branch, including these Markdown tables. For an example, check out line 28 of owiddata/owiddata-stats.json in the pull request that added support for those tables: https://github.com/greenelab/covid19-review/pull/1104/files#diff-2978568b038ee194710db4ab79813d6dcd7e6647dda2b1c71cfe38558dfddd7c

That JSON file and all the variables within it are then made accessible to jinja by modifying the Manubot build script and setting the --template-variables-path argument: https://github.com/greenelab/covid19-review/blob/e60f9dbb029ae8708655e748a202b8574454b14a/build/build.sh#L47

Do you have your Jupyter notebook in the same repository as your Manubot manuscripts? If so, you should be able to set up a workflow that roughly:

  • has the notebook export dataframes as Markdown tables and save them in a JSON file, as suggested in https://github.com/manubot/rootstock/issues/461#issuecomment-1085891149
  • provides that JSON file to manubot process in the build script using --template-variables-path
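The export step in the list above could be sketched roughly as follows. This is a minimal illustration, not code from the covid19-review project: the `rows_to_markdown` helper, the `gene_table` variable name, the sample data, and the output file name are all hypothetical, and the rows are plain dicts of the kind `df.to_dict(orient="records")` would produce, so the sketch needs only the standard library.

```python
import json
import tempfile
from pathlib import Path


def rows_to_markdown(rows):
    """Render a list of row dicts as a Markdown pipe table."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


# Hypothetical data; in a notebook this could come from
# df.to_dict(orient="records").
rows = [
    {"gene": "BRCA1", "count": 12},
    {"gene": "TP53", "count": 30},
]

# Each top-level key is meant to become one template variable.
variables = {"gene_table": rows_to_markdown(rows)}

# A temporary directory keeps the sketch self-contained; a real notebook
# would write somewhere inside the manuscript repository instead.
out_path = Path(tempfile.mkdtemp()) / "notebook-tables.json"
out_path.write_text(json.dumps(variables, indent=2), encoding="utf-8")
```

Assuming the build script then passes this file to manubot process via --template-variables-path, the manuscript should be able to reference the table as {{gene_table}}.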

The first step would be to get it working once. Then we could think about how to automate syncing by exporting the Markdown tables from the notebook on every manuscript build, a schedule, every commit, etc.

agitter avatar May 26 '23 02:05 agitter

I ended up writing my own dataframe-to-markdown converter (unfortunately pandas to_markdown doesn't support styling, for things like highlighting the max value in a column). My notebook exports the result to the ./content/ folder.
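The converter itself is not shown in this thread. As a rough idea of what such a function might look like, here is a hypothetical sketch that bolds the maximum of each numeric column; the function name, the sample data, and the choice of bold as the highlight are all assumptions. Rows are plain dicts, e.g. from `df.to_dict(orient="records")`.

```python
def to_markdown_bold_max(rows):
    """Render row dicts as a Markdown pipe table, bolding each numeric column's maximum."""
    headers = list(rows[0])

    # Record the maximum of every column whose values are all numbers.
    col_max = {}
    for h in headers:
        values = [row[h] for row in rows]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            col_max[h] = max(values)

    def cell(h, value):
        # Wrap the cell in ** ** when it holds the column maximum.
        text = str(value)
        return f"**{text}**" if h in col_max and value == col_max[h] else text

    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    lines += ["| " + " | ".join(cell(h, row[h]) for h in headers) + " |" for row in rows]
    return "\n".join(lines)


# Hypothetical example data.
rows = [
    {"model": "A", "accuracy": 0.81, "runtime_s": 12},
    {"model": "B", "accuracy": 0.93, "runtime_s": 40},
]
table = to_markdown_bold_max(rows)
```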

I feel I should just be able to

{% include 'my_table.md' %}

but this always results in:

jinja2.exceptions.TemplateNotFound: my_table.md

I will try putting the markdown in the json and rendering this, but it feels a little contorted...

cmungall avatar Jun 02 '23 00:06 cmungall

Using the jinja include would be more elegant. I'm going to reopen this so we can consider whether we should support that in the future.

I'm not familiar with include problems in jinja2. After a quick Stack Overflow search, it looks like the general solution to TemplateNotFound is to use a FileSystemLoader so that jinja2 has visibility of other "templates" (files). If that's correct, it would require changing how the Manubot Python package calls jinja2: https://github.com/manubot/manubot/blob/f62dd4cfdebf67f99f63c9b2e64edeaa591eeb69/manubot/process/util.py#L313
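For reference, the FileSystemLoader approach looks roughly like this in plain jinja2. This is a standalone sketch, not Manubot's actual code: the temporary directory stands in for a manuscript content folder, and the table contents are made up.

```python
import tempfile
from pathlib import Path

import jinja2

# Stand-in for a manuscript content directory holding an exported table.
content_dir = Path(tempfile.mkdtemp())
(content_dir / "my_table.md").write_text(
    "| a | b |\n| --- | --- |\n| 1 | 2 |\n", encoding="utf-8"
)

# The loader tells jinja2 where {% include %} may look for files.
env = jinja2.Environment(loader=jinja2.FileSystemLoader(str(content_dir)))
template = env.from_string("Results:\n\n{% include 'my_table.md' %}")
rendered = template.render()
print(rendered)
```

With a loader configured this way, the include resolves relative to the search path rather than raising TemplateNotFound.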

agitter avatar Jun 02 '23 02:06 agitter

That would be great!

I seem to recall doing something similar in the past in a different project: create the loader, pass the loader to the environment, and then load templates directly from the folder: https://github.com/linkml/linkml/blob/main/linkml/generators/docgen.py#L303-L306

cmungall avatar Jun 02 '23 15:06 cmungall

> Using the jinja include would be more elegant

Hmm yeah, a way to insert entire text files, either from a local path or URL, would be a great solution here. So the questions are:

  • do we use jinja include for this?
  • if so, do we apply jinja2.FileSystemLoader by default with a default searchpath directory in a repo?
  • or do we let the manubot process command take a list of paths/urls that then get loaded and passed to something like jinja2.DictLoader?
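To make the last option concrete, here is a hedged sketch of how a list of local paths could be read and handed to jinja2.DictLoader. The `build_include_env` helper is hypothetical and not part of Manubot, keying each file by its file name is just one possible convention, and URL handling is left out.

```python
import tempfile
from pathlib import Path

import jinja2


def build_include_env(paths):
    """Read each file and expose it to {% include %} under its file name."""
    sources = {Path(p).name: Path(p).read_text(encoding="utf-8") for p in paths}
    return jinja2.Environment(loader=jinja2.DictLoader(sources))


# Hypothetical exported table, written to a temporary directory.
table_path = Path(tempfile.mkdtemp()) / "my_table.md"
table_path.write_text("| a | b |\n| --- | --- |\n| 1 | 2 |", encoding="utf-8")

env = build_include_env([table_path])
rendered = env.from_string("{% include 'my_table.md' %}").render()
print(rendered)
```

Compared with a FileSystemLoader, this makes the set of includable files explicit, at the cost of having to list each one.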

dhimmel avatar Jun 03 '23 14:06 dhimmel