Rootstock clone or Manubot within an analysis Repo?

Open mdkessler opened this issue 5 years ago • 5 comments

When setting out to write up results from a project with code in its own repo, is it better to start a fresh repo specific to such a manuscript? Or is it recommended (or is there a way) to use ManuBot within the project repo itself? If the latter, what would be the best way to do it?

I asked @cgreene the above question, and he suggested I post it as an issue here. His answer is reproduced below. If anyone has any other thoughts, that would be great as well.

Thanks!

@cgreene's answer

If it's primarily writing with a few side analyses (especially those that you want to automatically update), I think you might use this as an example: https://github.com/greenelab/covid19-review

If it's a pretty hefty set of analyses, potentially spread across multiple repos, this might be better as an example (i.e., linking out to other figs on github): https://github.com/greenelab/iscb-diversity-manuscript

There are a lot of options though - a github issue on rootstock is likely to get you the broadest set of answers and should let others see what you come up with afterwards :)

mdkessler avatar Dec 18 '20 19:12 mdkessler

> When setting out to write up results from a project with code in its own repo, is it better to start a fresh repo specific to such a manuscript?

Yes, it is best to use a new repository with the instructions in SETUP.md. If there is a one-to-one correspondence to the analysis repo, there is a soft convention to append -manuscript to the analysis repo name. For example, if you have your analysis in my-research, you could name the manuscript repo my-research-manuscript.

If your analysis repo is public and produces figures for the manuscript, you can include them with versioned links like:

![
**A square image at actual size and with a bottom caption.**
Loaded from the latest version of image on GitHub.
](https://github.com/manubot/resources/raw/15493970f8882fce22bef829619d3fb37a613ba5/test/square.png "Square image"){#fig:square-image}

By using a commit hash, you link your manuscript to a specific version of the analysis, which I find is best practice.

> Or is it recommended (or is there a way) to use ManuBot within the project repo itself?

There is no recommended way. If you try to add Manubot to an existing analysis repo, some things might work but other things will break. In the future we'd like to make Manubot more relaxed such that it can be added to existing repos.

Once you have a Manubot manuscript, you can add analysis files to it. Manubot will be unaffected by extra directories. But for most projects I find it cleaner to have an analysis repo and a manuscript repo.

You also could add your existing analysis to the manuscript repo using a distinct non-default branch. But I think that'd be confusing to viewers.

Is the main thing here the desire to keep everything consolidated?

dhimmel avatar Dec 18 '20 23:12 dhimmel

Yes, I was just wondering about consolidation, but given your answer, I'm thinking that a separate manuscript repo (using the convention you described) is the best way to go.

And thanks for the really helpful answer.

mdkessler avatar Dec 19 '20 00:12 mdkessler

Also - could you possibly clarify one more thing?

> By using a commit hash, you link your manuscript to a specific version of the analysis, which I find is best practice.

What exactly do you mean by this? The syntax and format demonstrated above for including a figure from a specific repo makes sense...will that update the figure automatically whenever you regenerate it? Is that what's meant by the "commit hash"?

Thanks for clarifying - I know it's a super basic thing for long-term git users.

mdkessler avatar Dec 19 '20 00:12 mdkessler

Using a URL that includes the commit hash like https://raw.githubusercontent.com/manubot/resources/15493970f8882fce22bef829619d3fb37a613ba5/test/square.png locks the figure or other analysis output to a specific version. It will intentionally prevent auto-updates to that figure or result until you modify the URL and update the commit hash to a new version in your manuscript text. Pinning the version in this manner can help keep your analysis results and manuscript text in sync. Otherwise you could have text saying "observe trend XYZ in Figure 1", but after the figure auto-updates, trend XYZ is no longer there.

Using a URL like https://raw.githubusercontent.com/manubot/resources/master/test/square.png, with a branch name in place of the commit hash, is the alternative. This figure would automatically update and always display the latest version on that branch. There could be cases where you want that behavior, but we generally don't recommend it.
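To make the difference between the two URL styles concrete, here is a minimal Python sketch. It is only an illustration and not part of Manubot: the helper names `pinned_raw_url` and `is_pinned` are hypothetical, and it assumes the `raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>` layout seen in the two URLs above.

```python
import re

# Assumption: the raw-file URL layout used in the examples above.
RAW_HOST = "https://raw.githubusercontent.com"
FULL_SHA = re.compile(r"[0-9a-f]{40}")  # a full-length git commit hash


def pinned_raw_url(owner: str, repo: str, commit: str, path: str) -> str:
    """Build a raw URL that is locked to one commit (hypothetical helper)."""
    if not FULL_SHA.fullmatch(commit):
        raise ValueError("expected a full 40-character commit hash")
    return f"{RAW_HOST}/{owner}/{repo}/{commit}/{path}"


def is_pinned(url: str) -> bool:
    """Return True if the ref segment of a raw URL is a full commit hash."""
    prefix = RAW_HOST + "/"
    if not url.startswith(prefix):
        return False
    parts = url[len(prefix):].split("/")  # owner, repo, ref, path...
    return len(parts) >= 4 and FULL_SHA.fullmatch(parts[2]) is not None


if __name__ == "__main__":
    url = pinned_raw_url(
        "manubot",
        "resources",
        "15493970f8882fce22bef829619d3fb37a613ba5",
        "test/square.png",
    )
    print(url, is_pinned(url))
    # A branch name such as "master" is a moving target, so it is not pinned.
    print(is_pinned(f"{RAW_HOST}/manubot/resources/master/test/square.png"))
```

A check like `is_pinned` could be run over the figure URLs in a manuscript to flag any that would silently change when the analysis repo is updated.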

Another useful feature when pairing a Manubot manuscript with an analysis repository is using template variables. Your analysis code can store statistics or results in a JSON file. Then Manubot can populate a template variable in the manuscript with the actual values. Here is an example from https://greenelab.github.io/covid19-review/:

For instance, the template variable {{ebm_trials_results}} in the manuscript is replaced by the actual number of clinical trials with results, 157.
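As a rough sketch of the analysis side of this workflow, the Python below writes a statistic to a JSON file and then shows what the substitution amounts to. The file name `stats.json` and the helpers `write_stats` and `fill_placeholders` are hypothetical, and `fill_placeholders` is only a simplified stand-in for Manubot's own templating, not its implementation. The value 157 is the one quoted above. As far as I know, the JSON file is handed to Manubot through the `--template-variables-path` option of `manubot process`, but check the Manubot documentation for the exact usage.

```python
import json
import tempfile
from pathlib import Path


def write_stats(path: Path, stats: dict) -> None:
    """Save analysis results as JSON so the manuscript build can read them."""
    text = json.dumps(stats, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")


def fill_placeholders(text: str, variables: dict) -> str:
    """Simplified stand-in for templating: replace {{name}} with its value."""
    for name, value in variables.items():
        text = text.replace("{{" + name + "}}", str(value))
    return text


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stats_path = Path(tmp) / "stats.json"  # hypothetical file name
        write_stats(stats_path, {"ebm_trials_results": 157})

        # The manuscript build would load the same file.
        variables = json.loads(stats_path.read_text(encoding="utf-8"))
        sentence = "There are {{ebm_trials_results}} clinical trials with results."
        print(fill_placeholders(sentence, variables))
```

Committing the JSON file to the analysis repo and referencing it by commit hash keeps these numbers pinned in the same way as the figures.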

agitter avatar Dec 19 '20 14:12 agitter

Ohh...got it. I figured auto-updates as the analysis is updated would be a great convenience feature, but that makes a lot of sense given the potential for problems when the figure and the text get out of sync. Thanks!! That's super helpful.

mdkessler avatar Dec 19 '20 14:12 mdkessler