Unify lesson pipeline with Reticulate

Open rgaiacs opened this issue 6 years ago • 10 comments

Having both _episodes and _episodes_rmd is confusing for all lesson contributors. If you have contributed to one of the R lessons, you have probably edited the _episodes files instead of the _episodes_rmd files by mistake. And if you have contributed to one of the non-R lessons, you have probably asked yourself why _episodes_rmd exists. This pull request is the start of a proof of concept to unify the lesson pipeline, now that R Markdown supports Python and we can use it for both.

What is reticulate?

reticulate includes a Python engine for R Markdown and if you are using knitr version 1.18 or higher, then the reticulate Python engine will be enabled by default whenever reticulate is installed and no further setup is required.

Why add another piece of software to the pipeline?

Because one of our core principles is to facilitate reproducible research, and to achieve that we need to eat our own dog food.

Would it be possible to contribute using only GitHub?

Yes. The idea is to make it easy for people to contribute.

What is missing from the proof of concept?

  • [ ] bin/chunk-options.R needs to work with reticulate to save figures in the correct place
  • [ ] bin/lesson_initialize.py needs to be updated
  • [ ] bin/lesson_check.py needs to be updated
  • [ ] bin/repo_check.py needs to be updated
  • [ ] Travis needs to build the lesson and commit the files

Screenshots

Bash

screencapture-localhost-4000-05-bash-example-index-html-2018-05-13-17_32_29

Python

screencapture-localhost-4000-06-python-example-index-html-2018-05-13-17_32_13

R

screencapture-localhost-4000-07-r-example-index-html-2018-05-13-17_31_36

rgaiacs avatar May 13 '18 16:05 rgaiacs

Do I understand correctly that this would introduce an R dependency to maintaining and testing the build of Python lessons locally? I'm not necessarily against this, just trying to understand the implications of what you propose.

jduckles avatar May 13 '18 20:05 jduckles

Per our discussion on Twitter, I really do like the idea of having one _episode bin per lesson. But I didn't appreciate the point that @jduckles is making about introducing an R dependency. We haven't had major issues on dc-python-ecology with contributors being confused about where to contribute (or, at least, they haven't raised those issues with us), so to me, having another package to be keeping track of might overwhelm my interest in seeing that implemented.

wrightaprilm avatar May 13 '18 22:05 wrightaprilm

What languages will this work for? I like the idea of automatically generating the output, but does it work for MATLAB? Git? Make? SQL? If not then we'll still have different pipelines for some lessons.

Can you snip the output? Some lessons have a cut down version of the output e.g. help or man output in the shell lesson.

gcapes avatar May 14 '18 07:05 gcapes

I like the idea of writing executable lessons. It also opens the door to converting to R notebooks, and perhaps even Jupyter notebooks, from the R Markdown (I do something similar, writing in a markup language and converting to notebooks, including execution). A drawback is that there is more syntax to learn for those who want to contribute...

lexnederbragt avatar May 14 '18 10:05 lexnederbragt

> Do I understand correctly that this would introduce an R dependency to maintaining and testing the build of Python lessons locally?

Yes.

> I'm not necessarily against this, just trying to understand the implications of what you propose.

It would make the code output easier to maintain across different versions of libraries and packages, and make it easy for people to contribute from the web interface.

> What languages will this work for?

Besides R, Python and Bash, knitr can also execute code in SQL, Rcpp, Stan and JavaScript.

> I like the idea of automatically generating the output, but does it work for MATLAB? Git? Make? SQL?

Automatically generating the output of the Git lesson is a bit of a challenge because every time we run git commit it generates a new hash. But I think we can find a solution.
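
One possible direction, offered only as a sketch and not as the approach this pull request takes: a commit hash is computed from the committed content, the message, the author and committer identities, and their timestamps, so pinning all of those should give the same hash on every build. The helper name `make_demo_commit`, the identity, the file name and the commit message below are hypothetical; the `GIT_*` environment variables are standard Git ones.

```shell
# Hypothetical helper: create a one-commit repository in directory "$1"
# and print the abbreviated hash of that commit.
make_demo_commit() {
  (
    cd "$1" || exit 1
    # Pin the identity so it does not depend on the builder's Git config.
    export GIT_AUTHOR_NAME="Example Author"
    export GIT_AUTHOR_EMAIL="author@example.org"
    export GIT_COMMITTER_NAME="Example Author"
    export GIT_COMMITTER_EMAIL="author@example.org"
    # Pin the timestamps (Git's internal format: Unix time plus UTC offset).
    export GIT_AUTHOR_DATE="1526212800 +0000"
    export GIT_COMMITTER_DATE="1526212800 +0000"
    git init -q .
    printf 'First note\n' > notes.txt
    git add notes.txt
    git commit -q -m "Add first note"
    # With every input fixed, this prints the same hash on each run.
    git rev-parse --short HEAD
  )
}

make_demo_commit "$(mktemp -d)"
```

Settings that alter the commit object itself, such as commit signing, would still change the hash, so a build machine would need those disabled.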

Usually you call Make from Bash so since we can use Bash we can use Make.

knitr supports SQL, but I haven't tried it yet, in part because we were using the Firefox add-on and I didn't follow the discussion about its replacement very closely.

MATLAB is more interesting. I don't think knitr supports it, but even if it did, we would have to get a MATLAB license to generate the output automatically.

> If not then we'll still have different pipelines for some lessons.

The only lessons for which we couldn't generate the output automatically would be MATLAB and OpenRefine. But OpenRefine is not a text-based tool. And implementing state persistence between code chunks in R Markdown isn't easy, so adding support for MATLAB would take at least weeks of work. Before reticulate, if you wanted two connected Python code chunks in R Markdown, you had to set up some socket-like thing that I can't find any information about anymore.
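
To illustrate why state persistence matters, here is a minimal sketch. It assumes an engine that runs every chunk in its own fresh process, which is an assumption made for illustration and not a claim about any particular knitr version. The two `sh -c` calls stand in for two consecutive chunks, and the variable name `greeting` is hypothetical.

```shell
# "Chunk 1": define a variable inside one shell process.
sh -c 'greeting="hello"'

# "Chunk 2": a fresh process starts without that variable,
# so the fallback text after ":-" is used instead.
second="$(sh -c 'echo "${greeting:-unset}"')"
echo "$second"
```

Under that assumption, anything a later chunk needs from an earlier one has to be written to a file or recomputed, unless the engine keeps a single session alive the way reticulate does for Python.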

> Can you snip the output? Some lessons have a cut down version of the output e.g. help or man output in the shell lesson.

We could use something like what we use for exercises and solutions, i.e.

```{bash, eval=FALSE}
ls --help
```
```{bash, echo=FALSE}
ls --help | head -n 10
```
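
If a lesson should also signal that the output was cut, a small filter could do the trimming. This is only a sketch: `snip` is a hypothetical helper, not part of knitr or the lesson template, and the `...` marker is an arbitrary choice.

```shell
# Hypothetical helper: print the first N lines of standard input
# (default 10), then a "..." marker if any lines were dropped.
snip() {
  awk -v n="${1:-10}" 'NR <= n { print } END { if (NR > n) print "..." }'
}

# Keeps "one" and "two", then prints "..." because two lines were cut.
printf 'one\ntwo\nthree\nfour\n' | snip 2
```

Inside a hidden chunk, the call would replace `head`, for example `ls --help | snip 10`.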

rgaiacs avatar May 14 '18 12:05 rgaiacs

I'd like to comment as a biologist-coder-wannabe (the dark side). Life scientists sometimes struggle to learn a coding language, and this has been exacerbated by the competition between R and Python (and Perl before that). This would clearly, and justifiably, make it easier for R coders to be maintainers. But it would be an additional challenge for life scientists who want to contribute. I can only speak for myself, but as @jduckles points out, this makes R, essentially, essential to contributing. The intent is good, and yes, the maintenance of duplicate episodes requires effort, but even within our local University Carpentries group, there are coders who use either R or Python and aren't comfortable with both. As an old guy and non-coder, this affects any contribution or effort to maintain a lesson (i.e., it requires a reticulate manual and practice with reticulate before resuming contributions). I personally like the idea, but is it really, really easy? Could reticulate be developed as a lesson, and made available to the community? It could be inserted into workshops as desired, and meanwhile would allow motivated coder-wannabes to learn.

hoytpr avatar May 22 '18 14:05 hoytpr

I think there is a misconception about what using reticulate would mean. Yes, it would make R an essential part of our pipeline to build lessons, but for all the non-R lessons you wouldn't need to know any more R than you currently need to know Ruby to contribute (because we use Jekyll to convert our Markdown files into HTML files). There will be no reticulate to learn. Reticulate is the name of the machinery that makes it possible to have Python chunks inside R Markdown documents.

It would make it easier to contribute to our lessons. People who contribute will only need to write the chunks of code and will not have to worry about generating the outputs (they will be generated automatically). As a consequence, it will also make our lessons better because the code chunks will be less likely to include bugs and typos (if they did, generating the lessons would raise an error), and the code output chunks will always be up to date.

This, in combination with using a continuous integration platform (e.g., Travis CI) for generating and deploying the lessons, will make for a much better experience for people interested in contributing to our lessons.

fmichonneau avatar May 22 '18 14:05 fmichonneau

Thank you @fmichonneau for reaffirming that reticulate is a machinery (what I might call a tool) and not a new language. And I sincerely thank you for recently helping me learn the final steps of using ruby-jekyll-kramdown 'serving' to check my markdown. Using reticulate is a good idea, an obvious choice, and it was not my intention to appear unsupportive. Perspective is important, and lack of understanding (ignorance) about what using reticulate means, is symptomatic of backgrounds for many biologists. My willingness to share this ignorance is intentional. My hope is that reticulate can be implemented with a protocol that is easy to follow to the very end.

hoytpr avatar May 22 '18 16:05 hoytpr

I think what is missing from the above conversation and the screenshots is a "how-to" for users. From what I can gather, a contributor would need to do two things (on top of the > and >> that is already explained in the template with respect to challenges and solutions and such).

  1. Sandwich their code between ```{<language>} and ``` like so:
```{bash}
# my bash code
```
  2. Add echo = F if you want to suppress the input message or eval = F if you want to suppress the output message. (@gcapes, to the best of my knowledge, I don't know how to cut snippets of output)

If you break it down like this, you can see that this tool is asking people to learn a small amount of R Markdown syntax in the same way that we ask them to learn how to use Markdown syntax. In my opinion, this is much less daunting than asking someone to learn how to code in R or Python.

I do, however, still think it is a bit awkward to mix shell or Python with R Markdown, but I don't know of an alternative.

raynamharris avatar May 24 '18 06:05 raynamharris

@raynamharris Thanks for the comment.

> I think what is missing from the above conversation and the screenshots is a "how-to" for users.

Yes. I should have included it. Everything you said is correct.

rgaiacs avatar May 24 '18 07:05 rgaiacs