Convert a .Rmd notebook that contains both R and python chunks to an .R script with py_run_string for the python lines
This feature request is the complement of issue https://github.com/yihui/knitr/issues/1773 and has also been submitted as a Stack Overflow question.
I would like to convert an R Markdown notebook that contains both R and python chunks to an R script for execution on a backend server. We use a python pipeline to prepare the data, and R code continues the analysis. The R Markdown notebook comes from someone else and might be updated in the future, so it would be nice if we could convert it to an R script automatically. We don't necessarily need the notebook output; we are more interested in the data processing done in the R chunks. An R script is also a little easier to debug.
Input notebook analysis.Rmd
---
title: "The Ultimate Question"
---
```{r setup}
library(reticulate)
```
```{python}
import pandas
df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})
```
```{r}
str(py$df)
prod(py$df$x)
```
I tried converting it to .R with
knitr::purl("analysis.Rmd")
But the resulting analysis.R file simply comments out the python lines
## ----setup--------------------------------------------------------------------
library(reticulate)
## import pandas
## df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})
## -----------------------------------------------------------------------------
str(py$df)
prod(py$df$x)
Expected result
## ----setup--------------------------------------------------------------------
library(reticulate)
py_run_string("import pandas")
py_run_string("df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})")
## -----------------------------------------------------------------------------
str(py$df)
prod(py$df$x)
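For illustration, the requested conversion can be approximated outside knitr with base R. This sketch is not how knitr::purl() works. rmd_to_r() is a hypothetical helper that assumes chunks are delimited by plain backtick fences with a {engine ...} header, and it ignores chunk options such as eval or purl. It emits one py_run_string() call per python chunk rather than one per line, so that multi-line python constructs (a function definition, a loop) stay intact.

```r
# Hypothetical helper (not part of knitr): turn the lines of an R Markdown
# file into the lines of an R script. Python chunks are wrapped in
# reticulate::py_run_string(); chunks of other engines are commented out.
rmd_to_r <- function(rmd_lines) {
  header_re <- "^[[:space:]]*`{3,}[[:space:]]*\\{[[:space:]]*([A-Za-z]+)[^}]*\\}[[:space:]]*$"
  fence_re  <- "^[[:space:]]*`{3,}[[:space:]]*$"
  out    <- character()
  chunk  <- character()
  engine <- NULL  # NULL while we are outside a chunk
  for (line in rmd_lines) {
    if (is.null(engine)) {
      # Outside a chunk: only a chunk header is interesting, prose is dropped
      if (grepl(header_re, line)) {
        engine <- tolower(sub(header_re, "\\1", line))
        chunk  <- character()
      }
    } else if (grepl(fence_re, line)) {
      # Closing fence: emit the collected chunk according to its engine
      if (engine == "r") {
        out <- c(out, chunk)
      } else if (engine == "python" && length(chunk) > 0) {
        code <- paste(chunk, collapse = "\n")
        # encodeString() escapes quotes, backslashes and newlines so the
        # python code becomes a valid R string literal
        out <- c(out, sprintf("reticulate::py_run_string(%s)",
                              encodeString(code, quote = "\"")))
      } else if (length(chunk) > 0) {
        out <- c(out, paste("##", chunk))
      }
      engine <- NULL
    } else {
      chunk <- c(chunk, line)
    }
  }
  out
}

# The notebook from this issue, built line by line. The fence is assembled
# indirectly so that this example can itself live inside a fenced block.
fence <- strrep("`", 3)
rmd <- c(
  paste0(fence, "{r setup}"),
  "library(reticulate)",
  fence,
  paste0(fence, "{python}"),
  "import pandas",
  "df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})",
  fence,
  paste0(fence, "{r}"),
  "str(py$df)",
  "prod(py$df$x)",
  fence
)
script <- rmd_to_r(rmd)
writeLines(script)
```

With a real file, the same helper could be applied as writeLines(rmd_to_r(readLines("analysis.Rmd")), "analysis.R").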
By filing an issue to this repo, I promise that
- [x] I have fully read the issue guide at https://yihui.org/issue/.
- [x] I have provided the necessary information about my issue.
- If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
- If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('knitr'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('yihui/knitr').
- If I have posted the same issue elsewhere, I have also mentioned it in this issue.
- [x] I have learned the Github Markdown syntax, and formatted my issue correctly.
I understand that my issue may be closed if I don't fulfill my promises.
This is definitely a reasonable feature request. The current behavior (commenting out chunks that are not R) is certainly suboptimal. I have hoped to improve it but have also had a few considerations:
- If we do this for python code chunks, we probably should do the same thing for other code chunks. The former is relatively simple. The latter is a non-trivial task. But I guess improving the python support would be a great step forward, so it's worth doing.
- There is a possible special case: the whole document consists of pure python code chunks. In that case, I guess it may be preferable to create a pure python script rather than using reticulate to run python code.
- Would it be a better idea to write these code chunks out as separate scripts and run them with reticulate::source_python(), instead of inlining the code in py_run_string()?
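The last consideration, writing python chunks to separate scripts, might look roughly like the sketch below. It is only an illustration under stated assumptions: rmd_to_r_sourcing() and its stem and dir arguments are hypothetical, chunk options are ignored, and chunks of engines other than R and python are dropped. reticulate::source_python() is the real function named above. The generated script merely contains calls to it, so reticulate is not needed to produce the script.

```r
# Hypothetical helper (not part of knitr or reticulate): write every python
# chunk to its own .py file and return the lines of an R script that runs
# those files with reticulate::source_python().
rmd_to_r_sourcing <- function(rmd_lines, stem, dir = tempdir()) {
  header_re <- "^[[:space:]]*`{3,}[[:space:]]*\\{[[:space:]]*([A-Za-z]+)[^}]*\\}[[:space:]]*$"
  fence_re  <- "^[[:space:]]*`{3,}[[:space:]]*$"
  out    <- character()
  chunk  <- character()
  engine <- NULL  # NULL while we are outside a chunk
  n_py   <- 0     # number of python chunks written so far
  for (line in rmd_lines) {
    if (is.null(engine)) {
      if (grepl(header_re, line)) {
        engine <- tolower(sub(header_re, "\\1", line))
        chunk  <- character()
      }
    } else if (grepl(fence_re, line)) {
      if (engine == "r") {
        out <- c(out, chunk)
      } else if (engine == "python" && length(chunk) > 0) {
        n_py <- n_py + 1
        path <- file.path(dir, sprintf("%s_chunk%d.py", stem, n_py))
        writeLines(chunk, path)
        # encodeString() turns the path into a valid R string literal
        out <- c(out, sprintf("reticulate::source_python(%s)",
                              encodeString(path, quote = "\"")))
      }
      engine <- NULL
    } else {
      chunk <- c(chunk, line)
    }
  }
  out
}

# The fence is assembled indirectly so that this example can itself live
# inside a fenced block.
fence <- strrep("`", 3)
rmd <- c(
  paste0(fence, "{r setup}"),
  "library(reticulate)",
  fence,
  paste0(fence, "{python}"),
  "import pandas",
  "df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})",
  fence,
  paste0(fence, "{r}"),
  "str(py$df)",
  "prod(py$df$x)",
  fence
)
script <- rmd_to_r_sourcing(rmd, stem = "analysis")
writeLines(script)
```

The emitted paths are absolute because the sketch writes into tempdir(). A real implementation would more likely place the .py files next to the generated .R script and reference them by relative path.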
> Would it be a better idea to write these code chunks out as separate scripts and run them with reticulate::source_python(), instead of inlining the code in py_run_string()?
For sure this is a better idea for scripts that go beyond a few lines of code. In our case the python chunks generally have 2 to 5 lines of code. They load python packages, select data for a specific product or country through a database interface, and aggregate the data with a python function. Separating those few lines into another python script is definitely possible, but it would be nice to keep the few data selection steps together with the rest of the analysis. In fact, our current workaround will be to ask the author of the notebook to convert his python chunks to R chunks that make 2 to 5 calls to py_run_string() inside them.
> 2. There is a possible special case: the whole document consists of pure python code chunks. In that case, I guess it may be preferable to create a pure python script rather than using reticulate to run python code.
@yihui, what I need most is this item. It would be very helpful to have at least that working, and it would probably be the simplest for you to implement, right?
@GitHunter0 Yes, this case should be relatively simple to implement.
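As a rough sketch of that special case (not knitr's implementation), the function below returns the lines of a python script when every chunk in the document uses the python engine, and NULL otherwise so that a caller could fall back to another strategy. rmd_to_py() is a hypothetical name, and the sketch assumes plain backtick fences and ignores chunk options.

```r
# Hypothetical helper (not part of knitr): if all chunks are python chunks,
# return their code as the lines of a python script; otherwise return NULL.
rmd_to_py <- function(rmd_lines) {
  header_re <- "^[[:space:]]*`{3,}[[:space:]]*\\{[[:space:]]*([A-Za-z]+)[^}]*\\}[[:space:]]*$"
  fence_re  <- "^[[:space:]]*`{3,}[[:space:]]*$"
  engines <- character()  # engine of every chunk seen so far
  code    <- character()  # code lines of all chunks, in document order
  inside  <- FALSE
  for (line in rmd_lines) {
    if (!inside) {
      if (grepl(header_re, line)) {
        inside  <- TRUE
        engines <- c(engines, tolower(sub(header_re, "\\1", line)))
      }
    } else if (grepl(fence_re, line)) {
      inside <- FALSE
      code   <- c(code, "")  # blank line between chunks
    } else {
      code <- c(code, line)
    }
  }
  # No chunks at all, or at least one non-python chunk: not the special case
  if (length(engines) == 0 || any(engines != "python")) {
    return(NULL)
  }
  code
}

# The fence is assembled indirectly so that this example can itself live
# inside a fenced block.
fence <- strrep("`", 3)
py_only <- c(
  paste0(fence, "{python}"),
  "import math",
  fence,
  "Some prose between the chunks.",
  paste0(fence, "{python}"),
  "print(math.sqrt(16))",
  fence
)
mixed <- c(py_only, paste0(fence, "{r}"), "1 + 1", fence)

py_script    <- rmd_to_py(py_only)  # character vector of python code
mixed_result <- rmd_to_py(mixed)    # NULL, because an R chunk is present
```

The result for a pure python document could then be written out with writeLines(py_script, "analysis.py").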
@GitHunter0 @yihui I have made a specific issue to track this idea as a single item.
Edit: it was in fact already a feature request in https://github.com/yihui/knitr/issues/1928