knitr: Convert a .Rmd notebook that contains both R and python chunks to an .R script with py_run_string for the python lines

This feature request is the complement of issue https://github.com/yihui/knitr/issues/1773 and has also been submitted as a Stack Overflow question.

I would like to convert an R Markdown notebook that contains both R and python chunks to an R script for execution on a backend server. We use a python pipeline to prepare the data, and R code continues the analysis. The R Markdown notebook comes from someone else and might be updated in the future, so it would be nice if we could convert it to an R script automatically. We don't necessarily need the notebook output; we are more interested in the data processing done in the R chunks. An R script is also a little easier to debug.

Input notebook analysis.Rmd

---
title: "The Ultimate Question"
---

```{r setup}
library(reticulate)
```
    
```{python}
import pandas
df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})
```
    
```{r}
str(py$df)
prod(py$df$x)
```

I tried converting it to .R with

knitr::purl("analysis.Rmd")

But the resulting analysis.R file simply comments out the python lines

## ----setup--------------------------------------------------------------------
library(reticulate)
    
## import pandas
## df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})
    
## -----------------------------------------------------------------------------
str(py$df)
prod(py$df$x)

Expected result

## ----setup--------------------------------------------------------------------
library(reticulate)
    
py_run_string("import pandas")
py_run_string("df = pandas.DataFrame({'x':[2,3,7], 'y':['life','universe','everything']})")
    
## -----------------------------------------------------------------------------
str(py$df)
prod(py$df$x)
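
Until purl() supports this, the expected result above can be approximated with a small preprocessing script. The following is only a sketch, written in Python and independent of knitr: the function names (`split_chunks`, `r_string_literal`, `to_r_script`) are hypothetical, the chunk-header parsing is deliberately simplistic, and it emits one `py_run_string()` call per chunk rather than per line so that multi-line python statements stay intact.

```python
import re

# Chunk delimiters: a fence followed by {engine ...}, and a bare closing fence.
CHUNK_START = re.compile(r"^\s*`{3,}\s*\{\s*([A-Za-z0-9_]+)[^}]*\}\s*$")
CHUNK_END = re.compile(r"^\s*`{3,}\s*$")


def split_chunks(rmd_text):
    """Return a list of (engine, code) pairs for every code chunk, in order."""
    chunks, engine, body = [], None, []
    for line in rmd_text.splitlines():
        if engine is None:
            match = CHUNK_START.match(line)
            if match:
                engine, body = match.group(1).lower(), []
        elif CHUNK_END.match(line):
            chunks.append((engine, "\n".join(body)))
            engine = None
        else:
            body.append(line)
    return chunks


def r_string_literal(text):
    """Quote text as a double-quoted R string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def to_r_script(rmd_text):
    """Build R script text: R chunks verbatim, python chunks via py_run_string()."""
    parts = []
    for engine, code in split_chunks(rmd_text):
        if engine == "r":
            parts.append(code)
        elif engine == "python":
            parts.append("reticulate::py_run_string(" + r_string_literal(code) + ")")
        else:
            # Other engines are commented out, as purl() does today.
            parts.append("\n".join("## " + line for line in code.splitlines()))
    return "\n\n".join(parts) + "\n"
```

Calling `to_r_script(open("analysis.Rmd").read())` returns the script text, which can then be written to analysis.R. Chunk labels and options are dropped in this sketch.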

Nov 10 '22 16:11 paulrougieux

This is definitely a reasonable feature request. The current behavior (commenting out chunks that are not R) is certainly suboptimal. I have been hoping to improve it, but there are a few considerations:

1. If we do this for python code chunks, we probably should do the same thing for other code chunks. The former is relatively simple. The latter is a non-trivial task. But I guess improving the python support would be a great step forward, so it's worth doing.
2. There is a possible special case: the whole document consists of pure python code chunks. In that case, I guess it may be preferable to create a pure python script rather than using reticulate to run python code.
3. Would it be a better idea to write these code chunks out as separate scripts and run them with reticulate::source_python(), instead of inlining the code in py_run_string()?
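
To make the source_python() alternative concrete, here is a rough sketch in Python, not anything knitr does today. It assumes the chunks have already been extracted as (engine, code) pairs; the function name `externalize_python_chunks` and the `<stem>_chunk<N>.py` file naming are made up for illustration.

```python
def externalize_python_chunks(chunks, stem="analysis"):
    """Split parsed chunks into an R script plus side-car python files.

    `chunks` is a list of (engine, code) pairs in document order.
    Returns (r_script_text, {python_filename: python_code}).
    """
    r_parts, py_files = [], {}
    for engine, code in chunks:
        if engine == "python":
            # Hypothetical naming scheme: one numbered file per python chunk.
            name = f"{stem}_chunk{len(py_files) + 1}.py"
            py_files[name] = code + "\n"
            r_parts.append(f'reticulate::source_python("{name}")')
        elif engine == "r":
            r_parts.append(code)
        # Chunks for other engines are skipped in this sketch.
    return "\n\n".join(r_parts) + "\n", py_files
```

The caller would write each entry of the returned dictionary to disk next to the generated R script. The trade-off is more files to ship, in exchange for python code that remains directly runnable and lintable.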

Nov 11 '22 14:11 yihui

Would it be a better idea to write out these code chunks out as separate scripts and run them with reticulate::source_python(), instead of inlining the code in py_run_string()?

For sure this is a better idea for scripts that go beyond a few lines of code. In our case the python chunks generally have 2 to 5 lines of code; they load python packages, select data for a specific product or a specific country through a database interface, and aggregate the data with a python function. Separating those few lines into another python script is definitely possible, but it would be nice to keep the few data selection steps together with the rest of the analysis. In fact, our current workaround will be to ask the author of the notebook to convert his python chunks to R chunks that make 2 to 5 calls to py_run_string().

Nov 11 '22 14:11 paulrougieux

2. There is a possible special case: the whole document consists of pure python code chunks. In that case, I guess it may be preferable to create a pure python script rather than using reticulate to run python code.

@yihui, what I need most is this item. It would be very helpful to have at least that working, and it would probably be the simplest for you to program, right?

Jan 18 '23 19:01 GitHunter0

@GitHunter0 Yes, this case should be relatively simple to implement.
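
As an illustration of why the pure-python case is simple, a sketch in Python might look like the following. It is not knitr code: `purl_pure_python` is a hypothetical name, and the chunks are assumed to be already parsed into (engine, code) pairs.

```python
def purl_pure_python(chunks):
    """Return a python script if every chunk is python, otherwise None.

    `chunks` is a list of (engine, code) pairs in document order.
    """
    engines = {engine for engine, _ in chunks}
    if engines != {"python"}:
        # Mixed or empty documents are left to the regular conversion.
        return None
    return "\n\n".join(code for _, code in chunks) + "\n"
```

A document with even one R chunk returns None here, so the caller can fall back to producing an R script.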

Jan 18 '23 21:01 yihui

@GitHunter0 @yihui I have made a specific issue to track this idea as a single item.

Edit: it was in fact already a feature request in https://github.com/yihui/knitr/issues/1928

Jan 18 '23 22:01 cderv
