
Flipbookr shows different data step by step (pipe by pipe) when random data manipulations are involved

Open · CLRafaelR opened this issue 3 years ago · 4 comments

I want to make a code flipbook that involves random data manipulations (e.g., random number generation with `rnorm()` and random row extraction with `dplyr::slice_sample()`). I create a data set with `rnorm()` and pass it through several subsequent steps. In one of these steps, I use `dplyr::slice_sample()` to draw a random subset of the data. I want to demonstrate that the code passes the same (sub)data from one step to the next, as shown in the following gif (3 s per slide).

To achieve this, I have to call `set.seed()` explicitly in every chunk that contains such steps.

[gif: xaringan-with-setseed]

However, without `set.seed()` in the chunk, flipbookr shows different data at each step (i.e., it draws new random data at every step), as the following gif shows (again, 3 s per slide).

[gif: xaringan-without-setseed]

It bothers me that I have to call `set.seed()` in every chunk that involves random data manipulation. Is there a way to avoid calling `set.seed()` in every chunk?
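One possible way to avoid writing `set.seed()` in every chunk is a knitr chunk hook that reseeds the RNG just before each chunk runs. This is an untested sketch: `reset_seed` is a made-up chunk-option name (not part of knitr or flipbookr), and it assumes that flipbookr evaluates each revealed step as a knitr chunk that inherits the global chunk options. If that assumption does not hold, the hook will not reach the individual steps.

```r
# Hypothetical addition to the `setup` chunk; `reset_seed` is an illustrative
# option name, not an existing knitr or flipbookr option.
reset_seed_hook <- function(before, options, ...) {
  # knitr calls a chunk hook before and after a chunk whenever the chunk option
  # with the hook's name is non-NULL; reseed only on the "before" call.
  if (before && !is.null(options$reset_seed)) set.seed(options$reset_seed)
  invisible(NULL)
}

if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::knit_hooks$set(reset_seed = reset_seed_hook)
  # A global default makes the hook fire for every chunk (assumed to include
  # flipbookr's generated step chunks); a chunk may set its own value instead.
  knitr::opts_chunk$set(reset_seed = 1212)
}
```

Because every chunk would then start from the same RNG state, this trades per-chunk `set.seed()` calls for one global setting; it does not change how flipbookr itself works.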


In the MWE below, I want to demonstrate the following seven steps:

  1. Show the data (a factor `F`, a dependent variable `DV`, and a row identifier `id`)
  2. Split the data into two subsets according to the two levels of `F`
  3. Randomly extract rows from each subset and combine them into one new data set
  4. Calculate the mean of all `DV` values
  5. Group the data by `F`
  6. Calculate the mean of `DV` for each level of `F`
  7. Ungroup the data

These steps are written in two chunks, `without-set-seed` and `with-set-seed`. Both chunks contain exactly the same seven steps, but only `with-set-seed` starts with `set.seed(1212)`.

Although I also call `set.seed(1212)` in the first chunk of the Rmd (`setup`), flipbookr shows and uses different data at each step in chunk `without-set-seed`. In chunk `with-set-seed`, by contrast, flipbookr shows and uses the same data consistently from one step to the next.
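A minimal base-R model may help explain the difference, assuming that each revealed step re-runs the chunk's code from its first line up to that step. Under that assumption, a random call is executed again on every later slide, so it draws new values unless `set.seed()` is among the re-run lines. The `reveal_steps()` helper is hypothetical and only approximates flipbookr, and `sample()` stands in for `dplyr::slice_sample()`.

```r
# Illustrative model, not flipbookr's actual code: slide k re-evaluates the
# first k lines of a chunk from scratch.
reveal_steps <- function(code_lines) {
  lapply(seq_along(code_lines), function(k) {
    step_code <- paste(code_lines[seq_len(k)], collapse = "\n")
    # Fresh environment per slide; the RNG state, however, is global.
    eval(parse(text = step_code), envir = new.env())
  })
}

set.seed(1212)  # seeded once, like the `setup` chunk

without_seed <- c("x <- sample(1:10, 4)",  # random step
                  "x <- sort(x)")          # deterministic step
with_seed    <- c("set.seed(1212)",
                  "x <- sample(1:10, 4)",
                  "x <- sort(x)")

slides_without <- reveal_steps(without_seed)
slides_with    <- reveal_steps(with_seed)

# Slide 2 of `without_seed` sorts a new draw, so it need not match slide 1.
# In `with_seed`, every prefix starts with set.seed(), so slide 3 is exactly
# the sorted version of the draw shown on slide 2.
print(slides_without)
print(slides_with)
```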

MWE

````markdown
---
output:
  xaringan::moon_reader:
    css: [default, hygge, ninjutsu, robot]
    nature:
      ratio: 16:9
---

```{r setup, include=FALSE}
options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  cache = FALSE
)
library(tidyverse)
library(flipbookr)
set.seed(1212)
```

```{r data-generation}
simdat <- tibble(
  id = 1:10,
  F = rep(
    c("F1", "F2"),
    each = 5
  ),
  DV = c(
    rnorm(5, mean = 10, sd = 1),
    rnorm(5, mean = 20, sd = 3)
  )
)
```

---

`r chunk_reveal("without-set-seed", display_type = c("code", "output", "md"), md = c("Data", "Split data into two sub-data according to the two levels of F", "Randomly extract data from each sub-data and concatenate them into one new data set", "Calculate the mean of all DV (**But id and DV changed from the previous step!!**)", "Group the data by F (**id, DV, and gross.mean changed from the previous steps!!**)", "Calculate the mean of DV level-wise (**Again, id, DV, and gross.mean changed from the previous steps!!**)", "Ungroup data (**Again, id, DV, and gross.mean changed from the previous steps!!**)"))`


```{r without-set-seed, include = FALSE}
simdat |>
  group_split(F) |>
  map2_dfr(
    .y = c(2, 2),
    ~ slice_sample(.x, n = .y)
  ) |>
  mutate(
    gross.mean = mean(DV)
  ) |>
  group_by(F) |>
  mutate(
    lev.wise.mean = mean(DV)
  ) |>
  ungroup()
```

---

`r chunk_reveal("with-set-seed", display_type = c("code", "output", "md"), md = c("Set seed for random data extraction", "Data", "Split data into two sub-data according to the two levels of F", "Randomly extract data from each sub-data and concatenate them into one new data set", "Calculate the mean of all DV", "Group the data by F", "Calculate the mean of DV level-wise", "Ungroup data"))`

```{r with-set-seed, include = FALSE}
set.seed(1212)

simdat |>
  group_split(F) |>
  map2_dfr(
    .y = c(2, 2),
    ~ slice_sample(.x, n = .y)
  ) |>
  mutate(
    gross.mean = mean(DV)
  ) |>
  group_by(F) |>
  mutate(
    lev.wise.mean = mean(DV)
  ) |>
  ungroup()
```
````

CLRafaelR, Dec 08 '21 12:12

I just found a comment on a Stack Overflow answer saying that `set.seed(123)` needs to be called each time before `sample_n()` is performed.

CLRafaelR, Dec 08 '21 15:12

The proposed solution, that `flipbookr::chunk_reveal()` should have an argument that sets the seed, is interesting. At least, that is what I understand from your issue, and maybe it could be implemented. I will think about it. Sorry for the delay in responding.

EvaMaeRey, Aug 28 '22 19:08

It looks like you are using flipbookr w/ pdf/beamer output?

EvaMaeRey, Aug 28 '22 19:08

Thank you for your reply. It would be great if `flipbookr::chunk_reveal()` had an argument for setting the seed!

> It looks like you are using flipbookr w/ pdf/beamer output?

I am afraid I don't fully understand the question, but I wanted to produce xaringan presentations (not beamer) from Rmd files that were previously used to produce pdf/beamer output, by adding `r chunk_reveal(...)` to these Rmd files. Does that answer your question?

CLRafaelR, Aug 28 '22 19:08