- [ ] `park_extra_magic_morning = c(rep(1, 5000), rep(0, 5000))`: `rep(1:0, each = 5000)` is shorter and clearer. `rep(0:1, length.out = 10000)` is perhaps safer (but the order of the 0s and 1s will be different).
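  A quick illustration of the ordering difference (using a small n for readability):

  ```r
  # rep(1:0, each = ...) keeps all the 1s first, matching the original c(rep(1, ...), rep(0, ...))
  rep(1:0, each = 3)
  #> [1] 1 1 1 0 0 0

  # rep(0:1, length.out = ...) alternates 0s and 1s instead
  rep(0:1, length.out = 6)
  #> [1] 0 1 0 1 0 1
  ```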
- [ ] I would re-order the workflow in 13.3: 1) refine the question; 2) wrangle the data (since it comes first in the workflow, and the wrangled data can then be used to test subsequent steps along the way, not just at the very end); 3) simulate the population for the left-most variables; 4) simulate the process (perhaps with a better/less vague name); 5) compute the stats.
- [ ] Should `pivot_longer(names_to = "term", values_to = "estimate", cols = everything())` go inside `compute_stats()`?
- [ ] In `fit_models()` from 13.3, `fit_wait_minutes_posted` is never needed or used, since we set the values of that variable based on our contrast.
- [ ] It would perhaps be good to add a bootstrap confidence interval at the end of 13.3.
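  For illustration, a percentile bootstrap around the simulated effect could look roughly like the sketch below. It assumes `df_outcome` is the simulated outcome data frame produced by `simulate_process()`; `estimate_effect()` and the 1,000 resamples are placeholders for this sketch, not the chapter's code.

  ```r
  library(dplyr)
  library(purrr)

  # Effect = difference in mean actual wait between the two posted-wait settings (30 vs 60)
  estimate_effect <- function(data) {
    data |>
      group_by(wait_minutes_posted_avg) |>
      summarize(avg_wait_actual = mean(wait_minutes_actual_avg)) |>
      summarize(effect = diff(avg_wait_actual)) |>
      pull(effect)
  }

  # Resample the simulated outcome data with replacement and recompute the effect each time
  boot_effects <- map_dbl(1:1000, \(i) {
    df_outcome |>
      slice_sample(prop = 1, replace = TRUE) |>
      estimate_effect()
  })

  # 95% percentile bootstrap interval
  quantile(boot_effects, probs = c(0.025, 0.975))
  ```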
- [ ] The modular functions that return lists of models, data + contrast, etc. seem a little heavy and unnatural. It might be nicer if the functions returned more natural objects (like a simple data frame): `sim_population()` could produce a data frame from simulation parameters, and `calculate_stats()` could produce a tidy model data frame from a data set. In any case, I would remove the `pluck()` from `compute_stats()`.
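  As a rough sketch of the data-frame-returning idea (the `confounder` column and its distribution are placeholders, not the chapter's actual variables):

  ```r
  library(tibble)

  # Return a plain tibble of the left-most (baseline) variables instead of a list
  sim_population <- function(n = 10000) {
    tibble(
      park_extra_magic_morning = rep(1:0, each = n / 2),
      confounder               = rnorm(n)  # placeholder for the chapter's actual baseline variables
    )
  }

  baseline_pop <- sim_population()
  ```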
- [ ] In `compute_stats()`, `exposure_val` and `control_val` are never used, and values of 30 and 60 are hard-coded into the names of the returned object. I'd suggest something like the code below. (Alternatively, one could use `lm() |> tidy()`.)
  ```r
  # sim_obj is a list created by our simulate_process() function
  compute_stats <- function(sim_obj) {
    sim_obj |>
      pluck("df_outcome") |> # pluck() can be avoided if the input is a data frame
      group_by(wait_minutes_posted_avg) |>
      summarize(avg_wait_actual = mean(wait_minutes_actual_avg)) |>
      pivot_wider(
        names_from = wait_minutes_posted_avg,
        values_from = avg_wait_actual,
        names_prefix = "X_"
      ) |>
      mutate(effect = diff(c_across(1:2)))
  }
  ```
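  The `lm() |> tidy()` alternative might look roughly like this (a sketch that assumes the input is the plain outcome data frame rather than a list):

  ```r
  library(broom)

  compute_stats <- function(df_outcome) {
    lm(wait_minutes_actual_avg ~ wait_minutes_posted_avg, data = df_outcome) |>
      tidy()
  }
  ```

  Note that with the posted wait set to 30 or 60, the coefficient `tidy()` reports is the per-minute slope; the 30-vs-60 contrast is 30 times that, or the posted wait could be modeled as a factor instead.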