causal-inference-in-R
causal-inference-in-R copied to clipboard
Stylistic and code consistency rules
This is a sort of internal style guide for various decisions we need to make. All of these are open to discussion, but I want to be consistent for all of them. This issue also serves as a final check when the first edition is done.
Code
- [ ] Use
grkstyle::grk_style_dir()to style code - [ ] Use
tibble()instead ofdata.frame() - [ ] Use
summarize()instead ofsummarise() - [ ] Use
if_else()instead ofifelse() - [ ] When grouping by a single variable, use
group_by()over.by - [ ] When grouping by more than one variable when you don't need persistent grouping, use
.byorgroup_by()with.groups = "drop"instead ofungroup(). Prefer.byunlessgroup_by()is more readable for the code, e.g., a very long or complexsummarize()statement - [ ]
augment(data = ...)vs.augment(newdata = ...): For cases when we are supplying the original data frame (which we need to do to keep the variables that weren't in the model), use thedataargument ofaugment()instead ofnewdata.newdatashould only be used when it's actually a new data frame, as we do in g-computation cloning. - [ ] Use
|>instead of%>% - [ ] Use
slice_sample()instead ofsample_n(),sample_frac()(because it supercedes these per the documentation) - [ ] For comments, use all lowercase (except proper nouns) and one
#at beginning - [ ] Use explicit testing of
0and1, e.g. ifx = 1, writeif (x == 1)rather thanif (x) - [ ] For
case_when(), use.default = default_valuerather thanTRUE ~ default_value
Quarto
- [ ] Prefer cross-references over writing that we discussed/will discuss something somewhere else in the book
- [ ] For code we are not showing, prefer
code-fold: trueoverecho: false. This way, we can de-emphasize the code while providing easy access for those interested - [ ] Use sentence breaks (a line break after every sentence) for easier git usage. You can do this automatically by changing your settings in RStudio to use sentence breaks and turning visual mode on and off.
- [ ] For package names, write it as
`{pkg}`(which will automatically link to the package website) the first time you mention it in a chapter; thereafter, don't use formatting. Just write: pkg.
Writing
- [ ] Check that anytime we type "casual" we mean it. Common typo!
- [ ] Defined terms are in bold; italics used for emphasis, not definitions.
- [ ] Prefer "exposure" and "outcome" as generic terms for the cause and effect over e.g., treated vs untreated, but use your judgment
- [ ] Write it as "Extra Magic Hours" and "Extra Magic Morning/Evening" (all caps)
- [ ] Write times as 9 AM instead of 9am
- [ ] "data frame" and "data set" (with a space between)
Figures and tables
- [ ] All figures and tables have a thorough caption (what is actually in the fig/tbl, not just the idea behind it)
- [ ] Use
gt()for code-based tables instead ofkable(). Otherwise, use markdown tables. - [ ] No time labels for EMM DAG since we don't actually know. Putting this here because we reuse this code in several places so we should check for consistency
I think we may have some other things to add for this one to make sure these are consistent
One thing from #310 that might be good: prefer ifelse()/if_else() over case_when() when just a single condition
The big one in this PR, though, is ifelse() vs. if_else(). I hold this opinion lightly, but I have a preference for ifelse() because I don't believe if_else() is that helpful in practice. As opposed to say, map_dbl(), it doesn't really feel like I'm declaring the type. I have to know the type of the inputs. And I also still need to know R's conversion rules to understand the violation. But if_else() is sort of a lonely function in this respect because lots of functions can do implicit conversion besides ifelse(), so I'm usually already on the defensive. So if_else() is a function that has never won me over. But again, I can be convinced and am more concerned about consistency
Suggestion to use if_else instead of ifelse. Reasons:
if_elseensures type consistencyif_elsehandlesNAbetterif_elseis faster To my knowledge, reasons to useifelsewould include shortcuttingTRUE/FALSEarguments with integers (which I don't think is the best pedagogy) and a desire to avoid evaluating the false clause when the statement is true (also not the best pedagogy / has a code smell).ifelsecould also be good if aiming for minimal dependency applications, but we've already bought intolibrary(tidyverse)as a first move in this book. Happy to learn if there are better reasons though.
Some good discussion here: https://stackoverflow.com/questions/50646133/dplyr-if-else-vs-base-r-ifelse
shortcutting
TRUE/FALSEarguments with integers (which I don't think is the best pedagogy)
I agree with this and we should add explicit testing to our list (as you changed in #310)
Edit: added to the checklist
@LucyMcGowan can be the tie-breaker for ifelse() vs. if_else(). I'll be ok with either
One thing from #310 that might be good: prefer
ifelse()/if_else()overcase_when()when just a single conditionThe big one in this PR, though, is
ifelse()vs.if_else(). I hold this opinion lightly, but I have a preference forifelse()because I don't believeif_else()is that helpful in practice. As opposed to say,map_dbl(), it doesn't really feel like I'm declaring the type. I have to know the type of the inputs. And I also still need to know R's conversion rules to understand the violation. Butif_else()is sort of a lonely function in this respect because lots of functions can do implicit conversion besidesifelse(), so I'm usually already on the defensive. Soif_else()is a function that has never won me over. But again, I can be convinced and am more concerned about consistency
Afaik it's not relevant to applications we're using in this book, but ifelse can have some pretty severe gotchas that I've experienced due to hidden type casting. That SO thread I linked has a good example using dates, and there are others.
Reading from the beginning, this is the first time I've encountered ifelse in the book and I would surely notice it in the 6 other files where it occurs (did a quick search to make sure I'm not proposing a drastic PITA)
@LucyMcGowan can be the tie-breaker for
ifelse()vs.if_else(). I'll be ok with either
Great plan!
New topic: suggest replacing TRUE ~ lines in case_when with the much clearer .default =
ifelse() might be the easiest find-replace ever luckily. Identical syntax and no other use of the letters "ifelse"
@LucyMcGowan voted for if_else(), adding it to the checklist now and will submit a PR
Note: https://github.com/ropenscilabs/aeolus will auto wrap markdown lines