pixiedust
Nice inline export for Rmd documents
I was thinking that it would be great if R had a set of functions for including regression output in body text and was wondering whether you thought this would be a good fit for pixiedust, or whether it would be out of scope for the package?
I imagine it working something like the following:
"Risk of cardiovascular events increased with increasing BMI `r sprinkle_text(x, y, form = "odds_ratio", conf.int = TRUE, p = TRUE)`..."
where `x` is a regression model and `y` an independent variable.
The output would look like
"Risk of cardiovascular events increased with increasing BMI (OR: 4.5, 95% CI: 4.1-4.7)..."
What do you think?
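To make the request concrete, here is a minimal sketch of such a helper using only base R. The name `inline_or` and its arguments are hypothetical (this is not an existing pixiedust function), and the interval is a plain Wald interval, which is an assumption rather than anything specified above.

```r
# Hypothetical helper, not part of pixiedust: format an odds ratio from a
# logistic regression as a string suitable for inline text.
inline_or <- function(fit, term, digits = 2, level = 0.95) {
  est <- coef(summary(fit))
  if (!term %in% rownames(est)) {
    stop("Term '", term, "' not found in the model.")
  }
  beta <- est[term, "Estimate"]
  se <- est[term, "Std. Error"]
  z <- qnorm(1 - (1 - level) / 2)
  # Wald interval on the log-odds scale, exponentiated to the odds-ratio scale
  vals <- exp(c(beta, beta - z * se, beta + z * se))
  vals <- formatC(vals, format = "f", digits = digits)
  sprintf("(OR: %s, %d%% CI: %s-%s)",
          vals[1], as.integer(round(level * 100)), vals[2], vals[3])
}

# Example with a built-in data set
fit <- glm(am ~ wt, data = mtcars, family = binomial)
inline_or(fit, "wt")
```

In an R Markdown document the call would go inside an inline `r ...` expression in the body text.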
I'm not opposed to the idea, but there are a few questions I would want to resolve prior to taking on a project like this. For instance:
- How should interaction terms be addressed?
- How should factor variables be retrieved?
- Most model objects return a `term` column when tidied via `broom`, but what should happen if there is no `term` column?
- What should be the behavior if there is no confidence interval method for the object? When no p-value is available?
- Are there specific formats that should be followed? For example, APA formats? (Ugh)
- What should happen if an inappropriate `form` is requested? For example, if I request an odds ratio with a `t.test` object.
Questions 1-4 seem like they would need pretty firm answers before committing any serious code effort. 5 and 6 are things I think could become headaches in the future.
Just brainstorming out loud, but would be interested in your thoughts on 1-4.
All excellent points. Some thoughts below.
- I had wondered about this. I'll confess that I don't fully understand interactions in R. I think we would want some way to specify the stratum-specific effect (the linear combination).
- I think an optional argument specifying the factor level.
- I'm not familiar with any model objects that would not return a `term`; perhaps print a warning in place of the term?
- I think CIs and p-values should be optional arguments, defaulting to FALSE. If users then specify an illegal choice, an error message should be printed.
- Really good point. I think a combination of two options:
  - some built-in styles that can be specified by name
  - another function by which a user can specify their own format that will be used universally through the document.
- Again, I think print a warning.
As I've thought about it more, I've decided that this really ought to be a generic for which additional methods may be written. For the generic, I propose the following functional requirements:
- Accepts an object that may be successfully `tidy`ied.
- Returns the error message from `tidy` when `tidy` is not successful.
- Returns any warnings generated by `tidy`.
- Accepts a `character(1)` argument that can determine the output format (overriding other formal arguments).
For now, I would set `style = "none"` to indicate the formal arguments should be used to determine the format. Other styles, such as APA, may follow later.
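As a rough illustration of the generic's requirements, one possible shape is sketched below. The names `dust_inline` and `tidier` are hypothetical; the `tidier` argument is an assumption added here so the sketch can be exercised without `broom` installed, and it defaults to `broom::tidy`. Errors from tidying are re-raised with their original message, and warnings pass through untouched.

```r
# Hypothetical generic; names and arguments are illustrative only.
dust_inline <- function(x, ..., style = "none") {
  UseMethod("dust_inline")
}

dust_inline.default <- function(x, ..., style = "none", tidier = broom::tidy) {
  # Only "none" exists for now; named styles (e.g. APA) could be added later
  style <- match.arg(style, choices = "none")
  # Surface the error message from the tidier if tidying fails.
  # Warnings are not caught here, so they reach the user unchanged.
  tryCatch(
    tidier(x),
    error = function(e) {
      stop("Could not tidy the object: ", conditionMessage(e), call. = FALSE)
    }
  )
}

# With broom installed, the default tidier would be used:
# dust_inline(lm(mpg ~ wt, data = mtcars))
```

Model-specific methods, such as one for `lm`, would then build the formatted string from the tidied result.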
As an example of the `lm` method, I would add the following requirements.
- Accepts a character vector naming the `term` to be summarised. A length one vector returns the main effect. A length two vector returns the interaction between two terms, etc.
- Returns an error if no term exists that satisfies the linear combination.
- Accepts a vector or list of characters, optionally named, specifying the `level` for any factors named in `term`. If unnamed, the levels are assumed to follow the same order as the factors in `term`.
- Returns an error if any levels in `level` cannot be found in its corresponding `term`.
- Accepts a `logical(1)` indicating if the confidence interval is to be included in the summary
- Accepts a `logical(1)` indicating if the SE is to be included in the summary
- Accepts a `logical(1)` indicating if the test statistic is to be included in the summary
- Accepts a `logical(1)` indicating if the p-value is to be included in the summary
- Accepts a `character(1)` designating the text label for the coefficient (beta, OR, etc.)
- Accepts a function to apply to the coefficient and CI
- Accepts additional arguments to the transformation function
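A rough base-R sketch of how several of these requirements could fit together is shown below. The function name `inline_lm` and all argument names are hypothetical and not the package's implementation. Factor `level` handling is left out, and an interaction is matched only when the terms are given in the same order as in the model formula.

```r
# Hypothetical sketch of the lm requirements; not pixiedust's implementation.
inline_lm <- function(fit, term, conf_int = FALSE, se = FALSE,
                      statistic = FALSE, p = FALSE,
                      label = "beta", fun = identity, ...) {
  stopifnot(inherits(fit, "lm"), is.character(term))
  est <- coef(summary(fit))
  # A length-two `term` is treated as an interaction: c("wt", "hp") -> "wt:hp"
  coef_name <- paste(term, collapse = ":")
  if (!coef_name %in% rownames(est)) {
    stop("No term in the model matches '", coef_name, "'.")
  }
  fmt <- function(v) formatC(v, format = "f", digits = 2)
  # `fun` (and any arguments in ...) transforms the coefficient and the CI
  parts <- sprintf("%s: %s", label, fmt(fun(est[coef_name, "Estimate"], ...)))
  if (conf_int) {
    ci <- fun(confint(fit, parm = coef_name), ...)
    parts <- c(parts, sprintf("95%% CI: %s to %s", fmt(ci[1]), fmt(ci[2])))
  }
  if (se) parts <- c(parts, sprintf("SE: %s", fmt(est[coef_name, "Std. Error"])))
  if (statistic) parts <- c(parts, sprintf("t: %s", fmt(est[coef_name, "t value"])))
  if (p) {
    parts <- c(parts, sprintf("p: %s", format.pval(est[coef_name, "Pr(>|t|)"], digits = 2)))
  }
  paste0("(", paste(parts, collapse = ", "), ")")
}

fit <- lm(mpg ~ wt * hp, data = mtcars)
inline_lm(fit, "wt", conf_int = TRUE, p = TRUE)  # main effect
inline_lm(fit, c("wt", "hp"), se = TRUE)         # interaction term
```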
How would this work for getting started?
If you are using LaTeX, just use knitr. Here are the chi-squared values I report from the lm objects:
\({\chi}^2(\Sexpr{PreviousChiSq$df})=\Sexpr{round(PreviousChiSq$dx,2)}, p=\Sexpr{round(PreviousChiSq$chi,24)}\).
It doesn't give you the label for the value, but it's there and easy.
Edit: For percents look at something like this first: http://stackoverflow.com/questions/7145826/how-to-format-a-number-as-percentage-in-r
Here's a first attempt. How does this look as proof of concept?
Use `devtools::install_github("nutterb/pixiedust", ref = "new-latex-tables-inline-dust")` to install the package with these utilities.
The source code to generate the document displayed below is at https://gist.githubusercontent.com/nutterb/bcc3c04bc4c807cb9753f74820584cf5/raw/dfe78db875de0a314d4e87126ab2cdf5548173d8/dust_inline_example.Rmd
I think that works really nicely.
One thing I noticed is that the upper confidence limit does not appear to be formatted to two decimal places.
I don't know if I was doing something wrong or if it was something else (I had just had to update R to 3.3.2 and reinstall all my libraries), but when trying to install pixiedust as above, I also had to install the packages below, one by one.
Formula acepack latticeExtra gridExtra htmlTable data.table
This sounds like something in the dependency chain. A dependency of one of the dependencies is not being installed. When upgrading R, I would recommend using `dependencies = TRUE` when using `install.packages` or any of its `devtools` variants. You can piece together why by reading about the `dependencies` argument in `install.packages` and `install_github`.