Merging projects
Hello @dgromer and @crsh,
today I stumbled over your R packages and really liked your ideas. I am the maintainer of pubprint, another R package that supports publication-ready output of several tests in different output formats.
I'm a bit unhappy with the code base of my current package and with the way users interact with it. So I'm thinking about restructuring the code (even though I lack the time to do it right now), but I'm not sure what the best approach would be. Pubprint has a somewhat wider perspective and supports different style guidelines (though only APA is partly implemented).
So what do you think about combining our efforts? I think it would need a lot of discussion about the basic structure but I would be pleased to see this happen.
If you want to check my source code, go back to the ver-0.2.1 tag; I am thinking about reverting the changes made since then. Currently it is hosted at bitbucket.org.
Maybe it would even be possible to find common ground with @crsh. A combined package could be a backend for the papaja package.
Best mutlusun
Hi @mutlusun, hi @dgromer,
I have thought a little bit about generalizing papaja's approach to different style guides by basing all methods on a set of global options (e.g., should leading zeros of p-values be omitted, the number of significant digits to report, etc.). Besides time constraints I haven't, yet, pursued this direction because different fields differ not only in reporting styles but also in which statistics are customary and expected. This would make it difficult to provide a common set of methods.
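A minimal sketch of the global-options idea, assuming hypothetical option names (`report.digits`, `report.p.leading.zero`) and helper functions (`set_style()`, `format_p()`) that are not taken from papaja or pubprint. Here `report.digits` means decimal places, chosen only to keep the example short:

```r
# Hypothetical style presets; the option names are illustrative only.
set_style <- function(style = c("apa", "plain")) {
  style <- match.arg(style)
  defaults <- switch(style,
    apa   = list(report.p.leading.zero = FALSE, report.digits = 2L),
    plain = list(report.p.leading.zero = TRUE,  report.digits = 3L)
  )
  options(defaults)      # options() accepts a named list
  invisible(defaults)
}

# The formatting function reads its defaults from the global options,
# but every default can still be overridden per call.
format_p <- function(p,
                     digits = getOption("report.digits", 2L),
                     leading.zero = getOption("report.p.leading.zero", TRUE)) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  out <- formatC(p, digits = digits, format = "f")
  if (!leading.zero) out <- sub("^0\\.", ".", out)
  out
}

set_style("apa")
format_p(0.0312)   # ".03"
```

Because the defaults are looked up when the function is called, switching the style later changes the output of all subsequent calls without touching the calls themselves.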
I'm generally open to combining our efforts, though. I'll try to review your packages as soon as I can and give you my view on things. If you would like to do the same, I think this would be a good basis to structure the conversation and move forward.
Best regards, Frederik
Hello,
thanks for your reply. Here is a short overview of pubprint:
- The user calls only pprint(), which determines the type of its argument and calls a style function accordingly. It supports several arguments, such as whether to set the output in math mode, whether to separate it from the surrounding text, etc.
- All calls to style functions (different publication styles) or output functions (different output formats) go through placeholders that call the functions of the desired style or output format (so it is easily extendable with more functions or new styles and output formats).
- In theory the user can extend the functionality of the package by placing style or output functions in user context.
There are several problems:
- User interaction is a bit inflexible. There are several use cases: If I want to report N=23, I have to write pprint(23, separator=NULL, name="N", nsmall=0). For a t-test, only pprint(t.test(1:10)) is needed. So shortcuts are necessary that avoid giving all the arguments in the N case. I tried to solve this with templates, but they are only partly implemented.
- Extension by the user is not really possible. There is no way to access the internal output functions, etc.
- The style functions are not exported to user context, but they are documented anyway in order to describe the arguments they take (like name="N", nsmall=0 above). I think that's a bit strange and should be solved in a smarter way.
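One way the template idea could look, as a rough sketch: a template is just a named set of preset arguments that is merged with whatever the user passes explicitly. `fmt_number()`, `templates`, and `with_template()` are hypothetical stand-ins and not pubprint's actual functions.

```r
# Stand-in for a style function; NOT pubprint's implementation.
fmt_number <- function(x, name = NULL, nsmall = 2L) {
  num <- formatC(x, digits = nsmall, format = "f")
  if (is.null(name)) num else paste0(name, "=", num)
}

# Templates are named sets of preset arguments (hypothetical).
templates <- list(
  n    = list(name = "N", nsmall = 0L),
  mean = list(name = "M", nsmall = 2L)
)

with_template <- function(x, template, ...) {
  stopifnot(template %in% names(templates))
  # Arguments given explicitly in ... override the template's presets.
  args <- modifyList(templates[[template]], list(...))
  do.call(fmt_number, c(list(x), args))
}

with_template(23, "n")                       # "N=23"
with_template(3.14159, "mean")               # "M=3.14"
with_template(3.14159, "mean", nsmall = 3L)  # "M=3.142"
```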
Best regards mutlusun
@mutlusun @crsh
Sounds like a good idea, because right now similar functionality (formatting in APA style) is scattered across multiple packages. When it comes to supporting different style guides, however, I'm not really a fan of putting too much functionality into a single R package, as such a package gets much harder to maintain. I'd be open to discussing a combined approach and to taking a look at whether it's possible to bring the different approaches together.
Hello,
so I would like to propose some design ideas for discussion:
- I think it's handier to have one function that is always called to generate output. It should be the task of this function to determine which test is given. Example: function(t.test(1:10)); function(aov(x ~ y, data=df)). There should be an argument to specify the desired output manually (so it would be possible to have different functions for an ANOVA, for example).
- We need some sort of abstraction layer for the output format (HTML, LaTeX, etc.). It could even be reasonable to maintain an extra package for this task, so the main package would be smaller, but I think this depends on how specialized our output layer would be.
- The package should be extendable by the user. This means users can add their own style functions to format certain tests the way they like. For this, the user would need to be able to interact with the functions for the output format.
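The first point could be sketched with an S3 generic that picks a formatter from the class of its argument, plus an optional `type` argument for manual selection. All names here (`report()`, `htest_statistic_only()`) are hypothetical, and the `htest` method assumes a t-test for brevity even though other tests share that class.

```r
# Single entry point: dispatch on class unless `type` names a formatter.
report <- function(x, type = NULL, ...) {
  if (!is.null(type)) {
    # Manual override: look up and call the requested formatting function.
    return(match.fun(type)(x, ...))
  }
  UseMethod("report")
}

# Default formatter for objects of class "htest" (assumes a t-test).
report.htest <- function(x, ...) {
  sprintf("t(%s) = %.2f, p = %.3f",
          format(unname(x$parameter)), unname(x$statistic), x$p.value)
}

report.numeric <- function(x, ...) formatC(x, digits = 2, format = "f")

report.default <- function(x, ...) {
  stop("No formatting method for class '", class(x)[1], "'.")
}

# An alternative formatter for the same object, selected via `type`.
htest_statistic_only <- function(x, ...) sprintf("%.2f", unname(x$statistic))

report(t.test(1:10))
report(t.test(1:10), type = "htest_statistic_only")
report(3.14159)
```

Users can add support for a new test by defining another `report.<class>` method, which covers part of the extensibility requirement without a separate mechanism.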
And we should find a name and a repository for it. I think apa isn't that bad :-). Even though I prefer hg over git, it would be okay for me to host it on GitHub. I think it's better to discuss the more in-depth details on the basis of merge requests.
Regards mutlusun
@mutlusun
I would like to keep the apa repository and corresponding package on CRAN as it is for now (and implement some stuff that I need). But we could open up a new development repository and discuss ideas there.
During a first quick browse of apa a couple of things came to mind with regards to joining forces:
- To me it's key that the package plays well with other packages. I think it's important to implement all formatting functions as S3/S4 methods rather than specialized functions (e.g. t_apa(), cor_apa() etc.). On the other hand, I don't want to create hacks in order to parse classless objects from analysis functions (e.g., the bare list output from ezANOVA()). My approach here has been to create pull requests adding the needed changes.
- I, personally, see limited merit in formatting output for all R Markdown formats, such as Markdown or HTML. papaja is meant for writing reports, manuscripts, dissertations, etc. I'm currently not sure I want to invest the extra effort to support formats besides LaTeX and Word. But maybe I'm overestimating the amount of work that's necessary.
- I think it would be best if the package did as few calculations as possible and instead required the user to use other packages that calculate, e.g., effect sizes and confidence intervals. That way the user can decide what effect size is most appropriate, and it reduces the maintenance load. papaja does some calculations but I'm looking to eliminate those cases.
- papaja::apa_print() provides different sets of formatted output depending on what information the user wants to report (i.e., descriptive estimate, test statistics, or both), and for complex analyses it provides readily formatted tables (e.g., ANOVA). I find this very convenient and would like to see something like this in a joint effort.
- It's important to me that all functions that are accessible to the user perform thorough input validation.
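The last two points could be combined roughly as follows. This is a sketch under stated assumptions, not how papaja::apa_print() is implemented: `format_htest()` is a hypothetical function that only handles a one-sample t-test, validates its input first, and returns a list so the user can pick the estimate, the test statistic, or both.

```r
format_htest <- function(x, digits = 2L) {
  # Validate input up front and fail with informative messages.
  if (!inherits(x, "htest")) {
    stop("`x` must be an object of class 'htest'.")
  }
  if (!is.numeric(digits) || length(digits) != 1L || digits < 0) {
    stop("`digits` must be a single non-negative number.")
  }

  f <- function(v) formatC(unname(v), digits = digits, format = "f")

  # Assumes a one-sample t-test: one estimate, one df, one t statistic.
  estimate  <- paste0("M = ", f(x$estimate[1]))
  statistic <- paste0("t(", format(unname(x$parameter)), ") = ", f(x$statistic))

  list(
    estimate    = estimate,
    statistic   = statistic,
    full_result = paste(estimate, statistic, sep = ", ")
  )
}

res <- format_htest(t.test(1:10))
res$estimate      # "M = 5.50"
res$full_result   # "M = 5.50, t(9) = 5.74"
```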
Here is a very basic example of how the structure could look (it only supports printing a number and is based on parts of pubprint):
apa <- function(...,
                type,
                mmode=TRUE,
                out.format="latex")
{
    ret <- do.call(type, list(..., out.format=out.format))
    if (mmode)
        ret <- out.math(ret, out.format=out.format)
    ret
}
mynumeric <- function(x,
                      name,
                      nsmall=2L,
                      operator="=",
                      out.format)
{
    num <- out.number(round(x, nsmall), out.format)
    if (!missing(name))
        num <- paste0(name,
                      out.operator(operator, out.format=out.format),
                      num)
    return(num)
}
### out.number
out.number <- function(x, out.format)
{
    do.call(paste0("out.number.", out.format), list(x))
}
out.number.latex <- function(x)
{
    x # some processing could be here
}
out.number.html <- function(x)
{
    x # some processing could be here
}
### out.operator
out.operator <- function(x, out.format)
{
    do.call(paste0("out.operator.", out.format), list(x))
}
out.operator.latex <- function(x)
{
    # "<" is valid in LaTeX math mode; only "<=" needs a command
    ifelse(x == "<=", "\\leq", x)
}
out.operator.html <- function(x)
{
    # "<" has to be escaped in HTML
    ifelse(x == "<", "&lt;", x)
}
### out.math
out.math <- function(x, out.format)
{
    do.call(paste0("out.math.", out.format), list(x))
}
out.math.latex <- function(x)
{
    paste0("\\ensuremath{", x, "}")
}
out.math.html <- function(x)
{
    paste0("<math xmlns=\"&mathml;\">", x, "</math>")
}
A use case:
> apa(c(5,10,15.1243345), type="mynumeric")
[1] "\\ensuremath{5}" "\\ensuremath{10}" "\\ensuremath{15.12}"
> apa(c(5,10,15.1243345), type="mynumeric", nsmall=3)
[1] "\\ensuremath{5}" "\\ensuremath{10}" "\\ensuremath{15.124}"
> apa(c(5,10,15.1243345), type="mynumeric", nsmall=3, out.format="html")
[1] "<math xmlns=\"&mathml;\">5</math>"
[2] "<math xmlns=\"&mathml;\">10</math>"
[3] "<math xmlns=\"&mathml;\">15.124</math>"
It supports the following points:
- Only one function needs to be called by the user. The desired output can be varied via the type argument (this makes it possible to have different outputs for the same test). Selecting a default type for a given test could be implemented.
- Several output formats can be supported. I would suggest implementing this feature as it is not difficult to do.
I see the following problems with an approach like this:
- Calling the correct output format (HTML vs. LaTeX, etc.) is a bit hacky. I would propose passing an S3/S4 object as an argument to the apa function that is used to call the corresponding function.
- This is a very simple example. In more elaborate cases there could be several arguments a) to the apa function (e.g. math mode yes/no? surrounding parentheses yes/no?), b) to the type function (e.g. for a t.test: should the means of the two groups be printed as well? should an additional effect size be added? etc.), and c) to the output format function (e.g. for LaTeX: should the number be printed with the help of the siunitx package? how many digits should be printed after the decimal point?). This can lead to a mess regarding which argument should be passed to which function (and how the arguments should be named). How could this be solved in an elegant way?
- I think it's necessary to support different sets of defaults (I may print a number in the text or in a table with different formatting). How could such sets of defaults/templates be supported? This interacts with the point above.
- The actual type functions and out functions should not be called by the user directly, so there is no need to export them to the user environment. But in a lot of cases it may be necessary to change our shipped functions to adapt them for individual purposes. A user may also want to add a new type function to support another test. So there is a need for changing/adding functions, and this is most easily done in the user environment. How to solve this dilemma?
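One possible answer to the export dilemma, sketched under assumptions: keep the type functions in a package-internal registry (an environment) and export only two small accessors. `type_registry`, `register_type()`, `get_type()`, and `format_as()` are hypothetical names, not part of any of the packages discussed.

```r
# Internal registry; in a package this environment would not be exported.
type_registry <- new.env(parent = emptyenv())

# Exported: add a new type function or replace a shipped one.
register_type <- function(name, fun) {
  stopifnot(is.character(name), length(name) == 1L, is.function(fun))
  assign(name, fun, envir = type_registry)
  invisible(name)
}

# Exported: retrieve a type function, e.g. to wrap or inspect it.
get_type <- function(name) {
  if (!exists(name, envir = type_registry, inherits = FALSE)) {
    stop("Unknown type: ", name)
  }
  get(name, envir = type_registry, inherits = FALSE)
}

# Main function looks the type function up by name.
format_as <- function(x, type, ...) get_type(type)(x, ...)

# A shipped default ...
register_type("numeric", function(x, nsmall = 2L) {
  formatC(x, digits = nsmall, format = "f")
})

# ... and a user-supplied addition, without cluttering the user's workspace.
register_type("percent", function(x) {
  paste0(formatC(100 * x, digits = 1, format = "f"), "%")
})

format_as(15.1243345, "numeric")   # "15.12"
format_as(0.256, "percent")        # "25.6%"
```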
What do you think about that?
I would propose sticking to this little example at first, both to solve these problems and to break down the more abstract discussion points.
> Calling the correct output format (HTML vs. LaTeX, etc) is a bit hacky
How so? While rendering an R Markdown document you can access the knitr options, which contain the target output format. The contents of this option could be the default (maybe with a plain text fallback for printing in the console). But maybe I misunderstand.
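A sketch of the detection described here, assuming the knitr package may or may not be installed. `knitr::opts_knit$get("rmarkdown.pandoc.to")` holds the pandoc target while an R Markdown document is rendered and is NULL otherwise; `detect_out_format()` and its return values are hypothetical.

```r
detect_out_format <- function(fallback = "text") {
  # Without knitr there is nothing to query; use the plain-text fallback.
  if (!requireNamespace("knitr", quietly = TRUE)) return(fallback)

  to <- knitr::opts_knit$get("rmarkdown.pandoc.to")
  if (is.null(to)) return(fallback)   # not rendering, e.g. at the console

  if (to %in% c("latex", "beamer")) return("latex")
  if (grepl("^html", to)) return("html")
  if (to == "docx") return("word")
  fallback
}

detect_out_format()   # "text" when called outside of rendering
```

The result could then serve as the default value of an out.format argument, so that users only need to set it when they want to override the detected format.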
> This is a very simple example
This indeed worries me, too. I think that a lot of these issues could be solved by setting global options. These global options could be used as defaults in the function call. The options could be set when choosing the reporting style and customized by the user. This is how I'm currently handling some common options in papaja and other packages. But there could be better ways.
Concerning your last point, I'm not entirely sure what you mean. Could you elaborate? Also, I don't quite understand the purpose of the type argument in your concept? What would be other types?
> > Calling the correct output format (HTML vs. LaTeX, etc) is a bit hacky
>
> How so? While rendering an R Markdown document you can access the knitr options, which contain the target output format. The contents of this option could be the default (maybe with a plain text fallback for printing in the console). But maybe I misunderstand.
You are right. This won't be a problem. But I was talking more about the way the apa() function determines the appropriate output format. There is always an abstraction layer that chooses the correct output format and calls the appropriate function. I think it would be nicer to have an S3/S4/R5 class doing this work. Sorry, I think my explanation was a bit weird. ;)
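A minimal sketch of what such a class-based layer might look like with S3, as an alternative to building function names with paste0() and do.call(). The constructor `out_format()` and the generic `out_math()` are hypothetical, and the HTML markup is simplified.

```r
# Constructor for an output-format object; the class carries the format.
out_format <- function(name) {
  structure(list(name = name),
            class = c(paste0("out_", name), "out_format"))
}

# Generic that dispatches on the format object (second argument).
out_math <- function(x, format) UseMethod("out_math", format)

out_math.out_latex  <- function(x, format) paste0("\\ensuremath{", x, "}")
out_math.out_html   <- function(x, format) paste0("<math>", x, "</math>")
# Fallback for formats without a dedicated method: return input unchanged.
out_math.out_format <- function(x, format) x

out_math("5", out_format("latex"))     # "\\ensuremath{5}"
out_math("5", out_format("html"))      # "<math>5</math>"
out_math("5", out_format("markdown"))  # "5"
```

A new output format then only needs a new set of methods, and a missing method falls back to the generic `out_format` behaviour instead of raising a "could not find function" error.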
> This indeed worries me, too. I think that a lot of these issues could be solved by setting global options. These global options could be used as defaults in the function call. The options could be set when choosing the reporting style and customized by the user. This is how I'm currently handling some common options in papaja and other packages. But there could be better ways.
>
> Concerning your last point, I'm not entirely sure what you mean. Could you elaborate? Also, I don't quite understand the purpose of the type argument in your concept? What would be other types?
Other types may be other tests. The example contains a type function for the data type numeric. We could add type functions for t.test, anova, aov, lm, etc. They do the real work. Why add an extra type argument to apa() rather than determining the correct type function automatically? Because we could add multiple type functions for one test. For example, for a regression, one type function could print the model test, another could print single coefficients, and another one the equation of the regression. That way the correct function can be chosen.
These type (and output) functions don't need to be called directly, because they are called from our apa() function. So there is no need to export them to user space. But there may be people who want to adapt the type functions to their needs or even add completely new ones (e.g. to support new tests). Not exporting the output functions would mean that users cannot use them to create their own type functions. I think neither option is very pleasant: exporting all type and output functions would clutter the user space, while not giving any access would prevent easy addition of new type functions. Is this point more understandable now? Sorry, the explanation may be a bit weird again, but it's hard to explain.
> This indeed worries me, too. I think that a lot of these issues could be solved by setting global options. These global options could be used as defaults in the function call. The options could be set when choosing the reporting style and customized by the user. This is how I'm currently handling some common options in papaja and other packages. But there could be better ways.
Global options would be very cool. But in a more complex script it could be a problem to choose an appropriate name for every argument (this tree depicts the function calls in the example case):
- apa (arg1, arg2, arg3, ...) # main function
- mynumeric (arg1, arg2, arg3, ...) # type function
- out.math (arg1, arg2, arg3, ...) # output function
Aside from the apa() main function, we may have a lot of type functions (probably one for every test, maybe more) and a lot of output functions (with different tasks). For example, the apa() function passes arguments to mynumeric() (through the ellipsis), and mynumeric() may pass arguments to out.math(). So we would need to take care that an argument name is never already in use at a higher level. Maybe this could be solved better with S3/S4/R5 classes again?
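One way to keep the argument passing manageable without classes, as a rough sketch: let the main function route each argument in `...` to whichever layer declares it in its formals, and reject names that no layer knows. `type_numeric()`, `wrap_math()`, `args_for()`, and `report_number()` are hypothetical names used only for illustration.

```r
# Type layer: formats the value.
type_numeric <- function(x, name = NULL, nsmall = 2L) {
  num <- formatC(x, digits = nsmall, format = "f")
  if (is.null(name)) num else paste0(name, " = ", num)
}

# Output layer: wraps the formatted string in math delimiters.
wrap_math <- function(x, delim = "$") paste0(delim, x, delim)

# Keep only those arguments that `fun` declares in its formals.
args_for <- function(fun, args) args[names(args) %in% names(formals(fun))]

report_number <- function(x, ...) {
  dots <- list(...)
  known <- c(names(formals(type_numeric)), names(formals(wrap_math)))
  unknown <- setdiff(names(dots), known)
  if (length(unknown) > 0L) {
    stop("Unused argument(s): ", paste(unknown, collapse = ", "))
  }
  res <- do.call(type_numeric, c(list(x), args_for(type_numeric, dots)))
  do.call(wrap_math, c(list(res), args_for(wrap_math, dots)))
}

report_number(3.14159)                                  # "$3.14$"
report_number(23, name = "N", nsmall = 0L, delim = "")  # "N = 23"
```

This does not remove the need for distinct argument names across layers; it only makes typos fail loudly. Passing one named list per layer (for example a list of type arguments and a list of output arguments) would avoid name clashes altogether at the cost of a more verbose call.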
Sorry, I think it's a bit complicated. Maybe it would be easier to resolve these issues in an IRC/Jabber chat, if you are interested?