tidystats
Support for marginaleffects functions
I'll start adding support for the marginaleffects R package. I think there are only a couple of relevant functions (marginaleffects, marginalmeans, predictions), but I'll look into it more and update this issue once I'm more familiar with the package.
This is now essentially done. In the end, I added only the summary() results of the main functions: marginaleffects, marginalmeans, and comparisons. On their own, these functions return large data frames, which I don't think are typical for reporting (but they can now be saved easily via the new data.frame method anyway).
There is one more relevant function, predictions, but I think its summary.predictions function has a bug at the moment: it doesn't reproduce the output I see in the vignettes and returns only a single cell. I wrote the method for it (it's simple), but I'd skip including it for now, so it's disabled. (Or perhaps you can check it, in case the issue is just my OS/software version.)
I've been looking over the code for this one and it raises an interesting question regarding prediction functions. Running these functions tends to result in a pretty large data frame with many, many numbers (i.e., the predicted values + associated statistics). This is rarely what is reported. Instead, it is usually the result of the summary() function that is reported, which is also why this is what we parse and tidy up.
However, if part of the goal of tidystats is to store statistics, there's an argument that we should support tidying both the predictions and the summary statistics of the predictions.
I guess I'm currently inclined to only focus on summary statistics. Do you have any thoughts about this?
Thinking about it some more, I think supporting both actually does make sense. It doesn't really fit tidystats to parse large data frames (because that looks terrible in the add-in), but the output isn't always that big. You can also use these prediction functions to predict on small data frames, which would make sense to support.
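To make the size point concrete, here is a minimal sketch using base R's predict() as a stand-in for the marginaleffects prediction functions (an assumption made for illustration; it is not the marginaleffects API). The number of rows follows the data passed in, so a small grid gives a small table that would be reasonable to store.

```r
# Base R stand-in for a prediction function; not the marginaleffects API.
fit <- lm(mpg ~ wt, data = mtcars)

# Predicting on the full data set gives one row per observation.
all_rows <- predict(fit, newdata = mtcars, se.fit = TRUE)

# Predicting on a small, hand-picked grid gives only a few rows.
small_grid <- data.frame(wt = c(2.5, 3.5))
few_rows <- predict(fit, newdata = small_grid, se.fit = TRUE)

# Combine the grid with the predictions and their standard errors.
small_preds <- cbind(
  small_grid,
  estimate = few_rows$fit,
  std_error = few_rows$se.fit
)

length(all_rows$fit)  # one prediction per row of mtcars
nrow(small_preds)     # one prediction per row of small_grid
```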
Yes. I would only add that I always try to be in line with what users see via the related print function (so basically whatever they get when they enter the variable in the console, which I think is the obvious way to check any results/output, at least for me). So in the case of emmeans, for instance, it's the summary (as I mentioned there), hence it makes sense that we report only its summary. However, in the case of marginaleffects, users see the plain data frame when printing, hence they might be quite surprised if tidystats reports only the summary. I (again) think it's better to leave it to the user to request the summary (by calling summary()) if that's what they want.
But anyway, either way is fine by me. (From a technical perspective, passing the object to summary takes almost nothing to implement.)
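The "user decides by calling summary()" approach can be sketched with plain S3 dispatch. Everything below is hypothetical: tidy_sketch, toy_result, and the summary method are made-up names for illustration and are not the tidystats or marginaleffects implementation. The idea is that the raw object falls through to a data.frame method, while the object returned by summary() gets its own method.

```r
# Hypothetical generic; not the actual tidystats API.
tidy_sketch <- function(x, ...) UseMethod("tidy_sketch")

# Plain data frames (what the user sees when printing) are stored row by row.
tidy_sketch.data.frame <- function(x, ...) {
  list(
    method = "data frame",
    rows = lapply(seq_len(nrow(x)), function(i) as.list(x[i, , drop = FALSE]))
  )
}

# Toy stand-in for a package result that is a data frame with an extra class.
toy_result <- structure(
  data.frame(term = c("a", "a", "b"), estimate = c(1, 3, 5)),
  class = c("toy_result", "data.frame")
)

# Toy summary method: average the estimates per term.
summary.toy_result <- function(object, ...) {
  out <- aggregate(estimate ~ term, data = as.data.frame(object), FUN = mean)
  class(out) <- c("summary.toy_result", "data.frame")
  out
}

# The summary gets its own method, so the user opts in by calling summary().
tidy_sketch.summary.toy_result <- function(x, ...) {
  res <- tidy_sketch.data.frame(as.data.frame(x))
  res$method <- "summary"
  res
}

full <- tidy_sketch(toy_result)            # dispatches to the data.frame method
short <- tidy_sketch(summary(toy_result))  # dispatches to the summary method
```

With this layout nothing is summarised behind the user's back: what gets stored matches whichever object they passed in.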
Yeah, that makes sense to me. Although that's not how I have done it with lm(). There I actually run summary() under the hood (much like how the tidy() function from broom works).
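For comparison, running summary() under the hood for an lm() fit could look roughly like the sketch below. tidy_lm_sketch is a hypothetical helper written for illustration, using base R only; it is not the tidystats or broom code. It shows the pattern where the user passes the model itself and the summary is computed internally.

```r
# Hypothetical helper; not the tidystats or broom implementation.
tidy_lm_sketch <- function(model) {
  stopifnot(inherits(model, "lm"))

  # summary() is called internally, so the user passes the model itself.
  coefs <- summary(model)$coefficients

  # Turn the coefficient matrix into a data frame with one row per term.
  data.frame(
    term = rownames(coefs),
    estimate = coefs[, "Estimate"],
    std_error = coefs[, "Std. Error"],
    statistic = coefs[, "t value"],
    p_value = coefs[, "Pr(>|t|)"],
    row.names = NULL
  )
}

fit <- lm(mpg ~ wt, data = mtcars)
tidied <- tidy_lm_sketch(fit)
tidied
```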