gratia
gratia copied to clipboard
Evaluating parametric terms with factor by-variable
When calling evaluate_parametric_term
on a a term of two factor variables, the information on wich values belongs to which combination seems to get lost.
Imagine a model
with a formula that includes y~factorvar1:factorvar2
. When calling
evaluate_parametric_term(model, "factorvar1:factorvar2")
the resulting table contains no information on which row belongs to which combination of levels of the factors. The column "term"
only contains the entry "factorvar1:factorvar2"
in every row.
Also the table has the same number of rows as my dataset, which is not the behavior of evaluate_parametric_term
or evaluate_smooth
that I am used to.
Am I missing something or have I been interpreting the purpose of these functions wrong? Or is this indeed a bug?
There's a bug here somewhere (either by design or by problem with the code). I'll take a look. Thanks for letting me know.
@Excidion I figured out what the issue was but fixing this means you won't be able to do what you wanted with the form of interaction. R treats f1:f2
as an order 2 term even if it is the only term (beyond the constant) in the model, and I'm using the order according to R to stop evaluate_parametric_term()
with an error.
You could achieve what you want using interaction(f1, f2, drop = TRUE)
to create a single factor from the interaction of f1
and f2
in the data before fitting the model. R would consider that term order 1.
I'd welcome some input on what you would like to have happen if evaluate_parametric-term()
were to support interaction terms. The original behaviour was to plot the partial effect of a term (as plot.gam()
would and as per termplot()
) and it seems that handling interactions this way isn't something that makes sense.
I could literally return the contribution to the fitted model for the f1:f2
term only and in value
return
- the usual R generated labels you see for combinations of factor levels in interactions:
f1Level1:f2Level2
, or - just the concatenate the pair of levels, one per factor into a string:
Level1-Level2
.
The other option, which would be a departure from what evaluate_smooth
does, would be to return what are often called the estimated marginal means; i.e. return predictions for all combinations of levels in the data. This would no longer be a partial effect for a single term; I'd need to include the main effects of the factors involved in the interaction. They'd also be conditional on some values for the other terms in the model as I have to provide something for the other terms in the model to be able to use predict()
. I feel there are good existing ways to do this (emmeans), however.
Thoughts on what behaviour you were expecting?
I am definitely more fond of returning the contributions to the fitted model. This is, at least for me, closer to some of the core aspects why i value GAM models - the intuitive interpretability of their results.
On how to handle the entries of the value
column: The first bullet point seems very R-ish and more in line with how evaluate_smooth()
handles this (eg. a term with s(var):fvar
has s(var):f1level1
in the smooth
column and an extra by_variable
column).
Hope I picked up on your questions in the right way and my answers can be helpful.