gratia icon indicating copy to clipboard operation
gratia copied to clipboard

Evaluating parametric terms with factor by-variable

Open Excidion opened this issue 4 years ago • 3 comments

When calling evaluate_parametric_term on a a term of two factor variables, the information on wich values belongs to which combination seems to get lost.

Imagine a model with a formula that includes y~factorvar1:factorvar2. When calling

evaluate_parametric_term(model, "factorvar1:factorvar2")

the resulting table contains no information on which row belongs to which combination of levels of the factors. The column "term" only contains the entry "factorvar1:factorvar2" in every row.

Also the table has the same number of rows as my dataset, which is not the behavior of evaluate_parametric_term or evaluate_smooth that I am used to.

Am I missing something or have I been interpreting the purpose of these functions wrong? Or is this indeed a bug?

Excidion avatar May 12 '20 20:05 Excidion

There's a bug here somewhere (either by design or by problem with the code). I'll take a look. Thanks for letting me know.

gavinsimpson avatar May 13 '20 17:05 gavinsimpson

@Excidion I figured out what the issue was but fixing this means you won't be able to do what you wanted with the form of interaction. R treats f1:f2 as an order 2 term even if it is the only term (beyond the constant) in the model, and I'm using the order according to R to stop evaluate_parametric_term() with an error.

You could achieve what you want using interaction(f1, f2, drop = TRUE) to create a single factor from the interaction of f1 and f2 in the data before fitting the model. R would consider that term order 1.

I'd welcome some input on what you would like to have happen if evaluate_parametric-term() were to support interaction terms. The original behaviour was to plot the partial effect of a term (as plot.gam() would and as per termplot()) and it seems that handling interactions this way isn't something that makes sense.

I could literally return the contribution to the fitted model for the f1:f2 term only and in value return

  • the usual R generated labels you see for combinations of factor levels in interactions: f1Level1:f2Level2, or
  • just the concatenate the pair of levels, one per factor into a string: Level1-Level2.

The other option, which would be a departure from what evaluate_smooth does, would be to return what are often called the estimated marginal means; i.e. return predictions for all combinations of levels in the data. This would no longer be a partial effect for a single term; I'd need to include the main effects of the factors involved in the interaction. They'd also be conditional on some values for the other terms in the model as I have to provide something for the other terms in the model to be able to use predict(). I feel there are good existing ways to do this (emmeans), however.

Thoughts on what behaviour you were expecting?

gavinsimpson avatar May 14 '20 14:05 gavinsimpson

I am definitely more fond of returning the contributions to the fitted model. This is, at least for me, closer to some of the core aspects why i value GAM models - the intuitive interpretability of their results.

On how to handle the entries of the value column: The first bullet point seems very R-ish and more in line with how evaluate_smooth() handles this (eg. a term with s(var):fvar has s(var):f1level1 in the smooth column and an extra by_variable column).

Hope I picked up on your questions in the right way and my answers can be helpful.

Excidion avatar May 14 '20 16:05 Excidion