bambi icon indicating copy to clipboard operation
bambi copied to clipboard

[WIP] Fix HSGP predictions

Open tomicapretto opened this issue 1 year ago • 1 comments

Fixes predictions when HSGP contains a by variable.

  • It makes sure the underlying design matrix is not modified until the data of all HSGP components are used.
  • Modifies get_model_covariates so it also looks at the named arguments of function calls.
    • For this, this PR in formulae was also needed https://github.com/bambinos/formulae/pull/107

TODO: implement tests?

tomicapretto avatar Feb 18 '24 19:02 tomicapretto

Thanks a lot @tomicapretto 👍🏼

GStechschulte avatar Feb 19 '24 11:02 GStechschulte

@GStechschulte could you try this?

import bambi as bmb
import numpy as np
import pandas as pd

df = pd.read_csv("tests/data/gam_data.csv")

rng = np.random.default_rng(1234)
df["fac2"] = rng.choice(["a", "b", "c"], size=df.shape[0])

formula = "y ~ 1 + x0 + hsgp(x1, by=fac, m=10, c=2) + hsgp(x1, by=fac2, m=10, c=2)"
model = bmb.Model(formula, df, categorical=["fac"])
idata = model.fit(tune=500, draws=500, target_accept=0.9)

Plot 1

bmb.interpret.plot_predictions(
    model, 
    idata, 
    conditional="x1", 
    subplot_kwargs={"main": "x1", "group": "fac2", "panel": "fac2"},
);

image

Plot 2

bmb.interpret.plot_predictions(
    model, 
    idata, 
    conditional={
        "x1": np.linspace(0, 1, num=100),
        "fac2": ["a", "b", "c"]
    }, 
    legend=False,
    subplot_kwargs={"main": "x1", "group": "fac2", "panel": "fac2"},
);

image

I was expecting to get the second plot with the code for the first plot. I think we got the result we got because we first generate the data, and only then, we use the subplot_kwargs? At that point, it's just too late, you only have one value of fac2

tomicapretto avatar Feb 23 '24 16:02 tomicapretto

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 90.16%. Comparing base (b5b9f09) to head (bdb48d8). Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #780      +/-   ##
==========================================
+ Coverage   89.86%   90.16%   +0.29%     
==========================================
  Files          46       46              
  Lines        3810     3814       +4     
==========================================
+ Hits         3424     3439      +15     
+ Misses        386      375      -11     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Feb 23 '24 17:02 codecov-commenter

@tomicapretto thanks! Plot 1 is displaying correctly. It is because you are not explicitly passing fac2 to conditional. Which results in, as you stated, a single default value computed for fac2. The single value cannot have any subplots.

This is the behavior both interpret and marginaleffects uses if a covariate was specified in the model, but not passed to conditional.

bmb.interpret.plot_predictions(
    model, 
    idata, 
    conditional=["x1", "fac2"], 
    subplot_kwargs={"main": "x1", "group": "fac2", "panel": "fac2"},
    legend=False
);

image

GStechschulte avatar Feb 23 '24 17:02 GStechschulte

Thanks @GStechschulte! I think this is done.

I know the test is actually testing many things at the same time, not just the fix. But I think it's not possible to write a test for the fix in particular, and if possible, it would be so complicated.

tomicapretto avatar Feb 23 '24 18:02 tomicapretto

Thanks @GStechschulte! I think this is done.

I know the test is actually testing many things at the same time, not just the fix. But I think it's not possible to write a test for the fix in particular, and if possible, it would be so complicated.

Yup, I agree. It is also nice to have the text for interpret in there.

GStechschulte avatar Feb 29 '24 15:02 GStechschulte