
Draft: Add support for storing modelling metadata from stateful transforms, such as `offset`.

Open matthewwardrop opened this issue 3 years ago • 0 comments

This (draft) PR adds support for emitting modelling metadata from "stateful transforms". The metadata can store, for example, modelling instructions that come from the formula itself, such as offsets to be used during modelling. Metadata is distinct from `ModelSpec`: it lives alongside (and augments) the model matrix rather than being persisted as a description of the materialisation process. Example:

import pandas
import numpy as np
n = 1000
df = pandas.DataFrame(
    dict(
        y=np.random.normal(size=n),
        x1=np.random.normal(size=n),
        x2=np.random.normal(size=n),
        m=np.random.choice(list("abc"), size=n),
        n=np.random.choice(list("abc"), size=n),
        e=np.random.normal(size=n),
    )
)

from formulaic import model_matrix
from formulaic.utils.stateful_transforms import stateful_transform

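@stateful_transform
def offset(x, _state=None, _metadata=None):
    # Accumulate into the metadata so that multiple offset(...) terms sum.
    _metadata['offset'] = _metadata.get('offset', 0) + x
    # Return no columns, so the offset stays out of the model matrix.
    return {}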

mm = model_matrix("x1 + offset(x2)", df)
mm
>      Intercept        x1
> 0          1.0  0.402157
> 1          1.0  0.159565
> 2          1.0 -0.597806
> 3          1.0 -0.469659
> 4          1.0 -0.090165
> ..         ...       ...
> 995        1.0 -2.376424
> 996        1.0  0.903574
> 997        1.0  0.266223
> 998        1.0 -0.418331
> 999        1.0 -1.261332
> 
> [1000 rows x 2 columns]

mm.metadata['offset']
> 0     -0.607345
> 1     -0.088413
> 2     -0.599182
> 3     -1.730541
> 4     -0.110609
>          ...
> 995    0.524394
> 996    0.063271
> 997    1.693313
> 998   -0.534796
> 999   -0.892107
> Name: x2, Length: 1000, dtype: float64
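
To illustrate why keeping the offset next to the model matrix is useful, here is a minimal sketch of how a downstream consumer might apply it in a plain linear model. It does not use formulaic at all: `X` and `offset` are stand-ins for the model matrix and for `mm.metadata['offset']`, and the simulated coefficients (2 and 3) are assumptions chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # plays the role of the offset column

# Simulated response: the offset enters with a coefficient fixed at 1.
y = 2.0 + 3.0 * x1 + x2 + rng.normal(scale=0.1, size=n)

# Stand-ins (assumptions) for the model matrix and the stored offset metadata.
X = np.column_stack([np.ones(n), x1])
offset = x2

# For a linear model, an offset can be handled by subtracting it from the
# response and fitting the remaining terms by ordinary least squares.
beta, *_ = np.linalg.lstsq(X, y - offset, rcond=None)
print(beta)
```

The fitted coefficients should land close to the simulated intercept and slope, while the offset itself never appears as an estimated column. For generalised linear models the offset would instead be added to the linear predictor inside the fitting routine, which is beyond this sketch.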

matthewwardrop, Jul 11 '21 22:07