Draft: Add support for storing modelling metadata from stateful transforms, such as `offset`.
This (draft) PR adds support for emitting modelling metadata from "stateful transforms". The metadata can store, for example, modelling instructions taken from the formula itself, such as offsets to be used during modelling. It is distinct from `ModelSpec` in that it lives alongside (and augments) the model matrix rather than being persisted as a description of the materialisation process. Example:
import pandas
import numpy as np
n = 1000
df = pandas.DataFrame(
    dict(
        y=np.random.normal(size=n),
        x1=np.random.normal(size=n),
        x2=np.random.normal(size=n),
        m=np.random.choice(list("abc"), size=n),
        n=np.random.choice(list("abc"), size=n),
        e=np.random.normal(size=n),
    )
)
from formulaic import model_matrix
from formulaic.utils.stateful_transforms import stateful_transform
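@stateful_transform
def offset(x, _state=None, _metadata=None):
    # Accumulate into the shared metadata, so repeated offset(...) terms sum.
    _metadata['offset'] = _metadata.get('offset', 0) + x
    # Return no columns: the offset is kept out of the model matrix itself.
    return {}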
mm = model_matrix("x1 + offset(x2)", df)
mm
> Intercept x1
> 0 1.0 0.402157
> 1 1.0 0.159565
> 2 1.0 -0.597806
> 3 1.0 -0.469659
> 4 1.0 -0.090165
> .. ... ...
> 995 1.0 -2.376424
> 996 1.0 0.903574
> 997 1.0 0.266223
> 998 1.0 -0.418331
> 999 1.0 -1.261332
>
> [1000 rows x 2 columns]
mm.metadata['offset']
> 0 -0.607345
> 1 -0.088413
> 2 -0.599182
> 3 -1.730541
> 4 -0.110609
> ...
> 995 0.524394
> 996 0.063271
> 997 1.693313
> 998 -0.534796
> 999 -0.892107
> Name: x2, Length: 1000, dtype: float64
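To illustrate how downstream modelling code might consume this metadata, here is a minimal sketch of a least-squares fit in which the offset enters the linear predictor with its coefficient fixed at 1. It does not use formulaic at all: the lists `x1` and `offset` are hypothetical stand-ins for a column of `mm` and for `mm.metadata['offset']`, and `fit_with_offset` is an illustrative helper, not part of this PR.

```python
# Hypothetical stand-ins for a column of `mm` and for mm.metadata['offset'];
# plain lists are used so the sketch runs without the draft branch.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
offset = [0.5, -1.0, 0.25, 2.0, -0.75]

# Synthetic response with a known intercept (2.0) and slope (3.0); the offset
# is added to the linear predictor as-is, with no coefficient to estimate.
y = [2.0 + 3.0 * x + o for x, o in zip(x1, offset)]


def fit_with_offset(x, y, offset):
    """Fit y = intercept + slope * x + offset by ordinary least squares."""
    # Move the offset to the left-hand side, then fit a simple regression.
    adjusted = [yi - oi for yi, oi in zip(y, offset)]
    n = len(x)
    x_mean = sum(x) / n
    adjusted_mean = sum(adjusted) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum(
        (xi - x_mean) * (ai - adjusted_mean) for xi, ai in zip(x, adjusted)
    )
    slope = sxy / sxx
    intercept = adjusted_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_with_offset(x1, y, offset)
print(intercept, slope)
```

Because the offset is subtracted from the response before fitting, it influences the estimates without ever becoming a column with its own coefficient, which is why it is useful to carry it alongside the model matrix rather than inside it.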