Set reference level in CPH using formula with Formulaic?
Hi, thanks for a useful package.
Related to #458 and somewhat related to the discussion in #1203, is it possible to set a reference level for a categorical variable using formulas (v0.25.0)? In statsmodels, which uses Patsy, we could set reference level for predictor x to "ref" in the following model
from statsmodels.formula.api import ols
from patsy.contrasts import Treatment
mod = ols("y ~ C(x, Treatment(reference='ref'))", data)
as described here. However, I see that v0.25.8 switched from Patsy to Formulaic.
Is it possible to set a reference level similarly in Formulaic? Or are there other solutions that still use lifelines.CoxPHFitter().fit(..., formula=<myform>) in lifelines>=0.25.8?
Hi @t-silvers, AFAIK formulaic doesn't support that feature yet. If you really need that feature, you can use Patsy to transform your df outside lifelines, and then pass the transformed df into lifelines.
Thanks for the speedy reply! Too bad that Formulaic doesn't have that option, but sure, I can use Patsy to transform my df outside lifelines.
For completeness, this is the code I'll use until a better solution is available:
import pandas as pd
import patsy
from patsy.contrasts import Treatment
from lifelines import CoxPHFitter
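# right-hand side only: the survival columns are passed to lifelines separately
form = "other_pred + C(x, Treatment(reference='ref'))"
# for some data, data: pd.DataFrame, with survival data cols ['OS.time', 'OS']
design = pd.concat([
    patsy.dmatrix(formula_like=form, data=data, return_type="dataframe"),
    data[['OS.time', 'OS']]
], axis=1)
# a Cox model has no intercept term, so drop Patsy's constant column
design = design.drop(columns="Intercept")
cph = CoxPHFitter()
# no formula argument: the design matrix is already expanded
cph.fit(
    design,
    duration_col='OS.time',
    event_col='OS'
)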
Hi @t-silvers, as of April 2022 Formulaic does support this.
The syntax is nearly identical but follows R conventions:
from formulaic import Formula
f = Formula("y ~ C(x, contr.treatment('ref'))")
Documentation is still lacking, but will be tidied up before v1.0.0 later this year.