Set reference level in CPH using formula with Formulaic?

Open t-silvers opened this issue 4 years ago • 2 comments

Hi, thanks for a useful package.

Related to #458 and somewhat related to the discussion in #1203: is it possible to set a reference level for a categorical variable using formulas (v0.25.0)? In statsmodels, which uses Patsy, we can set the reference level for predictor x to "ref" in the following model

from statsmodels.formula.api import ols 
from patsy.contrasts import Treatment

mod = ols("y ~ C(x, Treatment(reference='ref'))", data) 

as described here. However, I see that v0.25.8 switched from Patsy to Formulaic.

Is it possible to set a reference level similarly in Formulaic? Or are there other solutions that still use lifelines.CoxPHFitter().fit(..., formula=<myform>) in lifelines>=0.25.8?

t-silvers avatar Dec 02 '21 19:12 t-silvers

Hi @t-silvers, AFAIK formulaic doesn't support that feature yet. If you really need that feature, you can use Patsy to transform your df outside lifelines, and then pass the transformed df into lifelines.

CamDavidsonPilon avatar Dec 02 '21 23:12 CamDavidsonPilon

Thanks for the speedy reply! Too bad that Formulaic doesn't have that option, but sure, I could use Patsy to transform my df outside lifelines.

For completeness, this is the code I'll use until a better solution is available:

import pandas as pd
import patsy
from patsy.contrasts import Treatment
from lifelines import CoxPHFitter

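# Right-hand side only; "ref" is the baseline level of x.
form = "other_pred + C(x, Treatment(reference='ref'))"

# data: a pd.DataFrame that also holds the survival columns ['OS.time', 'OS'].
# patsy adds a constant "Intercept" column; the Cox model has no intercept
# term, so drop it before fitting.
covariates = patsy.dmatrix(form, data=data, return_type="dataframe")
design = pd.concat([
   covariates.drop(columns="Intercept"),
   data[['OS.time','OS']]
], axis=1)

# No formula is passed, so every remaining column of design is used as a covariate.
cph = CoxPHFitter()
cph.fit(
   design,
   duration_col='OS.time',
   event_col='OS'
)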

t-silvers avatar Dec 04 '21 00:12 t-silvers

Hi @t-silvers, as of April 2022 Formulaic does support this.

The syntax is nearly identical but follows R conventions:

from formulaic import Formula
f = Formula("y ~ C(x, contr.treatment('ref'))")

Documentation is still lacking, but will be tidied up before v1.0.0 later this year.

matthewwardrop avatar Aug 18 '22 02:08 matthewwardrop