
Cox Regression with a categorical variable

Open anyang-kevin opened this issue 2 years ago • 11 comments

Hi, I have a problem similar to #1203. I want to choose a specific category as the reference for a categorical variable in Cox regression, in both univariate and multivariate analyses.

  1. How do I choose the category in a multivariate Cox model? Would the formula be formula='C(a,Treatment=(x))+C(b,Treatment=(y))'?
  2. I tried to choose a specific category in the univariate case, but the error message says that Treatment is not defined. This is my code: cph.fit(for_df,duration_col='OS.time',event_col='OS',formula="C(gender,Treatment('female'))") and this is the error message: formulaic.errors.FactorEvaluationError: Unable to evaluate factor C(gender,Treatment('female')). [NameError: name 'Treatment' is not defined]

anyang-kevin avatar Feb 28 '23 09:02 anyang-kevin

Hi @anyang-kevin, try "C(gender, contr.treatment(base='female'))"

CamDavidsonPilon avatar Feb 28 '23 14:02 CamDavidsonPilon

C(gender, contr.treatment(base='female'))

Thanks for your reply, but it doesn't work. My code is: cph.fit(for_df,duration_col='OS.time',event_col='OS',formula="C('gender', contr.treatment(base='female'))") and the error is: formulaic.errors.FactorEvaluationError: Unable to evaluate factor C('gender', contr.treatment(base='female')). [NameError: name 'contr' is not defined] My lifelines version is 0.27.1.

I also hope you can answer my first question. I'm sorry, I don't know enough about the formulaic package. Thank you!

anyang-kevin avatar Feb 28 '23 15:02 anyang-kevin

What version of formulaic do you have? You can use import formulaic; print(formulaic.__version__) to check.

CamDavidsonPilon avatar Mar 01 '23 14:03 CamDavidsonPilon

Also, don't put quotes around gender, it should be:

formula="C(gender, contr.treatment(base='female'))"

CamDavidsonPilon avatar Mar 01 '23 14:03 CamDavidsonPilon

  1. My formulaic version is 0.2.4.
  2. gender is a column name in the dataframe, not a variable. So if I use C(gender,contr.treatment(...)), I get this error message: formulaic.errors.FactorEvaluationError: Unable to evaluate factor C(gender, contr.treatment(base='female')). [NameError: name 'gender' is not defined]. I guess those aren't the core issues.

If you need to check my package files, tell me which package you need and your email address. Maybe that will help you find the cause directly?

anyang-kevin avatar Mar 02 '23 07:03 anyang-kevin

Try upgrading formulaic to 0.5.2: pip install formulaic==0.5.2

CamDavidsonPilon avatar Mar 02 '23 12:03 CamDavidsonPilon

Try upgrading formulaic to 0.5.2: pip install formulaic==0.5.2

Sorry, my Python version is 3.7.0, so I can't upgrade to 0.5.2; it needs Python >= 3.7.2, and I have to keep my Python version until my project is finished. Although I know how to fix it, I think changing the version or keeping two Python versions on a Windows system is high-risk.

anyang-kevin avatar Mar 02 '23 13:03 anyang-kevin

You'll have to use pandas to manipulate the dataframe prior to providing it to .fit then. Ex: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
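A minimal sketch of that pandas route, assuming the column names from the thread (OS.time, OS, gender). The toy data and the helper encode_with_reference are hypothetical and only illustrate the idea: one-hot encode the column, then drop the level that should act as the reference.

```python
import pandas as pd

# Hypothetical toy data; column names follow the thread.
for_df = pd.DataFrame({
    "OS.time": [5, 8, 12, 3, 9, 7],
    "OS": [1, 0, 1, 1, 0, 1],
    "gender": ["female", "male", "male", "female", "male", "female"],
})


def encode_with_reference(df, column, reference):
    """One-hot encode `column`, then drop the dummy for the reference level."""
    dummies = pd.get_dummies(df[column], prefix=column, dtype=float)
    ref_col = f"{column}_{reference}"
    if ref_col not in dummies.columns:
        raise ValueError(f"{reference!r} is not a level of {column!r}")
    # Keep every non-reference level as its own 0/1 column.
    kept = dummies.drop(columns=[ref_col])
    return pd.concat([df.drop(columns=[column]), kept], axis=1)


design = encode_with_reference(for_df, "gender", "female")
print(design.columns.tolist())  # ['OS.time', 'OS', 'gender_male']
```

The resulting frame can then be passed to cph.fit(design, duration_col='OS.time', event_col='OS') without a formula. The coefficient on gender_male is then the log hazard ratio of male relative to female.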

CamDavidsonPilon avatar Mar 02 '23 14:03 CamDavidsonPilon

You'll have to use pandas to manipulate the dataframe prior to providing it to .fit then. Ex: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

Actually, that's what I wanted to ask once this problem is solved. If I use one-hot encoding or another method to process a categorical variable, will the Cox result lose the control group? For example, with gender encoded as male = 0 and female = 1, the Cox model gives an HR for both male and female. But when gender is treated as a categorical variable, male is the control and only female has an HR. Some literature uses the one-hot method, others the control-treatment method. Which is the best way to process a categorical variable? Or will they give the same result?

Another problem with the one-hot code is that 0, 1, 2, ... are different values in the Cox model. If a categorical variable has more than two levels, might level 5 mean 5 times the effect of level 1? But they may have the same weight in the model. Is such an influence acceptable, or does it not create a problem?
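The three encodings mentioned here can be compared directly. The sketch below uses a hypothetical three-level variable called stage, which is not from the thread. With integer codes the model gets a single coefficient β, so each step up multiplies the hazard by the same factor exp(β), and level 5 versus level 1 would be exp(4β) rather than "5 times the effect". With full one-hot encoding the columns always sum to 1, which makes the coefficients non-unique in a Cox model, since a constant is absorbed by the baseline hazard. Treatment coding drops one level, so the control group is not lost: it becomes the baseline that the other hazard ratios are measured against.

```python
import pandas as pd

# Hypothetical three-level covariate, for illustration only.
stage = pd.Series(["I", "II", "III", "II", "I", "III"], name="stage")

# (a) Integer codes: one column, hence one coefficient shared by all levels.
integer_coded = stage.map({"I": 1, "II": 2, "III": 3})

# (b) Full one-hot: one 0/1 column per level.
one_hot = pd.get_dummies(stage, prefix="stage", dtype=float)

# (c) Treatment (reference) coding: full one-hot minus the reference level "I".
treatment = one_hot.drop(columns=["stage_I"])

# Every row of the full one-hot matrix sums to 1, so its columns are redundant.
print(one_hot.sum(axis=1).tolist())
print(treatment.columns.tolist())
```

Under these assumptions, (c) gives each non-reference level its own hazard ratio without imposing an ordering, which is why it is the usual choice for unordered categories.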

anyang-kevin avatar Mar 02 '23 15:03 anyang-kevin

You'll have to use pandas to manipulate the dataframe prior to providing it to .fit then. Ex: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

Could you tell me: if I am using a formula, do I no longer need to use pandas for dummy-variable conversion, and only need to convert the pandas column type to 'category'? Will lifelines automatically handle the categorical variables?

Cryptojoyz avatar Jun 22 '23 18:06 Cryptojoyz

@Cryptojoyz It should work. I am using the pandas method .astype() to specify some columns as categories. Currently doing a CoxPH regression with a mix of continuous and categorical variables, which works nicely.

dcstang avatar May 02 '24 10:05 dcstang