
Discrepancy in Survival Analysis Results Between Python and SAS

Open dominicyu04 opened this issue 9 months ago • 6 comments

I am currently learning survival analysis using Python, following a textbook example on the leukemia remission dataset. However, my results differ from those presented in the textbook. When I repeated the analysis using SAS, the results matched the textbook exactly. Could anyone help me understand the possible reasons for the discrepancy between the Python and SAS results?

I ran two models. The event of interest, or censorship, is named "Event," and the time to event variable is named "Survival_time." In the first model, I only assessed the impact of an explanatory variable, which is named "Group." In the second model, I further adjusted for "Log_WBC."

Remission.xlsx: this is the data.

[Images: my Python code for the two models, and the results]

[Images: my SAS code and results]

Thank you in advance for clarifying this discrepancy for me.

dominicyu04 avatar Mar 25 '25 15:03 dominicyu04

In your data, there are tied event and censoring times. The Cox proportional hazards partial likelihood assumes no two events occur at exactly the same time, so software has to pick a method for handling ties. lifelines uses Efron's method, while SAS uses Breslow's method by default. You can modify your SAS code to use Efron's method via

model survival_time*event(0) = group /ties=efron;

Then the results should match.
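To make the difference concrete, here is a minimal sketch in plain Python of the single-covariate partial log-likelihood under the two approximations. The helper name `cox_partial_loglik` and the toy data are made up for illustration; this is not lifelines or SAS code, only the textbook formulas written out. Breslow uses the full risk-set sum for every tied event, while Efron subtracts a growing fraction of the tied subjects' contribution from the denominator.

```python
import math

def cox_partial_loglik(times, events, x, beta, ties="efron"):
    """Partial log-likelihood for one covariate (hypothetical helper, illustration only)."""
    loglik = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        # Risk set: everyone whose observed time is >= t (events and censored alike).
        risk_sum = sum(math.exp(beta * xi) for ti, xi in zip(times, x) if ti >= t)
        # Covariate values of the subjects whose event happens exactly at t.
        tied = [xi for ti, e, xi in zip(times, events, x) if ti == t and e == 1]
        d = len(tied)
        tied_sum = sum(math.exp(beta * xi) for xi in tied)
        loglik += beta * sum(tied)  # numerator term, identical for both methods
        for k in range(d):
            if ties == "breslow":
                denom = risk_sum  # same denominator for all d tied events
            elif ties == "efron":
                denom = risk_sum - (k / d) * tied_sum  # remove k/d of the tied subjects
            else:
                raise ValueError("ties must be 'breslow' or 'efron'")
            loglik -= math.log(denom)
    return loglik

# Made-up toy data: two events tied at time 1, one censored observation at time 2.
toy_times = [1, 1, 2, 3]
toy_events = [1, 1, 0, 1]
toy_x = [0, 1, 0, 1]
for method in ("breslow", "efron"):
    print(method, cox_partial_loglik(toy_times, toy_events, toy_x, beta=0.5, ties=method))
```

When no event times are tied, the inner loop runs once per event with `k = 0`, so the two methods return the same value. The coefficients only diverge when ties are present, as they are in this dataset.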

pzivich avatar Mar 25 '25 15:03 pzivich

Thank you so much for the clear explanation, Dr. Zivich. I now understand why the discrepancy occurred. I was wondering if lifelines supports the "exact" method for handling ties (similar to what's available in SAS or R), or other options like "breslow"? I'm hoping to use the exact method for now so I can closely follow the results presented in the textbook and better compare my Python output with the R and SAS examples included there.

dominicyu04 avatar Mar 25 '25 16:03 dominicyu04

To my knowledge, lifelines only supports Efron's method. The exact method is pretty computationally intense to do, and usually not feasible to run unless there are only a few ties. That computational difficulty is why the different approximations were originally developed.

For working through a textbook, my 2 cents is to run in SAS (or R) using whatever method the book uses to handle ties, then switch to Efron to compare.
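As a rough illustration of why exact tie handling gets expensive, the sketch below counts the terms a naive enumeration would need at a single tied event time. The function name `exact_tie_terms` and the numbers are hypothetical. Which count is relevant depends on which "exact" variant a package implements, and real implementations use smarter algorithms than brute-force enumeration, so treat this as an order-of-magnitude picture only.

```python
import math

def exact_tie_terms(n_at_risk, n_tied):
    """Naive term counts for one tied event time (hypothetical helper)."""
    # Number of possible orderings of the tied events.
    orderings = math.factorial(n_tied)
    # Number of ways to choose which n_tied subjects in the risk set had the event.
    subsets = math.comb(n_at_risk, n_tied)
    return orderings, subsets

# Illustrative values only: 40 subjects at risk, increasing numbers of ties.
for d in (2, 5, 10):
    print(d, exact_tie_terms(40, d))
```

Both counts grow very quickly with the number of tied events, whereas the Breslow and Efron approximations need only one risk-set sum per event time.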

pzivich avatar Mar 25 '25 16:03 pzivich

👍 @pzivich is exactly right. There are no plans to support other tie methods at this time.

CamDavidsonPilon avatar Mar 26 '25 13:03 CamDavidsonPilon

Thank you.

dominicyu04 avatar Apr 06 '25 00:04 dominicyu04

Thank you

dominicyu04 avatar Apr 06 '25 00:04 dominicyu04