Discrepancy in Survival Analysis Results Between Python and SAS
I am currently learning survival analysis using Python, following a textbook example on the leukemia remission dataset. However, my results differ from those presented in the textbook. When I repeated the analysis using SAS, the results matched the textbook exactly. Could anyone help me understand the possible reasons for the discrepancy between the Python and SAS results?
I ran two models. The event indicator (event versus censoring) is named "Event," and the time-to-event variable is named "Survival_time." In the first model, I assessed only the effect of a single explanatory variable, "Group." In the second model, I additionally adjusted for "Log_WBC."
Remission.xlsx (this is the data)
Thank you in advance for clarifying this discrepancy for me.
In your data, there are tied event and censoring times. The Cox Proportional Hazards model doesn't naturally deal with ties. lifelines uses Efron's method to handle ties, while SAS uses Breslow's method by default. You can modify your SAS code to use Efron's method via
model survival_time*event(0) = group /ties=efron;
The results should then match.
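To see why the two tie-handling methods give different estimates, here is a minimal pure-Python sketch of the Cox partial log-likelihood for a single covariate under Breslow's and Efron's approximations. The function name `cox_partial_loglik` and the toy data are hypothetical, made up for illustration; this is not the remission dataset and not how lifelines or SAS implement it internally.

```python
import math

def cox_partial_loglik(times, events, x, beta, ties="efron"):
    """Partial log-likelihood of a one-covariate Cox model (illustrative only)."""
    if ties not in ("efron", "breslow"):
        raise ValueError("ties must be 'efron' or 'breslow'")
    loglik = 0.0
    # Loop over each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        # Risk set: everyone still under observation at time t.
        risk = sum(math.exp(beta * xi) for ti, xi in zip(times, x) if ti >= t)
        # Covariate values of the subjects with an event at exactly time t.
        dead = [xi for ti, e, xi in zip(times, events, x) if ti == t and e == 1]
        d = len(dead)
        tied = sum(math.exp(beta * xi) for xi in dead)
        loglik += beta * sum(dead)
        for k in range(d):
            if ties == "breslow":
                # Breslow: every tied event uses the full risk set.
                loglik -= math.log(risk)
            else:
                # Efron: remove a fraction k/d of the tied subjects' weight.
                loglik -= math.log(risk - (k / d) * tied)
    return loglik

# Hypothetical toy data: two events tied at t=1 and two tied at t=3.
times = [1, 1, 2, 3, 3, 4]
events = [1, 1, 0, 1, 1, 1]
group = [1, 0, 1, 0, 1, 0]

ll_breslow = cox_partial_loglik(times, events, group, beta=0.5, ties="breslow")
ll_efron = cox_partial_loglik(times, events, group, beta=0.5, ties="efron")
print("Breslow:", ll_breslow, "Efron:", ll_efron)

# With no tied times, the two methods coincide.
untied = [1, 2, 3, 4, 5, 6]
print(cox_partial_loglik(untied, events, group, 0.5, "breslow"),
      cox_partial_loglik(untied, events, group, 0.5, "efron"))
```

Because the two likelihoods differ whenever events are tied, their maximizers (the reported coefficients) differ as well, which is consistent with the mismatch described above.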
Thank you so much for the clear explanation, Dr. Zivich. I now understand why the discrepancy occurred. I was wondering if lifelines supports the "exact" method for handling ties (similar to what's available in SAS or R), or other options like "breslow"? I'm hoping to use the exact method for now so I can closely follow the results presented in the textbook and better compare my Python output with the R and SAS examples included there.
To my knowledge, lifelines only supports Efron's method. The exact method is pretty computationally intense to do, and usually not feasible to run unless there are only a few ties. That computational difficulty is why the different approximations were originally developed.
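As a rough illustration of that cost, the sketch below counts how many terms a naive exact calculation would touch at a single tied event time: the number of size-d subsets of the risk set (the discrete form of the exact likelihood) and the number of possible orderings of the d tied events (the averaged-over-orderings form). The helper `naive_exact_terms` and the risk-set size of 40 are assumptions chosen for illustration, and real implementations may use smarter recursions than brute-force enumeration.

```python
import math

def naive_exact_terms(n_at_risk, n_tied):
    """Count terms a brute-force exact tie calculation would enumerate."""
    # Discrete form: every subset of size n_tied drawn from the risk set.
    subsets = math.comb(n_at_risk, n_tied)
    # Ordering form: every possible order in which the tied events occurred.
    orderings = math.factorial(n_tied)
    return subsets, orderings

# Hypothetical risk set of 40 subjects with a growing number of tied events.
for d in (1, 2, 5, 10):
    subsets, orderings = naive_exact_terms(40, d)
    print(f"d={d:2d}  subsets={subsets:,}  orderings={orderings:,}")
```

Both counts grow very quickly with the number of ties, whereas Breslow and Efron need only one pass over the tied subjects.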
For working through a textbook, my 2 cents is to run in SAS (or R) using whatever method the book uses to handle ties, then switch to Efron to compare.
@pzivich is exactly right. There are no plans to support other tie methods at this time.
Thank you.
Thank you