PyCOX data leaking

Open · hellorp1990 opened this issue 2 years ago · 1 comment

Hi,

I am running repeated cross-validation: for each seed I split the data, train a PyCOX model, and then average the results across seeds. I tried to fix the data leakage by deleting the model, the log, and any other variables that could carry information from one iteration to the next. I also initialize the model on every pass through the loop and empty the CUDA cache. I still see some data leakage.

Can anyone help me fix or reduce this?

Please check the code below:

    for seed in SEEDS:
        data, target = X1, Z1
        # Hold out 25% of the data for testing, then 25% of the rest for validation.
        X_train1, X_test, Z_train1, Z_test = train_test_split(
            data, target, test_size=0.25, random_state=seed)
        X_train, X_val, Z_train, Z_val = train_test_split(
            X_train1, Z_train1, test_size=0.25, random_state=seed)
        val = X_val, Z_val
        get_target = lambda df: (df['AVAL_PFS'].to_numpy(dtype='float32'),
                                 df['EVENT_PFS'].to_numpy(dtype='int32'))
        durations_test, events_test = get_target(Z_test)

        model2 = CoxPH(net2, device=torch.device('cuda:0'))
        model2.optimizer.set_lr(0.002)

        log = model2.fit(X_train, Z_train, batch_size, epochs, callbacks, verbose,
                         val_data=val, val_batch_size=batch_size)

        _ = model2.compute_baseline_hazards()

        # Time-dependent concordance on the held-out test set.
        surv = model2.predict_surv_df(X_test)
        ev = EvalSurv(surv, durations_test, events_test, censor_surv='km')
        score = ev.concordance_td()
        cv_score.append(score)

        # Drop references and cached GPU memory before the next seed.
        del log, model2, optimizer, X_train, X_test, Z_train, val, surv, durations_test, events_test, score
        torch.cuda.empty_cache()
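One possible source of the carry-over, assuming `net2` is an ordinary `torch.nn.Module` built once before the loop: `CoxPH(net2, ...)` wraps the same network object on every pass, so deleting `model2` does not reset the weights it trained. The sketch below illustrates that effect with plain PyTorch only. The helpers `make_net`, `train_one_step`, and `first_weight` are hypothetical names introduced for illustration, and a single SGD step stands in for `model2.fit`.

```python
import torch
from torch import nn


def make_net(in_features: int, init_seed: int) -> nn.Sequential:
    """Build a freshly initialised network (hypothetical helper, not from the post)."""
    torch.manual_seed(init_seed)  # reproducible initial weights
    return nn.Sequential(
        nn.Linear(in_features, 32),
        nn.ReLU(),
        nn.Linear(32, 1, bias=False),
    )


def train_one_step(net: nn.Sequential, x: torch.Tensor) -> None:
    """Stand-in for model.fit: a single SGD step so the weights change."""
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    opt.zero_grad()
    net(x).sum().backward()
    opt.step()


def first_weight(net: nn.Sequential) -> torch.Tensor:
    """Copy of the first layer's weights, used to compare starting states."""
    return net[0].weight.detach().clone()


# Toy inputs; the real covariates are not shown in the post.
x = torch.randn(16, 5, generator=torch.Generator().manual_seed(0))

# Assumed pattern in the post: one network object shared by every seed.
shared_net = make_net(in_features=5, init_seed=0)
start_of_first_run = first_weight(shared_net)
train_one_step(shared_net, x)  # training for the first seed
start_of_second_run = first_weight(shared_net)
shared_leaks = not torch.equal(start_of_first_run, start_of_second_run)

# Alternative: rebuild the network inside the loop.
fresh_starts = []
for seed in (1, 2):  # `seed` would drive train_test_split in the real loop
    net = make_net(in_features=5, init_seed=0)
    fresh_starts.append(first_weight(net))
    train_one_step(net, x)
fresh_is_clean = torch.equal(fresh_starts[0], fresh_starts[1])

print("shared network starts second run already trained:", shared_leaks)
print("rebuilt networks start from identical weights:", fresh_is_clean)
```

In the loop from the post, this would amount to constructing the network just before `CoxPH(...)` on each pass instead of reusing `net2`. Anything stateful passed through `callbacks` may need the same treatment, though that is an assumption since their definitions are not shown.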

hellorp1990, Nov 30 '22 14:11