Reproducing Jarfo results on Kaggle challenge data
Hi, I've been trying to reproduce the Jarfo results from the Kaggle challenge (2013) with its final dataset from http://www.causality.inf.ethz.ch/CEdata/, i.e., with CEfinal_train_text.zip and CEfinal_test_text.zip.
The output is very different from that of the original code, even though the learning parameters and the features used seem to be the same.
import numpy as np
import pandas as pd
from cdt.causality.pairwise.Jarfo import Jarfo
from cdt.utils.io import read_causal_pairs
from cdt import SETTINGS

# Run on CPU with a single job
SETTINGS.GPU = False
SETTINGS.NJOBS = 1

# Load the training and test pairs, with targets indexed by SampleID
train_data = read_causal_pairs(".../CEfinal_train_pairs.csv")
train_target = pd.read_csv(".../CEfinal_train_target.csv").iloc[:,:2].set_index("SampleID")
test_data = read_causal_pairs(".../CEfinal_test_pairs.csv")
test_target = pd.read_csv(".../CEfinal_test_target.csv").iloc[:,:2].set_index("SampleID")

# Fit Jarfo on the training pairs and predict on the test pairs
j = Jarfo()
j.fit(train_data, train_target)
jp = j.predict(test_data)

# Score: fraction of entries where prediction and target share the same sign
acc = np.mean(jp * test_target.values > 0)
print(acc)
0.25491827465325406
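One thing worth ruling out before blaming the model is the scoring line itself. The sketch below uses toy data only and assumes, without having verified it, that `j.predict` may return a 1-D array while `test_target.values` has shape `(n, 1)`; in that case NumPy broadcasts the product to `(n, n)` and the mean no longer measures per-pair agreement. The helper `directional_accuracy` is a hypothetical name, not part of cdt. Also note that if the target column contains 0 labels, those pairs can never count as correct under a `> 0` test, which lowers the achievable score.

```python
import numpy as np

def directional_accuracy(pred, target):
    """Fraction of pairs whose predicted sign matches the target sign.

    Hypothetical helper: flattens both inputs so that a (n,) prediction
    and a (n, 1) target are compared element by element.
    """
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return float(np.mean(pred * target > 0))

# Toy data (not from the challenge): 1-D predictions, column-vector targets
pred = np.array([0.8, -0.3, 0.5, -0.9])
target = np.array([[1], [-1], [-1], [0]])

# The naive product broadcasts to a full matrix instead of a vector
print((pred * target).shape)

# The flattened comparison scores each pair exactly once
print(directional_accuracy(pred, target))
```

Printing `jp.shape` and `test_target.values.shape` on the real data would show whether this applies here.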
I've tested it with Python 3.7.3 and cdt 0.5.14.
Thank you in advance for any hints. Best, Tom
Thanks for the feedback; this is concerning. Maybe porting the code to Python 3 broke something. I will look into it.
Sorry for the delay, I've started investigating. There do not seem to be any issues with the features output by the model.
I will continue to investigate further.
Edit: The problem comes from the fitness function. The predictions are random compared to those of the original code.
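For anyone who wants to check this on their side, here is a minimal sketch of how two sets of predictions for the same pairs could be compared. It assumes you already have the scores from the original code and from cdt as two arrays in the same pair order; `compare_predictions` and the toy arrays are hypothetical and not part of either code base. A correlation near zero together with a sign agreement near 0.5 would be consistent with the "random" behaviour described above.

```python
import numpy as np

def compare_predictions(reference, candidate):
    """Return (Pearson correlation, fraction of matching signs).

    Hypothetical helper: both inputs are flattened and must hold one
    score per pair, in the same order.
    """
    reference = np.asarray(reference, dtype=float).ravel()
    candidate = np.asarray(candidate, dtype=float).ravel()
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    corr = float(np.corrcoef(reference, candidate)[0, 1])
    sign_agreement = float(np.mean(np.sign(reference) == np.sign(candidate)))
    return corr, sign_agreement

# Toy data (not real model output): the candidate is a rescaled copy of
# the reference, so both measures should indicate full agreement
reference = np.array([0.9, -0.4, 0.2, -0.7])
candidate = 2.0 * reference

corr, agreement = compare_predictions(reference, candidate)
print(corr, agreement)
```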