
Private AUC is different

YasminAMassoud opened this issue 4 years ago • 1 comment

I produced different results running the algorithm.

For SVM: 0.72396; for RBT: 0.61181; for both: 0.6943. I only have your result for the combined model from the brain journal, and it is 0.7952. SVM is producing better results in my case, so I am wondering if you can share your input on why this happens. I only changed the input methods, but the same algorithm is carried out: training on all the data with CV on the training set, then testing on the new test set. I have a few questions to help me identify the reason for this:

1. My training results are: Grown weak learners: 100; SVM general model AUC: 0.82756; RBT general model AUC: 0.75574. I didn't find your training results, so if you can share them with me, it will help me identify whether we are doing something different.

2. SetSafeIdx is the method you use to select only the safe files and use them for training, right? I wonder if you edited train_and_test_data_labels_safe.csv to make this work, because the training files are in the format Pat1Train_1_0.mat, so the method won't work. I ran it and only 39953 files are chosen out of the 5047 total training files; by my calculation, 3829 files should be safe. However, the method marks as safe any file that is in the data list but not in the .csv file. Is this the way you meant for the method to work? When I edited it, the training AUC went down to SVM: general model AUC: 0.79039; RBT: general model AUC: 0.73239. Is there a way to know which training files you originally fed to the algorithm, and how many there were? Are they the 5047 files in contest_train_data_labels.csv?
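To illustrate the filename mismatch described above, a small renaming shim could map the repo-style training names onto the CSV-style names before looking them up. Both filename patterns here are assumptions inferred from this description (written as a Python sketch, not code from the repo):

```python
import re

def repo_to_csv_name(fname):
    """Map a repo-style training filename like 'Pat1Train_1_0.mat'
    to the assumed competition-CSV style '1_1_0.mat'."""
    m = re.match(r"Pat(\d+)Train_(\d+)_(\d+)\.mat$", fname)
    if m is None:
        return fname  # already CSV-style, or an unrecognised name
    pat, seg, label = m.groups()
    return f"{pat}_{seg}_{label}.mat"

print(repo_to_csv_name("Pat1Train_1_0.mat"))  # -> 1_1_0.mat
```

With a shim like this, every entry in the data list can be normalised before checking it against the CSV, instead of treating any non-matching name as safe by default.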

3. Is CV done on both RUS and SVM?

4. If I want to regenerate the results of the single-patient models, do I edit the copyTestLeakToTrain and featuresObject methods to run on one patient only, so that only patient 1's data is trained and tested individually? This means I will run the algorithm three times, once per patient, for training and testing, right?

YasminAMassoud avatar Feb 17 '21 16:02 YasminAMassoud

Hi @YasminAMassoud

The train_and_test_data_labels_safe.csv was provided during the competition after a data leak was discovered - some of the test dataset incorrectly contained data where portions overlapped with the training dataset. If I remember correctly, that file contains the affected files, marked as 0 ("unsafe"). These were removed from evaluation in the competition, but were usable as training data as we already had the labels for them. The copyTestLeakToTrain.m script should copy the unsafe files into the training set, but I don't think the code modifies train_and_test_data_labels_safe.csv anywhere. I believe SetSafeIdx dealt with these new training files, which were individual files rather than members of 6-piece sequential segments like the original training data.
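As a rough sketch of the idea (in Python rather than the repo's MATLAB), selecting the files marked safe in the CSV might look like the following. The column names `image` and `safe` are assumptions based on the competition file, not verified against the repo:

```python
import csv, io

# Toy stand-in for train_and_test_data_labels_safe.csv; the column
# names ('image', 'class', 'safe') are assumed, not confirmed.
csv_text = """image,class,safe
1_1_0.mat,0,1
1_2_0.mat,0,0
new_1_1.mat,1,1
"""

# Build the set of filenames explicitly marked safe (safe == 1)
safe = {row["image"] for row in csv.DictReader(io.StringIO(csv_text))
        if row["safe"] == "1"}

data_list = ["1_1_0.mat", "1_2_0.mat", "new_1_1.mat"]

# Keep only files that appear in the CSV as safe; files absent from
# the CSV are dropped rather than assumed safe.
train_files = [f for f in data_list if f in safe]
print(train_files)  # -> ['1_1_0.mat', 'new_1_1.mat']
```

Note the design choice at the end: a file missing from the CSV is excluded here, which is the opposite of the "in datalist but not in .csv, therefore safe" behaviour described in the question.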

It's hard to say why the model performance differs. Is 0.6943 the mean AUC? If so, it sounds about right (see the tables below). I don't remember exactly how the "overall AUC" in the competition aggregated the patients' individual scores - possibly a weighted mean? If this doesn't account for the difference, my suspicion is that the training data setup may differ somehow. These models are very sensitive to the data they're trained on, partly due to the low number of positive examples, and partly due to the complexity of handling the segments and the leak.
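To show how much the aggregation choice alone can move the headline number, here is a quick comparison of a simple mean against a mean weighted by per-patient test-set size. The AUCs are the GarethJones row from the mean-AUC table below; the per-patient segment counts are made up purely for illustration:

```python
# Per-patient AUCs (GarethJones row of the mean-AUC table)
aucs = [0.58348, 0.76205, 0.79511]
# Hypothetical per-patient test-segment counts; the real counts differ
counts = [216, 1002, 690]

simple_mean = sum(aucs) / len(aucs)
weighted_mean = sum(a * n for a, n in zip(aucs, counts)) / sum(counts)

# The simple mean reproduces the 0.71355 "average AUC" in the table;
# the weighted mean lands somewhere else entirely.
print(round(simple_mean, 5))  # -> 0.71355
print(round(weighted_mean, 5))
```

If the competition's "overall AUC" pooled or weighted the patients rather than averaging them, a gap like 0.6943 vs 0.7952 could arise even with identical models.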

Yes, CV was done for both models, and training would require running three times to do the patients individually.
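The three per-patient runs could be driven by a trivial loop. `train_and_test` below is a placeholder for one full run of the repo's MATLAB pipeline, not an actual function in it:

```python
def train_and_test(patient):
    """Placeholder for one full train + test run restricted to one patient.
    In the real repo this means rerunning the MATLAB pipeline
    (copyTestLeakToTrain, featuresObject, etc.) with only that
    patient's data."""
    return {"patient": patient, "auc": None}

# One independent run per patient, as described above
results = [train_and_test(p) for p in (1, 2, 3)]
print([r["patient"] for r in results])  # -> [1, 2, 3]
```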

Overall AUC:

| rank | team | overall AUC |
| --- | --- | --- |
| 0001 | Notsorandomanymore | 0.80701 |
| 0002 | Oroto | 0.79898 |
| 0003 | GarethJones | 0.79652 |
| 0004 | QingnanTang | 0.79458 |
| 0005 | nullset | 0.79363 |
| 0006 | tralala boum boum pout pout | 0.79197 |
| 0007 | Medrr | 0.80329 |
| 0008 | michaln | 0.79074 |
| 0009 | DataSpring | 0.79053 |
| 0010 | fugusuki | 0.78773 |
| 0011 | tmunemot | 0.78478 |
| 0012 | Joseph Chui | 0.78468 |
| 0013 | cvanghel | 0.78127 |
| 0014 | krischen | 0.7787 |
| 0015 | QMRSD | 0.7781 |
| 0016 | deepfit | 0.77638 |
| 0017 | Claudia | 0.77279 |
| 0018 | bestfitting | 0.77112 |
| 0019 | Golovanov | 0.77043 |
| 0020 | ZeroDivisionError | 0.76713 |

Mean AUC:

| rank | team | average AUC | patient 1 AUC | patient 2 AUC | patient 3 AUC |
| --- | --- | --- | --- | --- | --- |
| 0022 | Kyle | 0.7673 | 0.69159 | 0.77341 | 0.8369 |
| 0031 | Mickey | 0.76254 | 0.68831 | 0.72723 | 0.8721 |
| 0009 | DataSpring | 0.7528 | 0.67467 | 0.73591 | 0.84783 |
| 0010 | fugusuki | 0.75203 | 0.70422 | 0.7743 | 0.77756 |
| 0017 | Claudia | 0.74334 | 0.66999 | 0.74813 | 0.8119 |
| 0001 | Notsorandomanymore | 0.74043 | 0.63324 | 0.72601 | 0.86203 |
| 0019 | Golovanov | 0.7398 | 0.6686 | 0.70549 | 0.84532 |
| 0006 | tralala boum boum pout pout | 0.73699 | 0.5663 | 0.84849 | 0.79619 |
| 0002 | Oroto | 0.7339 | 0.63476 | 0.70494 | 0.862 |
| 0021 | BRA | 0.73162 | 0.6979 | 0.83686 | 0.66011 |
| 0026 | fergusoci | 0.73097 | 0.71407 | 0.7647 | 0.71412 |
| 0049 | Ben Ogorek | 0.72912 | 0.81738 | 0.73177 | 0.63821 |
| 0027 | Feagen | 0.72422 | 0.65496 | 0.76485 | 0.75283 |
| 0047 | ChipicitoSolverWorld | 0.72109 | 0.57312 | 0.70082 | 0.88932 |
| 0008 | michaln | 0.7198 | 0.59535 | 0.72981 | 0.83425 |
| 0003 | GarethJones | 0.71355 | 0.58348 | 0.76205 | 0.79511 |
| 0037 | Mike | 0.71334 | 0.68224 | 0.74925 | 0.70851 |
| 0007 | Medrr | 0.71328 | 0.5365 | 0.77755 | 0.8258 |
| 0011 | tmunemot | 0.71228 | 0.61682 | 0.73926 | 0.78075 |
| 0004 | QingnanTang | 0.71125 | 0.56504 | 0.75173 | 0.81697 |

garethjns avatar Mar 01 '21 14:03 garethjns