Kaggle-EEG
                        Private AUC is different
I produced different results running the algorithm.
For SVM I get 0.72396, for RBT 0.61181, and for both combined 0.6943. The only result I have for the combined model is 0.7952, from the Brain journal paper. SVM is producing better results in my case, so I am wondering whether you can share your input on why this happens. I only changed the input methods; the same algorithm is carried out: training on all data with CV on the training set, then testing on the new test set. So I have a few questions to help me identify the reason for this:
1. My training results are: Grown weak learners: 100; SVM general model AUC: 0.82756; RBT general model AUC: 0.75574. I didn't find your training results, so if you can share them with me it will help me identify whether we are doing something different.
2. SetSafeIdx is the method you use to select only the safe files and use them for training, right? I wonder if you edited train_and_test_data_labels_safe.csv to make this work, as the training files are named in the format Pat1Train_1_0.mat, so the method won't match them. When I ran it, only 39953 files were chosen out of the 5047 total training files; by my calculation there should be 3829 safe files. However, the method marks as safe any file that is in the data list but not in the .csv file. Is this the way you meant for the method to work? When I edited it, the training AUC went down to SVM general model AUC: 0.79039 and RBT general model AUC: 0.73239. Is there a way to know which training files you first entered into the algorithm, and their number? Are they the 5047 files in contest_train_data_labels.csv?
3. CV is done on both RUS and SVM, right?
4. If I want to regenerate the results of the single-patient models, do I edit the copyTestLeakToTrain and featuresObject methods to run on one patient only, so that, e.g., only patient 1's data is trained and tested individually? This means I will run the algorithm three times to train and test each patient, right?
Hi @YasminAMassoud
The train_and_test_data_labels_safe.csv was provided during the competition after a data leak was discovered: some of the test dataset incorrectly contained portions that overlapped with the training dataset. If I remember correctly, that file lists the affected files, marked as 0 ("unsafe"). These were removed from evaluation in the competition, but were usable as training data, as we already had the labels for them. The copyTestLeakToTrain.m script should copy the unsafe files into the training set, but I don't think the code modifies train_and_test_data_labels_safe.csv anywhere. I believe SetSafeIdx dealt with these new training files, which were individual files rather than members of 6-piece sequential segments like the original training data.
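The selection step described above can be sketched roughly as follows. This is a hedged Python illustration, not the actual MATLAB implementation in copyTestLeakToTrain.m; the column names and file names in the miniature CSV are assumptions for demonstration only and may differ from the real train_and_test_data_labels_safe.csv.

```python
import csv
import io

# Hypothetical miniature of train_and_test_data_labels_safe.csv.
# Rows with safe == 0 are leaked test segments whose labels are known,
# so they can be reused as extra training data.
SAMPLE_CSV = """image,class,safe
Pat1Train_1_0.mat,0,1
1_1_0.mat,1,0
1_2_1.mat,1,1
"""

def leaked_files(csv_text):
    """Return the names of files flagged unsafe (safe == 0) in the label file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["image"] for row in reader if row["safe"] == "0"]

print(leaked_files(SAMPLE_CSV))  # ['1_1_0.mat']
```

The point of the sketch is only the filtering logic: unsafe rows identify the leaked files, and anything copied into the training set this way arrives as an individual file, without the 6-segment sequence structure of the original training data.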
It's hard to say why the model performance differs. Is 0.6943 the mean AUC? If so, it sounds about right (see the tables below). I don't remember exactly how the "overall AUC" in the competition aggregated the patients' individual scores; possibly a weighted mean? If this doesn't account for the difference, my suspicion is that the training data setup may differ somehow. These models are very sensitive to the data they're trained on, partly due to the low number of positive examples, and partly due to the complexity of handling the segments and the leak.
Yes, CV was done for both models, and training the patients individually would require running the algorithm three times.
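To make the aggregation question concrete: the "average AUC" column in the mean-AUC table below is a plain mean of the three per-patient AUCs, which a weighted mean (weighted, say, by per-patient test-set size) would generally not reproduce. A quick check using the GarethJones row from the table; the weights are hypothetical, purely to show the mechanics:

```python
# Per-patient AUCs from the GarethJones row of the mean-AUC table.
aucs = [0.58348, 0.76205, 0.79511]  # patients 1, 2, 3

# Plain mean reproduces the "average AUC" column.
mean_auc = sum(aucs) / len(aucs)
print(round(mean_auc, 5))  # 0.71355, matching the table

# A weighted mean needs per-patient test-set sizes; these weights are
# invented here only to illustrate that the result would differ.
weights = [216, 1002, 690]
weighted_auc = sum(w * a for w, a in zip(weights, aucs)) / sum(weights)
print(weighted_auc != mean_auc)  # True
```

So if the competition's "overall AUC" used any weighting, it cannot be recovered from the per-patient columns alone.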
Overall AUC:
| team | overall AUC | 
|---|---|
| 0001 Notsorandomanymore | 0.80701 | 
| 0002 Oroto | 0.79898 | 
| 0003 GarethJones | 0.79652 | 
| 0004 QingnanTang | 0.79458 | 
| 0005 nullset | 0.79363 | 
| 0006 tralala boum boum pout pout | 0.79197 | 
| 0007 Medrr | 0.80329 | 
| 0008 michaln | 0.79074 | 
| 0009 DataSpring | 0.79053 | 
| 0010 fugusuki | 0.78773 | 
| 0011 tmunemot | 0.78478 | 
| 0012 Joseph Chui | 0.78468 | 
| 0013 cvanghel | 0.78127 | 
| 0014 krischen | 0.7787 | 
| 0015 QMRSD | 0.7781 | 
| 0016 deepfit | 0.77638 | 
| 0017 Claudia | 0.77279 | 
| 0018 bestfitting | 0.77112 | 
| 0019 Golovanov | 0.77043 | 
| 0020 ZeroDivisionError | 0.76713 | 
Mean AUC:
| team | average AUC | patient 1 AUC | patient 2 AUC | patient 3 AUC | 
|---|---|---|---|---|
| 0022 Kyle | 0.7673 | 0.69159 | 0.77341 | 0.8369 | 
| 0031 Mickey | 0.76254 | 0.68831 | 0.72723 | 0.8721 | 
| 0009 DataSpring | 0.7528 | 0.67467 | 0.73591 | 0.84783 | 
| 0010 fugusuki | 0.75203 | 0.70422 | 0.7743 | 0.77756 | 
| 0017 Claudia | 0.74334 | 0.66999 | 0.74813 | 0.8119 | 
| 0001 Notsorandomanymore | 0.74043 | 0.63324 | 0.72601 | 0.86203 | 
| 0019 Golovanov | 0.7398 | 0.6686 | 0.70549 | 0.84532 | 
| 0006 tralala boum boum pout pout | 0.73699 | 0.5663 | 0.84849 | 0.79619 | 
| 0002 Oroto | 0.7339 | 0.63476 | 0.70494 | 0.862 | 
| 0021 BRA | 0.73162 | 0.6979 | 0.83686 | 0.66011 | 
| 0026 fergusoci | 0.73097 | 0.71407 | 0.7647 | 0.71412 | 
| 0049 Ben Ogorek | 0.72912 | 0.81738 | 0.73177 | 0.63821 | 
| 0027 Feagen | 0.72422 | 0.65496 | 0.76485 | 0.75283 | 
| 0047 ChipicitoSolverWorld | 0.72109 | 0.57312 | 0.70082 | 0.88932 | 
| 0008 michaln | 0.7198 | 0.59535 | 0.72981 | 0.83425 | 
| 0003 GarethJones | 0.71355 | 0.58348 | 0.76205 | 0.79511 | 
| 0037 Mike | 0.71334 | 0.68224 | 0.74925 | 0.70851 | 
| 0007 Medrr | 0.71328 | 0.5365 | 0.77755 | 0.8258 | 
| 0011 tmunemot | 0.71228 | 0.61682 | 0.73926 | 0.78075 | 
| 0004 QingnanTang | 0.71125 | 0.56504 | 0.75173 | 0.81697 |