Positive Unlabeled Learning for stepstoreproduce model
Regarding #705
@marco-c With BaggingClassifier, Undersampler turned off:

Cross Validation scores:
Accuracy: f0.9862444146820298 (+/- 0.0018937457011950216)
Precision: f0.7003269062965275 (+/- 0.06540289338010274)
Recall: f0.6149242215486359 (+/- 0.04031561239639824)
X_train: (45000, 570310), y_train: (45000,)
X_test: (5000, 570310), y_test: (5000,)
/home/harshit/.local/lib/python3.6/site-packages/sklearn/ensemble/bagging.py:611: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
/home/harshit/.local/lib/python3.6/site-packages/sklearn/ensemble/bagging.py:616: RuntimeWarning: invalid value encountered in true_divide
  predictions.sum(axis=1)[:, np.newaxis])

Test Set scores:
No confidence threshold - 5000 classified

                   pre       rec       spe        f1       geo       iba       sup
          1       0.74      0.54      1.00      0.62      0.73      0.51        98
          0       0.99      1.00      0.54      0.99      0.73      0.56      4902
avg / total       0.99      0.99      0.55      0.99      0.73      0.56      5000

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │              53 │              45 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │              19 │            4883 │
╘════════════╧═════════════════╧═════════════════╛
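
As a sanity check on the report above, the class-1 row can be recomputed from the confusion matrix alone. This is a minimal sketch, not the code that produced the log; the function name `binary_metrics` is made up for illustration, and `geo` is assumed to be the geometric mean of recall and specificity.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Metrics for the positive class of a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Assumption: "geo" is the geometric mean of recall and specificity.
    geo = math.sqrt(recall * specificity)
    return {
        "pre": precision,
        "rec": recall,
        "spe": specificity,
        "f1": f1,
        "geo": geo,
    }


# Counts taken from the confusion matrix above (class 1 is the positive class).
m = binary_metrics(tp=53, fn=45, fp=19, tn=4883)
print({name: round(value, 2) for name, value in m.items()})
```

The rounded values agree with the `1` row of the report (0.74, 0.54, 1.00, 0.62, 0.73), which shows that the high overall accuracy comes mostly from the 4902 class-0 samples.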
With xgboost for comparison, Undersampler turned on:

73 bugs have no steps to reproduce
2642 bugs have steps to reproduce
X: (2715, 68672), y: (2715,)
Cross Validation scores:
Accuracy: f0.6930046232725715 (+/- 0.08506114027939689)
Precision: f0.9889862871434468 (+/- 0.011123332687863755)
Recall: f0.6927315347191507 (+/- 0.09585654167878219)
X_train: (128, 68672), y_train: (128,)
X_test: (272, 68672), y_test: (272,)
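
For readers unfamiliar with the "Undersampler" switch, here is a rough sketch of random undersampling: every class is cut down to the size of the smallest one. It is an illustration only, under the assumption that the option does something along these lines; `random_undersample` is a hypothetical helper, not the project's implementation.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class has as many as the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Size of the minority class decides how many samples each class keeps.
    n_keep = min(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_keep):
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy data: 5 samples of class 0 and 20 samples of class 1.
toy_x = list(range(25))
toy_y = [0] * 5 + [1] * 20
bal_x, bal_y = random_undersample(toy_x, toy_y)
print(len(bal_x), bal_y.count(0), bal_y.count(1))
```

Balancing this way discards most of the majority class, which is one plausible reason the training set in the log above is so small.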
Codecov Report
Merging #784 into master will increase coverage by 46.96%. The diff coverage is n/a.
@@ Coverage Diff @@
## master #784 +/- ##
=========================================
+ Coverage 53.03% 100% +46.96%
=========================================
Files 81 1 -80
Lines 5461 14 -5447
=========================================
- Hits 2896 14 -2882
+ Misses 2565 0 -2565
We need to calculate the metrics we talked about on IRC: if you hide some real positives, is the model able to find them? If you hide some real negatives, is the model able to tell that they are negatives?
One possible option (maybe easier, not sure) is to make the test set contain only true positives and true negatives.
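
The "hide some labels and see whether the model recovers them" check could look roughly like the sketch below. Everything here is an assumption made for illustration: `hidden_recall`, `nearest_mean_trainer`, the `None` marker for unlabeled bugs, and the 1-D toy features. The toy trainer simply ignores unlabeled samples, so it is a stand-in for the real model, not a PU learner.

```python
import random

UNLABELED = None  # assumption: unlabeled bugs carry no label


def hidden_recall(samples, labels, train_fn, target, hide_fraction=0.2, seed=0):
    """Hide part of one labeled class, retrain, and report how many
    of the hidden samples the model assigns back to that class."""
    rng = random.Random(seed)
    known = [i for i, y in enumerate(labels) if y == target]
    n_hide = max(1, int(len(known) * hide_fraction))
    hidden = set(rng.sample(known, n_hide))

    # Hidden samples are presented to the trainer as unlabeled.
    train_labels = [UNLABELED if i in hidden else y for i, y in enumerate(labels)]
    predict = train_fn(samples, train_labels)

    hits = sum(1 for i in hidden if predict(samples[i]) == target)
    return hits / len(hidden)


def nearest_mean_trainer(samples, labels):
    """Toy stand-in model on 1-D features: predict the class with the closest mean."""
    means = {}
    for cls in (0, 1):
        values = [x for x, y in zip(samples, labels) if y == cls]
        means[cls] = sum(values) / len(values)
    return lambda x: min(means, key=lambda cls: abs(x - means[cls]))


# Toy data: 10 positives, 5 negatives, 20 unlabeled samples.
toy_samples = [1.0] * 10 + [0.0] * 5 + [0.5] * 20
toy_labels = [1] * 10 + [0] * 5 + [UNLABELED] * 20

pos_recall = hidden_recall(toy_samples, toy_labels, nearest_mean_trainer, target=1)
neg_recall = hidden_recall(toy_samples, toy_labels, nearest_mean_trainer, target=0)
print(pos_recall, neg_recall)
```

Running it once with `target=1` and once with `target=0` answers the two questions separately, and only samples with a known true label are ever scored.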
@marco-c After the latest changes, it performs a lot better now:

48965 bugs have no labels
27 bugs have no steps to reproduce
1008 bugs have steps to reproduce
X: (50000, 575524), y: (50000,)
Cross Validation scores:
Accuracy: f0.8663989202635347 (+/- 0.020209775582136394)
Precision: f0.11253227786380618 (+/- 0.014047676653706294)
Recall: f0.825077081192189 (+/- 0.07402488946446677)
X_train: (35000, 575524), y_train: (35000,)
X_test: (321, 575524), y_test: (321,)

Test Set scores:
No confidence threshold - 321 classified

                   pre       rec       spe        f1       geo       iba       sup
          1       0.99      0.81      0.70      0.89      0.75      0.57       311
          0       0.11      0.70      0.81      0.18      0.75      0.56        10
avg / total       0.96      0.81      0.70      0.87      0.75      0.57       321

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             252 │              59 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               3 │               7 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.6 - 295 classified

                   pre       rec       spe        f1       geo       iba       sup
          1       0.99      0.88      0.57      0.93      0.71      0.52       288
          0       0.10      0.57      0.88      0.17      0.71      0.48         7
avg / total       0.97      0.87      0.58      0.91      0.71      0.51       295

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             252 │              36 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               3 │               4 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.7 - 262 classified

                   pre       rec       spe        f1       geo       iba       sup
          1       0.99      0.92      0.67      0.96      0.78      0.63       256
          0       0.17      0.67      0.92      0.27      0.78      0.60         6
avg / total       0.97      0.92      0.67      0.94      0.78      0.63       262

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             236 │              20 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               4 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.8 - 231 classified

                   pre       rec       spe        f1       geo       iba       sup
          1       0.99      0.96      0.60      0.98      0.76      0.60       226
          0       0.25      0.60      0.96      0.35      0.76      0.56         5
avg / total       0.97      0.95      0.61      0.96      0.76      0.60       231

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             217 │               9 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               3 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.9 - 204 classified

                   pre       rec       spe        f1       geo       iba       sup
          1       0.99      0.98      0.50      0.98      0.70      0.51       202
          0       0.17      0.50      0.98      0.25      0.70      0.46         2
avg / total       0.99      0.97      0.50      0.98      0.70      0.51       204

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             197 │               5 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               1 │               1 │
╘════════════╧═════════════════╧═════════════════╛
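
The shrinking "N classified" counts in the threshold reports above are consistent with dropping every sample whose highest class probability does not exceed the threshold. A minimal sketch of that filtering follows. The helper names, the `[P(class 0), P(class 1)]` layout of each probability row, and the toy numbers are all assumptions for illustration, not the project's code or data.

```python
def classify_with_threshold(probabilities, y_true, threshold):
    """Return (predicted, actual) pairs for samples whose most probable
    class has a probability strictly greater than `threshold`."""
    kept = []
    for probs, actual in zip(probabilities, y_true):
        confidence = max(probs)
        if confidence > threshold:
            # Assumption: probs is [P(class 0), P(class 1)].
            kept.append((probs.index(confidence), actual))
    return kept


def confusion_counts(pairs):
    """Count (actual, predicted) combinations for a binary problem."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for predicted, actual in pairs:
        counts[(actual, predicted)] += 1
    return counts


# Toy probabilities, not taken from the logs above.
toy_probs = [[0.1, 0.9], [0.45, 0.55], [0.8, 0.2], [0.35, 0.65]]
toy_true = [1, 1, 0, 0]

for threshold in (0.6, 0.7):
    pairs = classify_with_threshold(toy_probs, toy_true, threshold)
    print(threshold, len(pairs), confusion_counts(pairs))
```

A higher threshold trades coverage for precision: fewer bugs receive a prediction, and the remaining ones are those the model is most sure about.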