Positive Unlabeled Learning for stepstoreproduce model

Open chidauri opened this issue 6 years ago • 4 comments

Regarding #705

chidauri avatar Jul 25 '19 10:07 chidauri

@marco-c With BaggingClassifier, Undersampler turned off:

Cross Validation scores:
Accuracy: f0.9862444146820298 (+/- 0.0018937457011950216)
Precision: f0.7003269062965275 (+/- 0.06540289338010274)
Recall: f0.6149242215486359 (+/- 0.04031561239639824)
X_train: (45000, 570310), y_train: (45000,)
X_test: (5000, 570310), y_test: (5000,)
/home/harshit/.local/lib/python3.6/site-packages/sklearn/ensemble/bagging.py:611: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
/home/harshit/.local/lib/python3.6/site-packages/sklearn/ensemble/bagging.py:616: RuntimeWarning: invalid value encountered in true_divide
  predictions.sum(axis=1)[:, np.newaxis])
Test Set scores:
No confidence threshold - 5000 classified

                  pre       rec       spe        f1       geo       iba       sup

          1      0.74      0.54      1.00      0.62      0.73      0.51        98
          0      0.99      1.00      0.54      0.99      0.73      0.56      4902

avg / total      0.99      0.99      0.55      0.99      0.73      0.56      5000

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │              53 │              45 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │              19 │            4883 │
╘════════════╧═════════════════╧═════════════════╛
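
For reference, the class 1 row of the report can be re-derived from the confusion matrix above. This is a small sketch that assumes the usual definitions of the column names (pre = precision, rec = recall, spe = specificity, geo = geometric mean of recall and specificity); `class_metrics` is a made-up helper for illustration, not a bugbug function.

```python
from math import sqrt


def class_metrics(tp, fn, fp, tn):
    """Metrics for one class, given its confusion-matrix counts.

    tp/fn: actual members of the class predicted as inside/outside the class.
    fp/tn: actual non-members predicted as inside/outside the class.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    geo = sqrt(recall * specificity)
    return {"pre": precision, "rec": recall, "spe": specificity, "f1": f1, "geo": geo}


# Counts for class 1, read off the confusion matrix above.
metrics = class_metrics(tp=53, fn=45, fp=19, tn=4883)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these values agree with the class 1 row (0.74, 0.54, 1.00, 0.62, 0.73). Out of 98 actual positives, 45 are still missed.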

With xgboost for comparison, Undersampler turned on:

73 bugs have no steps to reproduce
2642 bugs have steps to reproduce
X: (2715, 68672), y: (2715,)
Cross Validation scores:
Accuracy: f0.6930046232725715 (+/- 0.08506114027939689)
Precision: f0.9889862871434468 (+/- 0.011123332687863755)
Recall: f0.6927315347191507 (+/- 0.09585654167878219)
X_train: (128, 68672), y_train: (128,)
X_test: (272, 68672), y_test: (272,)

chidauri avatar Jul 25 '19 10:07 chidauri

Codecov Report

Merging #784 into master will increase coverage by 46.96%. The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #784       +/-   ##
=========================================
+ Coverage   53.03%   100%   +46.96%     
=========================================
  Files          81      1       -80     
  Lines        5461     14     -5447     
=========================================
- Hits         2896     14     -2882     
+ Misses       2565      0     -2565
Impacted Files Coverage Δ
tests/test_assignee.py
bugbug/models/component.py
bugbug/bug_features.py
scripts/commit_retriever.py
bugbug/bugzilla.py
bugbug/bug_snapshot.py
tests/test_devdocneeded.py
bugbug/__init__.py
bugbug/models/defect_enhancement_task.py
scripts/regressor_finder.py
... and 68 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update e2810c2...abdd3e1.

codecov-io avatar Jul 25 '19 13:07 codecov-io

We need to calculate the metrics we talked about on IRC: if you hide some real positives, is the model able to find them? And if you hide some real negatives, is the model able to tell that they are negatives?
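
One possible reading of that metric, as a minimal sketch: mark a random fraction of the known labels as unlabeled before training, then check how many of those hidden examples the model classifies correctly. The names `UNLABELED`, `hide_labels` and `recovery_rates` are hypothetical and not part of bugbug, and the "model" below is a stand-in that predicts 1 for everything.

```python
import random

UNLABELED = None  # hypothetical marker for bugs that have no label


def hide_labels(labels, hide_fraction=0.2, seed=0):
    """Replace a random fraction of the known labels with UNLABELED.

    Returns the modified label list and the sorted indices that were hidden.
    """
    rng = random.Random(seed)
    known = [i for i, label in enumerate(labels) if label is not UNLABELED]
    hidden = sorted(rng.sample(known, int(len(known) * hide_fraction)))
    train_labels = list(labels)
    for i in hidden:
        train_labels[i] = UNLABELED
    return train_labels, hidden


def recovery_rates(labels, predictions, hidden):
    """Fraction of hidden positives predicted 1 and hidden negatives predicted 0.

    A rate is None when no example of that class was hidden.
    """
    def rate(target):
        indices = [i for i in hidden if labels[i] == target]
        if not indices:
            return None
        return sum(predictions[i] == target for i in indices) / len(indices)

    return rate(1), rate(0)


labels = [1, 1, 1, 1, 0, 0, None, None, None, None]
train_labels, hidden = hide_labels(labels, hide_fraction=0.5)
# A real run would train on train_labels and predict for every bug here.
predictions = [1] * len(labels)
found_pos, found_neg = recovery_rates(labels, predictions, hidden)
print(f"hidden positives found: {found_pos}, hidden negatives found: {found_neg}")
```

Keeping the two rates separate matters here because the classes are very imbalanced, so a single accuracy number would mostly reflect the majority class.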

One possible option (maybe easier? Not sure) is to make the test set only contain true positives and negatives.
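
A rough sketch of that option, assuming labels are encoded as 1 (positive), 0 (negative) and `None` (unlabeled): the test indices are sampled only from labeled bugs, per class, and everything else, including all unlabeled bugs, stays in the training set. `split_labeled_test` and the 0.3 fraction are illustrative choices, not existing bugbug code.

```python
import random


def split_labeled_test(labels, test_fraction=0.3, seed=0):
    """Split indices so that the test set holds only labeled examples.

    labels: list of 1, 0 or None (unlabeled).
    Returns (train_indices, test_indices).
    """
    rng = random.Random(seed)
    test_indices = []
    for target in (1, 0):
        # Sample each class separately so both appear in the test set.
        group = [i for i, label in enumerate(labels) if label == target]
        test_indices.extend(rng.sample(group, int(len(group) * test_fraction)))
    held_out = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    return train_indices, sorted(test_indices)


labels = [1] * 10 + [0] * 10 + [None] * 30
train_indices, test_indices = split_labeled_test(labels)
print(f"train: {len(train_indices)}, test: {len(test_indices)}")
```

With this split, test-set precision and recall are computed against real labels only, so unlabeled bugs that are actually positive no longer count as false positives.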

marco-c avatar Jul 29 '19 15:07 marco-c

@marco-c After the latest changes, it performs a lot better now:

48965 bugs have no labels
27 bugs have no steps to reproduce
1008 bugs have steps to reproduce
X: (50000, 575524), y: (50000,)
Cross Validation scores:
Accuracy: f0.8663989202635347 (+/- 0.020209775582136394)
Precision: f0.11253227786380618 (+/- 0.014047676653706294)
Recall: f0.825077081192189 (+/- 0.07402488946446677)
X_train: (35000, 575524), y_train: (35000,)
X_test: (321, 575524), y_test: (321,)
Test Set scores:
No confidence threshold - 321 classified

                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.81      0.70      0.89      0.75      0.57       311
          0      0.11      0.70      0.81      0.18      0.75      0.56        10

avg / total      0.96      0.81      0.70      0.87      0.75      0.57       321

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             252 │              59 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               3 │               7 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.6 - 295 classified

                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.88      0.57      0.93      0.71      0.52       288
          0      0.10      0.57      0.88      0.17      0.71      0.48         7

avg / total      0.97      0.87      0.58      0.91      0.71      0.51       295

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             252 │              36 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               3 │               4 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.7 - 262 classified

                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.92      0.67      0.96      0.78      0.63       256
          0      0.17      0.67      0.92      0.27      0.78      0.60         6

avg / total      0.97      0.92      0.67      0.94      0.78      0.63       262

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             236 │              20 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               4 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.8 - 231 classified

                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.96      0.60      0.98      0.76      0.60       226
          0      0.25      0.60      0.96      0.35      0.76      0.56         5

avg / total      0.97      0.95      0.61      0.96      0.76      0.60       231

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             217 │               9 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               3 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.9 - 204 classified

                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.98      0.50      0.98      0.70      0.51       202
          0      0.17      0.50      0.98      0.25      0.70      0.46         2

avg / total      0.99      0.97      0.50      0.98      0.70      0.51       204

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             197 │               5 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               1 │               1 │
╘════════════╧═════════════════╧═════════════════╛
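
The shrinking "N classified" counts above are consistent with dropping predictions whose confidence is at or below the threshold. A minimal sketch of that filtering, assuming confidence means the highest class probability for a bug; `classify_with_threshold` and the probabilities are invented for illustration and may not match how bugbug implements it.

```python
def classify_with_threshold(probabilities, threshold):
    """Keep only predictions whose top class probability exceeds the threshold.

    probabilities: list of [p_class_0, p_class_1] rows.
    Returns a list of (row_index, predicted_class) pairs.
    """
    kept = []
    for index, class_probs in enumerate(probabilities):
        confidence = max(class_probs)
        if confidence > threshold:
            kept.append((index, class_probs.index(confidence)))
    return kept


# Made-up probabilities for four bugs.
probabilities = [[0.05, 0.95], [0.45, 0.55], [0.75, 0.25], [0.35, 0.65]]
for threshold in (0.6, 0.7, 0.8, 0.9):
    kept = classify_with_threshold(probabilities, threshold)
    print(f"Confidence threshold > {threshold} - {len(kept)} classified")
```

Raising the threshold trades coverage for reliability: fewer bugs get a prediction, and the metrics are computed only on the ones that remain.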

chidauri avatar Aug 11 '19 20:08 chidauri