Regarding #705

Jul 25 '19 10:07 chidauri

@marco-c With BaggingClassifier and the undersampler turned off:
Cross Validation scores:
Accuracy: f0.9862444146820298 (+/- 0.0018937457011950216)
Precision: f0.7003269062965275 (+/- 0.06540289338010274)
Recall: f0.6149242215486359 (+/- 0.04031561239639824)
X_train: (45000, 570310), y_train: (45000,)
X_test: (5000, 570310), y_test: (5000,)
/home/harshit/.local/lib/python3.6/site-packages/sklearn/ensemble/bagging.py:611: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
/home/harshit/.local/lib/python3.6/site-packages/sklearn/ensemble/bagging.py:616: RuntimeWarning: invalid value encountered in true_divide
  predictions.sum(axis=1)[:, np.newaxis])
Test Set scores:
No confidence threshold - 5000 classified
                  pre       rec       spe        f1       geo       iba       sup

          1      0.74      0.54      1.00      0.62      0.73      0.51        98
          0      0.99      1.00      0.54      0.99      0.73      0.56      4902

avg / total      0.99      0.99      0.55      0.99      0.73      0.56      5000

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │              53 │              45 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │              19 │            4883 │
╘════════════╧═════════════════╧═════════════════╛
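All of the per-class columns in the report above (pre, rec, spe, f1, geo, iba, sup) follow from the 2x2 confusion matrix under it. As a cross-check, here is a plain-Python sketch that recomputes them. It assumes the report comes from imbalanced-learn's `classification_report_imbalanced`, whose index of balanced accuracy uses `alpha=0.1` by default; `per_class_metrics` and `report` are hypothetical helper names, not bugbug code.

```python
import math


def per_class_metrics(tp, fn, fp, tn, alpha=0.1):
    """Metrics for one class treated as 'positive', given its confusion counts.

    tp: actual positive, predicted positive
    fn: actual positive, predicted negative
    fp: actual negative, predicted positive
    tn: actual negative, predicted negative
    alpha: weight of the dominance term in the index of balanced accuracy
           (assumed to match imbalanced-learn's default of 0.1).
    """
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    spe = tn / (tn + fp) if tn + fp else 0.0  # specificity
    f1 = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    geo = math.sqrt(rec * spe)  # geometric mean of sensitivity and specificity
    iba = (1 + alpha * (rec - spe)) * geo ** 2  # index of balanced accuracy
    return {"pre": pre, "rec": rec, "spe": spe, "f1": f1,
            "geo": geo, "iba": iba, "sup": tp + fn}


def report(matrix):
    """matrix[actual][predicted], rows/columns ordered 1 then 0 as in the tables."""
    (tp1, fn1), (fp1, tn1) = matrix  # row actual 1, row actual 0
    return {
        1: per_class_metrics(tp1, fn1, fp1, tn1),
        # For class 0 the roles of the two labels are swapped.
        0: per_class_metrics(tn1, fp1, fn1, tp1),
    }


if __name__ == "__main__":
    # BaggingClassifier test set, no confidence threshold.
    for label, metrics in report([[53, 45], [19, 4883]]).items():
        print(label, {k: round(v, 2) for k, v in metrics.items()})
```

Run on that matrix, the sketch reproduces the rounded values in the table (for example, class 1: pre 0.74, rec 0.54, geo 0.73, iba 0.51). It also shows why two columns mirror each other: class 1's specificity is class 0's recall.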

For comparison, with xgboost and the undersampler turned on:
73 bugs have no steps to reproduce
2642 bugs have steps to reproduce
X: (2715, 68672), y: (2715,)
Cross Validation scores:
Accuracy: f0.6930046232725715 (+/- 0.08506114027939689)
Precision: f0.9889862871434468 (+/- 0.011123332687863755)
Recall: f0.6927315347191507 (+/- 0.09585654167878219)
X_train: (128, 68672), y_train: (128,)
X_test: (272, 68672), y_test: (272,)
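The `UserWarning` from `bagging.py` in the BaggingClassifier run means some training rows were drawn into every bootstrap sample, so no estimator could score them out-of-bag; dividing their all-zero vote totals is what then raises the `true_divide` RuntimeWarning. A rough estimate of how many rows end up in that state is sketched below. It assumes scikit-learn's `BaggingClassifier` defaults (`n_estimators=10`, `bootstrap=True`, `max_samples=1.0`), since the actual settings are not shown in the log, and assumes OOB scoring ran on the 45000-row training set; `expected_without_oob` is a hypothetical helper.

```python
def expected_without_oob(n_samples, n_estimators, max_samples=None):
    """Expected number of rows that are in-bag for every estimator.

    Assumes each estimator draws `max_samples` rows with replacement
    (bootstrap=True); max_samples defaults to n_samples, i.e. max_samples=1.0.
    """
    draws = n_samples if max_samples is None else max_samples
    # Probability that a given row is never drawn by one estimator.
    p_out = (1 - 1 / n_samples) ** draws
    # A row gets no OOB score only if every estimator drew it.
    p_never_oob = (1 - p_out) ** n_estimators
    return n_samples * p_never_oob


if __name__ == "__main__":
    for k in (10, 50, 100):
        print(k, round(expected_without_oob(45000, k), 1))
```

Under these assumptions about 460 of the 45000 rows get no OOB estimate with 10 estimators, while with 50 estimators the expected count is effectively zero, which is consistent with the warning's hint that too few estimators were used.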

Jul 25 '19 10:07 chidauri

Codecov Report

Merging #784 into master will increase coverage by 46.96%. The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #784       +/-   ##
=========================================
+ Coverage   53.03%   100%   +46.96%     
=========================================
  Files          81      1       -80     
  Lines        5461     14     -5447     
=========================================
- Hits         2896     14     -2882     
+ Misses       2565      0     -2565

Impacted Files	Coverage Δ
tests/test_assignee.py
bugbug/models/component.py
bugbug/bug_features.py
scripts/commit_retriever.py
bugbug/bugzilla.py
bugbug/bug_snapshot.py
tests/test_devdocneeded.py
bugbug/__init__.py
bugbug/models/defect_enhancement_task.py
scripts/regressor_finder.py
... and 68 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2810c2...abdd3e1.

Jul 25 '19 13:07 codecov-io

We need to calculate the metrics we talked about on IRC: if you hide some real positives, is the model able to find them? If you hide some real negatives, is the model able to tell that they are negatives?
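One way to compute those two numbers is to hide a random subset of the known positives and known negatives before training, then check how many of each the model recovers. A minimal sketch in plain Python, assuming labels are kept as a dict from bug ID to 1, 0, or None (unlabeled) and that the trained model's predictions come back as a dict from bug ID to 0/1; `hide_labels` and `recovery_rates` are hypothetical helpers, not part of bugbug.

```python
import random


def hide_labels(labels, fraction, seed=0):
    """Mark a random `fraction` of known positives and negatives as unlabeled.

    labels: dict bug_id -> 1, 0, or None (None = unlabeled).
    Returns (training_labels, hidden), where hidden maps 1 and 0 to the
    sets of bug IDs whose real label was removed.
    """
    rng = random.Random(seed)
    training_labels = dict(labels)  # copy, the caller's dict is untouched
    hidden = {1: set(), 0: set()}
    for cls in (1, 0):
        known = sorted(b for b, y in labels.items() if y == cls)
        for bug_id in rng.sample(known, int(len(known) * fraction)):
            training_labels[bug_id] = None  # the model must not see this label
            hidden[cls].add(bug_id)
    return training_labels, hidden


def recovery_rates(hidden, predictions):
    """Share of hidden positives predicted 1 and hidden negatives predicted 0."""
    rates = {}
    for cls in (1, 0):
        ids = hidden[cls]
        found = sum(1 for b in ids if predictions.get(b) == cls)
        rates[cls] = found / len(ids) if ids else float("nan")
    return rates


if __name__ == "__main__":
    labels = {1: 1, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0, 7: None, 8: None}
    train_labels, hidden = hide_labels(labels, 0.5)
    # Stand-in for a real model trained on train_labels: predict 1 everywhere.
    predictions = {bug_id: 1 for bug_id in labels}
    print(recovery_rates(hidden, predictions))  # {1: 1.0, 0: 0.0}
```

The first rate answers "can it find hidden positives?", the second "can it tell hidden negatives apart?"; a model that predicts 1 for everything scores perfectly on the first and zero on the second, so both need to be reported together.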

Another option (maybe easier? Not sure) is to make the test set contain only real positives and real negatives, i.e. only bugs whose label is known.
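A sketch of that alternative, assuming labels are kept as a dict from bug ID to 1, 0, or None for unlabeled bugs: unlabeled bugs only ever go to the training set, and the test set is drawn exclusively from bugs with a known label. `split_known_test` and `test_fraction` are illustrative names, not existing bugbug parameters.

```python
import random


def split_known_test(labels, test_fraction=0.3, seed=0):
    """Split bug IDs so the test set holds only bugs with a known label.

    labels: dict bug_id -> 1, 0, or None (None = unlabeled).
    Returns (train_ids, test_ids).
    """
    rng = random.Random(seed)
    known = sorted(b for b, y in labels.items() if y is not None)
    unlabeled = sorted(b for b, y in labels.items() if y is None)
    rng.shuffle(known)
    n_test = int(len(known) * test_fraction)
    test_ids = known[:n_test]
    # Remaining known bugs and every unlabeled bug are used for training.
    train_ids = known[n_test:] + unlabeled
    return train_ids, test_ids
```

This split is not stratified. With very few bugs lacking steps to reproduce, a stratified split over the known bugs (for example scikit-learn's `train_test_split(..., stratify=...)`) would be a sensible refinement so the rare class is guaranteed to appear in the test set.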

Jul 29 '19 15:07 marco-c

@marco-c After the latest changes, it performs a lot better now:
48965 bugs have no labels
27 bugs have no steps to reproduce
1008 bugs have steps to reproduce
X: (50000, 575524), y: (50000,)
Cross Validation scores:
Accuracy: f0.8663989202635347 (+/- 0.020209775582136394)
Precision: f0.11253227786380618 (+/- 0.014047676653706294)
Recall: f0.825077081192189 (+/- 0.07402488946446677)
X_train: (35000, 575524), y_train: (35000,)
X_test: (321, 575524), y_test: (321,)
Test Set scores:
No confidence threshold - 321 classified
                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.81      0.70      0.89      0.75      0.57       311
          0      0.11      0.70      0.81      0.18      0.75      0.56        10

avg / total      0.96      0.81      0.70      0.87      0.75      0.57       321

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             252 │              59 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               3 │               7 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.6 - 295 classified
                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.88      0.57      0.93      0.71      0.52       288
          0      0.10      0.57      0.88      0.17      0.71      0.48         7

avg / total      0.97      0.87      0.58      0.91      0.71      0.51       295

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             252 │              36 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               3 │               4 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.7 - 262 classified
                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.92      0.67      0.96      0.78      0.63       256
          0      0.17      0.67      0.92      0.27      0.78      0.60         6

avg / total      0.97      0.92      0.67      0.94      0.78      0.63       262

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             236 │              20 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               4 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.8 - 231 classified
                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.96      0.60      0.98      0.76      0.60       226
          0      0.25      0.60      0.96      0.35      0.76      0.56         5

avg / total      0.97      0.95      0.61      0.96      0.76      0.60       231

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             217 │               9 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               3 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.9 - 204 classified
                  pre       rec       spe        f1       geo       iba       sup

          1      0.99      0.98      0.50      0.98      0.70      0.51       202
          0      0.17      0.50      0.98      0.25      0.70      0.46         2

avg / total      0.99      0.97      0.50      0.98      0.70      0.51       204

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             197 │               5 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               1 │               1 │
╘════════════╧═════════════════╧═════════════════╛
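The "Confidence threshold > X - N classified" sections appear to score only the test bugs whose predicted class probability clears X and to leave the rest unclassified, which is why the support shrinks from 321 to 204 as X rises. A sketch of that filtering step, assuming the model yields one probability for class 1 per bug (the class-1 column of a scikit-learn style `predict_proba`) and that the comparison is a strict `>` as the printed label suggests; `thresholded_confusion` is a hypothetical name.

```python
def thresholded_confusion(y_true, proba_1, threshold):
    """Confusion counts using only predictions whose confidence exceeds threshold.

    y_true: list of 0/1 labels.
    proba_1: list of predicted probabilities for class 1 (same order).
    Returns (n_classified, [[tp, fn], [fp, tn]]) with rows actual 1 / actual 0
    and columns predicted 1 / predicted 0, the same layout as the tables above.
    """
    matrix = [[0, 0], [0, 0]]
    n_classified = 0
    for actual, p1 in zip(y_true, proba_1):
        predicted = 1 if p1 >= 0.5 else 0
        confidence = max(p1, 1 - p1)  # probability of the predicted class
        if confidence <= threshold:
            continue  # left unclassified at this threshold
        n_classified += 1
        row = 0 if actual == 1 else 1
        col = 0 if predicted == 1 else 1
        matrix[row][col] += 1
    return n_classified, matrix


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1]
    proba_1 = [0.95, 0.65, 0.15, 0.55, 0.25]
    for t in (0.6, 0.7, 0.8, 0.9):
        print(t, thresholded_confusion(y_true, proba_1, t))
```

Raising the threshold trades coverage for reliability: fewer bugs are classified, but the ones that remain are those the model is most sure about, which matches the rising class-1 recall across the tables.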

Aug 11 '19 20:08 chidauri

Positive Unlabeled Learning for stepstoreproduce model
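For context on positive-unlabeled learning: one well-known approach, Elkan and Noto's, which is not necessarily what this PR implements, trains a classifier g(x) to separate labeled from unlabeled examples, estimates the constant c = P(labeled | positive) as g's mean score on held-out labeled positives, and then uses g(x) / c as an estimate of P(positive | x). The correction arithmetic is sketched below; `estimate_c` and `pu_probability` are hypothetical names.

```python
def estimate_c(scores_on_labeled_positives):
    """Estimate c = P(labeled | positive) for a labeled-vs-unlabeled classifier.

    scores_on_labeled_positives: that classifier's P(labeled | x) on a
    held-out set of examples known to be positive.
    """
    return sum(scores_on_labeled_positives) / len(scores_on_labeled_positives)


def pu_probability(score, c):
    """Convert P(labeled | x) into an estimate of P(positive | x), capped at 1."""
    return min(score / c, 1.0)


if __name__ == "__main__":
    c = estimate_c([0.4, 0.6])
    for s in (0.1, 0.3, 0.8):
        print(s, pu_probability(s, c))
```

The cap is needed because g(x) / c can exceed 1 for examples the labeled-vs-unlabeled classifier scores above the average labeled positive.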

Codecov Report