Restrict training set of stepstoreproduce model only to defects

Open chidauri opened this issue 6 years ago • 6 comments

Regarding #792

chidauri avatar Aug 01 '19 06:08 chidauri

After this change

69 bugs have no steps to reproduce
2563 bugs have steps to reproduce
X: (2632, 67831), y: (2632,)
Cross Validation scores:
Accuracy: f0.744493313309018 (+/- 0.02761428505128297)
Precision: f0.9914028351617405 (+/- 0.007963866807440885)
Recall: f0.7444676075912519 (+/- 0.03025468155899755)
X_train: (118, 67831), y_train: (118,)
X_test: (264, 67831), y_test: (264,)

Test Set scores:

No confidence threshold - 264 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.99      0.77      0.80      0.86      0.78      0.61       254
          0       0.12      0.80      0.77      0.21      0.78      0.62        10

avg / total       0.96      0.77      0.80      0.84      0.78      0.61       264

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             195 │              59 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               8 │
╘════════════╧═════════════════╧═════════════════╛
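The per-class columns of the report can be recomputed from the confusion matrix alone. The sketch below is not taken from bugbug; it assumes the usual definitions of precision, recall, specificity, F1 and geometric mean, and assumes that `iba` is the index balanced accuracy of the squared geometric mean with `alpha = 0.1`. Under those assumptions it reproduces the rounded values of the "No confidence threshold" run. The function name `binary_metrics` is made up for illustration.

```python
import math


def binary_metrics(tp, fn, fp, tn, alpha=0.1):
    """Metrics for the class treated as 'positive' (alpha=0.1 is an assumption)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    geo = math.sqrt(recall * specificity)
    # Index balanced accuracy: squared geometric mean weighted by the dominance.
    iba = (1 + alpha * (recall - specificity)) * recall * specificity
    return {"pre": precision, "rec": recall, "spe": specificity,
            "f1": f1, "geo": geo, "iba": iba}


# Rows are actual classes (1, 0); columns are predicted classes (1, 0).
m = [[195, 59], [2, 8]]

class_1 = binary_metrics(tp=m[0][0], fn=m[0][1], fp=m[1][0], tn=m[1][1])
class_0 = binary_metrics(tp=m[1][1], fn=m[1][0], fp=m[0][1], tn=m[0][0])

for name, metrics in (("1", class_1), ("0", class_0)):
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Class 0 has only 10 test samples in this run, so a single reclassified bug moves its recall by 0.10.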

Confidence threshold > 0.6 - 223 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.99      0.81      0.78      0.89      0.80      0.63       214
          0       0.15      0.78      0.81      0.25      0.80      0.63         9

avg / total       0.95      0.81      0.78      0.87      0.80      0.63       223

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             174 │              40 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               7 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.7 - 182 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.99      0.84      0.78      0.91      0.81      0.66       173
          0       0.21      0.78      0.84      0.33      0.81      0.65         9

avg / total       0.95      0.84      0.78      0.88      0.81      0.66       182

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             146 │              27 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               7 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.8 - 147 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.98      0.89      0.78      0.94      0.83      0.70       138
          0       0.32      0.78      0.89      0.45      0.83      0.69         9

avg / total       0.94      0.88      0.78      0.91      0.83      0.70       147

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             123 │              15 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               7 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.9 - 81 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.99      0.91      0.83      0.94      0.87      0.76        75
          0       0.42      0.83      0.91      0.56      0.87      0.75         6

avg / total       0.94      0.90      0.84      0.92      0.87      0.76        81

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │              68 │               7 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               1 │               5 │
╘════════════╧═════════════════╧═════════════════╛
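The number of classified bugs shrinks as the threshold rises (264, 223, 182, 147, 81). One plausible reading, which the log itself does not confirm, is that a prediction is kept only when its highest class probability exceeds the threshold. A minimal sketch of that filtering, with a hypothetical `filter_confident` helper and made-up probabilities:

```python
def filter_confident(probabilities, threshold):
    """Return (sample index, predicted class) for confident predictions only.

    Assumption: confidence is the largest class probability of a sample.
    """
    kept = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence > threshold:
            kept.append((index, probs.index(confidence)))
    return kept


# Illustrative values only, not taken from the model.
example_probs = [[0.95, 0.05], [0.55, 0.45], [0.30, 0.70], [0.62, 0.38]]

kept_06 = filter_confident(example_probs, 0.6)
kept_09 = filter_confident(example_probs, 0.9)
print(len(kept_06), "classified at > 0.6;", len(kept_09), "classified at > 0.9")
```

Samples that fall below the threshold are simply left out of the report, which is why the support column changes from one threshold to the next.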

Before this change

73 bugs have no steps to reproduce
2642 bugs have steps to reproduce
X: (2715, 68671), y: (2715,)
Cross Validation scores:
Accuracy: f0.6966856048676635 (+/- 0.09750364886404748)
Precision: f0.9890742501710503 (+/- 0.01088263716211331)
Recall: f0.6965130473241927 (+/- 0.10856018472515121)
X_train: (128, 68671), y_train: (128,)
X_test: (272, 68671), y_test: (272,)

Test Set scores:

No confidence threshold - 272 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.97      0.70      0.44      0.82      0.56      0.32       263
          0       0.05      0.44      0.70      0.09      0.56      0.30         9

avg / total       0.94      0.69      0.45      0.79      0.56      0.32       272

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             185 │              78 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               5 │               4 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.6 - 228 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.98      0.72      0.43      0.83      0.56      0.32       221
          0       0.05      0.43      0.72      0.08      0.56      0.30         7

avg / total       0.95      0.71      0.44      0.81      0.56      0.32       228

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             159 │              62 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               4 │               3 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.7 - 178 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.97      0.78      0.33      0.87      0.51      0.27       172
          0       0.05      0.33      0.78      0.09      0.51      0.25         6

avg / total       0.94      0.77      0.35      0.84      0.51      0.27       178

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │             135 │              37 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               4 │               2 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.8 - 119 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.98      0.83      0.50      0.90      0.64      0.43       115
          0       0.09      0.50      0.83      0.15      0.64      0.40         4

avg / total       0.95      0.82      0.51      0.87      0.64      0.43       119

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │              95 │              20 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               2 │               2 │
╘════════════╧═════════════════╧═════════════════╛

Confidence threshold > 0.9 - 68 classified

                   pre       rec       spe        f1       geo       iba       sup

          1       0.98      0.88      0.50      0.93      0.66      0.46        66
          0       0.11      0.50      0.88      0.18      0.66      0.42         2

avg / total       0.96      0.87      0.51      0.91      0.66      0.46        68

╒════════════╤═════════════════╤═════════════════╕
│            │   1 (Predicted) │   0 (Predicted) │
╞════════════╪═════════════════╪═════════════════╡
│ 1 (Actual) │              58 │               8 │
├────────────┼─────────────────┼─────────────────┤
│ 0 (Actual) │               1 │               1 │
╘════════════╧═════════════════╧═════════════════╛
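To compare the two runs on the minority class, the class-0 rows of the confusion matrices can be lined up side by side. This is only a convenience sketch; the counts are copied from the matrices in this comment, and the variable names are illustrative.

```python
# Class-0 rows ("no steps to reproduce") as (predicted 1, predicted 0),
# ordered by threshold: none, > 0.6, > 0.7, > 0.8, > 0.9.
thresholds = ["none", "> 0.6", "> 0.7", "> 0.8", "> 0.9"]
before = [(5, 4), (4, 3), (4, 2), (2, 2), (1, 1)]
after = [(2, 8), (2, 7), (2, 7), (2, 7), (1, 5)]


def class0_recall(row):
    """Fraction of actual class-0 bugs that were predicted as class 0."""
    predicted_1, predicted_0 = row
    return predicted_0 / (predicted_1 + predicted_0)


before_recall = [round(class0_recall(row), 2) for row in before]
after_recall = [round(class0_recall(row), 2) for row in after]

for label, b_row, b, a_row, a in zip(thresholds, before, before_recall,
                                     after, after_recall):
    print(f"{label:>6}: before {b:.2f} (n={sum(b_row)}), "
          f"after {a:.2f} (n={sum(a_row)})")
```

Class-0 recall is higher after the change at every threshold, but each figure rests on at most 10 test samples, so the differences should be read with caution.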

chidauri avatar Aug 01 '19 06:08 chidauri

Could you show here the confusion matrices too?

marco-c avatar Aug 01 '19 08:08 marco-c

Codecov Report

Merging #817 into master will increase coverage by 0.69%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #817      +/-   ##
==========================================
+ Coverage   52.83%   53.52%   +0.69%     
==========================================
  Files          75       75              
  Lines        5127     5085      -42     
==========================================
+ Hits         2709     2722      +13     
+ Misses       2418     2363      -55

Impacted Files                         Coverage Δ
bugbug/models/stepstoreproduce.py      67.39% <100%> (+1.48%) :arrow_up:
scripts/regressor_finder.py            0% <0%> (ø) :arrow_up:
bugbug/similarity.py                   0% <0%> (ø) :arrow_up:
scripts/evaluate_similarity.py         0% <0%> (ø) :arrow_up:
scripts/commit_classifier.py           0% <0%> (ø) :arrow_up:
scripts/microannotate_generator.py     0% <0%> (ø) :arrow_up:
bugbug/repository.py                   74.73% <0%> (+0.35%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 001fba1...f184858. Read the comment docs.

codecov-io avatar Aug 01 '19 09:08 codecov-io

> Could you show here the confusion matrices too?

The before ones too.

marco-c avatar Aug 01 '19 16:08 marco-c

BTW, there isn't much change, but this could make a difference when we have PULearning, so I'll keep it open and we'll re-evaluate after that.

marco-c avatar Aug 01 '19 16:08 marco-c

> Could you show here the confusion matrices too?
>
> The before ones too.

Done

chidauri avatar Aug 01 '19 18:08 chidauri