bugbug
Restrict training set of stepstoreproduce model only to defects
Regarding #792
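As a rough illustration of the restriction named in the title, the sketch below keeps only bugs whose type is `defect` before any labels are extracted. The `type` key and the `defect` value follow Bugzilla's bug type field; the helper name `keep_defects` and the sample records are hypothetical, and this is not the diff from this PR.

```python
from typing import Iterable, Iterator


def keep_defects(bugs: Iterable[dict]) -> Iterator[dict]:
    """Yield only the bugs whose Bugzilla type is "defect".

    Assumption: each bug is a dict with a "type" key, as in Bugzilla's
    REST output. Bugs without that key are skipped.
    """
    for bug in bugs:
        if bug.get("type") == "defect":
            yield bug


# Hypothetical sample records, for illustration only.
sample_bugs = [
    {"id": 1, "type": "defect"},
    {"id": 2, "type": "enhancement"},
    {"id": 3, "type": "task"},
    {"id": 4, "type": "defect"},
]

defect_ids = [bug["id"] for bug in keep_defects(sample_bugs)]
print(defect_ids)
```

Filtering before labelling means enhancements and tasks, which rarely need steps to reproduce, no longer contribute examples to either class.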
After this change

69 bugs have no steps to reproduce
2563 bugs have steps to reproduce
X: (2632, 67831), y: (2632,)

Cross Validation scores:
Accuracy: f0.744493313309018 (+/- 0.02761428505128297)
Precision: f0.9914028351617405 (+/- 0.007963866807440885)
Recall: f0.7444676075912519 (+/- 0.03025468155899755)

X_train: (118, 67831), y_train: (118,)
X_test: (264, 67831), y_test: (264,)

Test Set scores:

No confidence threshold - 264 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.99 | 0.77 | 0.80 | 0.86 | 0.78 | 0.61 | 254 |
| 0 | 0.12 | 0.80 | 0.77 | 0.21 | 0.78 | 0.62 | 10 |
| avg / total | 0.96 | 0.77 | 0.80 | 0.84 | 0.78 | 0.61 | 264 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 195 | 59 |
| 0 (Actual) | 2 | 8 |

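The per-class figures in the report can be cross-checked against the confusion matrix. The sketch below recomputes `pre`, `rec`, `spe`, `f1` and `geo` from the four cells of the unthresholded matrix above (195, 59, 2, 8). It assumes the usual definitions, with `geo` taken as the geometric mean of recall and specificity; `iba` is left out, and the function name `binary_metrics` is made up for this example.

```python
import math


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Derive per-class metrics from the cells of a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    geo = math.sqrt(recall * specificity)  # assumed definition of "geo"
    values = {
        "pre": precision,
        "rec": recall,
        "spe": specificity,
        "f1": f1,
        "geo": geo,
    }
    # Round to two decimals, like the report.
    return {name: round(value, 2) for name, value in values.items()}


# Class 1 ("has steps to reproduce") treated as the positive class.
class_1 = binary_metrics(tp=195, fn=59, fp=2, tn=8)

# Class 0 treated as the positive class: the matrix cells swap roles.
class_0 = binary_metrics(tp=8, fn=2, fp=59, tn=195)

print(class_1)
print(class_0)
```

With only 10 negative test examples, each misclassified negative moves class 0 recall by 0.10, so small differences in these columns should be read with care.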
Confidence threshold > 0.6 - 223 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.99 | 0.81 | 0.78 | 0.89 | 0.80 | 0.63 | 214 |
| 0 | 0.15 | 0.78 | 0.81 | 0.25 | 0.80 | 0.63 | 9 |
| avg / total | 0.95 | 0.81 | 0.78 | 0.87 | 0.80 | 0.63 | 223 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 174 | 40 |
| 0 (Actual) | 2 | 7 |

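The "N classified" counts shrink as the threshold rises, which is consistent with discarding test samples whose highest predicted class probability does not exceed the threshold. The sketch below shows that idea under this assumption; the function name `filter_by_confidence`, the column order (class 0 first, class 1 second) and the probabilities are invented for illustration and are not taken from the bugbug code.

```python
def filter_by_confidence(probs, y_true, threshold):
    """Keep only samples whose top class probability exceeds `threshold`.

    probs: one [p_class_0, p_class_1] pair per sample (assumed layout).
    Returns the true labels and predicted labels of the kept samples.
    """
    kept_true, kept_pred = [], []
    for row, label in zip(probs, y_true):
        confidence = max(row)
        if confidence > threshold:
            kept_true.append(label)
            # The predicted class is the index of the highest probability.
            kept_pred.append(row.index(confidence))
    return kept_true, kept_pred


# Made-up probabilities and labels, for illustration only.
probs = [[0.55, 0.45], [0.10, 0.90], [0.35, 0.65], [0.80, 0.20]]
y_true = [0, 1, 0, 0]

kept_true, kept_pred = filter_by_confidence(probs, y_true, threshold=0.6)
print(len(kept_true), "classified")
```

The first sample is dropped because its top probability (0.55) is below the threshold; the metrics at each threshold are then computed on the remaining samples only.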
Confidence threshold > 0.7 - 182 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.99 | 0.84 | 0.78 | 0.91 | 0.81 | 0.66 | 173 |
| 0 | 0.21 | 0.78 | 0.84 | 0.33 | 0.81 | 0.65 | 9 |
| avg / total | 0.95 | 0.84 | 0.78 | 0.88 | 0.81 | 0.66 | 182 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 146 | 27 |
| 0 (Actual) | 2 | 7 |

Confidence threshold > 0.8 - 147 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.98 | 0.89 | 0.78 | 0.94 | 0.83 | 0.70 | 138 |
| 0 | 0.32 | 0.78 | 0.89 | 0.45 | 0.83 | 0.69 | 9 |
| avg / total | 0.94 | 0.88 | 0.78 | 0.91 | 0.83 | 0.70 | 147 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 123 | 15 |
| 0 (Actual) | 2 | 7 |

Confidence threshold > 0.9 - 81 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.99 | 0.91 | 0.83 | 0.94 | 0.87 | 0.76 | 75 |
| 0 | 0.42 | 0.83 | 0.91 | 0.56 | 0.87 | 0.75 | 6 |
| avg / total | 0.94 | 0.90 | 0.84 | 0.92 | 0.87 | 0.76 | 81 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 68 | 7 |
| 0 (Actual) | 1 | 5 |

Before this change
73 bugs have no steps to reproduce
2642 bugs have steps to reproduce
X: (2715, 68671), y: (2715,)

Cross Validation scores:
Accuracy: f0.6966856048676635 (+/- 0.09750364886404748)
Precision: f0.9890742501710503 (+/- 0.01088263716211331)
Recall: f0.6965130473241927 (+/- 0.10856018472515121)

X_train: (128, 68671), y_train: (128,)
X_test: (272, 68671), y_test: (272,)

Test Set scores:

No confidence threshold - 272 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.97 | 0.70 | 0.44 | 0.82 | 0.56 | 0.32 | 263 |
| 0 | 0.05 | 0.44 | 0.70 | 0.09 | 0.56 | 0.30 | 9 |
| avg / total | 0.94 | 0.69 | 0.45 | 0.79 | 0.56 | 0.32 | 272 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 185 | 78 |
| 0 (Actual) | 5 | 4 |

Confidence threshold > 0.6 - 228 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.98 | 0.72 | 0.43 | 0.83 | 0.56 | 0.32 | 221 |
| 0 | 0.05 | 0.43 | 0.72 | 0.08 | 0.56 | 0.30 | 7 |
| avg / total | 0.95 | 0.71 | 0.44 | 0.81 | 0.56 | 0.32 | 228 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 159 | 62 |
| 0 (Actual) | 4 | 3 |

Confidence threshold > 0.7 - 178 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.97 | 0.78 | 0.33 | 0.87 | 0.51 | 0.27 | 172 |
| 0 | 0.05 | 0.33 | 0.78 | 0.09 | 0.51 | 0.25 | 6 |
| avg / total | 0.94 | 0.77 | 0.35 | 0.84 | 0.51 | 0.27 | 178 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 135 | 37 |
| 0 (Actual) | 4 | 2 |

Confidence threshold > 0.8 - 119 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.98 | 0.83 | 0.50 | 0.90 | 0.64 | 0.43 | 115 |
| 0 | 0.09 | 0.50 | 0.83 | 0.15 | 0.64 | 0.40 | 4 |
| avg / total | 0.95 | 0.82 | 0.51 | 0.87 | 0.64 | 0.43 | 119 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 95 | 20 |
| 0 (Actual) | 2 | 2 |

Confidence threshold > 0.9 - 68 classified

| | pre | rec | spe | f1 | geo | iba | sup |
|---|---|---|---|---|---|---|---|
| 1 | 0.98 | 0.88 | 0.50 | 0.93 | 0.66 | 0.46 | 66 |
| 0 | 0.11 | 0.50 | 0.88 | 0.18 | 0.66 | 0.42 | 2 |
| avg / total | 0.96 | 0.87 | 0.51 | 0.91 | 0.66 | 0.46 | 68 |

| | 1 (Predicted) | 0 (Predicted) |
|---|---|---|
| 1 (Actual) | 58 | 8 |
| 0 (Actual) | 1 | 1 |

Could you show here the confusion matrices too?
Codecov Report
Merging #817 into master will increase coverage by 0.69%. The diff coverage is 100%.
@@ Coverage Diff @@
## master #817 +/- ##
==========================================
+ Coverage 52.83% 53.52% +0.69%
==========================================
Files 75 75
Lines 5127 5085 -42
==========================================
+ Hits 2709 2722 +13
+ Misses 2418 2363 -55
| Impacted Files | Coverage Δ | |
|---|---|---|
| bugbug/models/stepstoreproduce.py | 67.39% <100%> (+1.48%) | :arrow_up: |
| scripts/regressor_finder.py | 0% <0%> (ø) | :arrow_up: |
| bugbug/similarity.py | 0% <0%> (ø) | :arrow_up: |
| scripts/evaluate_similarity.py | 0% <0%> (ø) | :arrow_up: |
| scripts/commit_classifier.py | 0% <0%> (ø) | :arrow_up: |
| scripts/microannotate_generator.py | 0% <0%> (ø) | :arrow_up: |
| bugbug/repository.py | 74.73% <0%> (+0.35%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 001fba1...f184858.
Could you show here the confusion matrices too?
The before ones too.
BTW, there isn't much change, but this could make a difference when we have PULearning, so I'll keep it open and we'll re-evaluate after that.
> Could you show here the confusion matrices too?
> The before ones too.

Done