get_bugbug_labels no longer adds nobug type to regression training data
#539
Modified get_bugbug_labels in defect.py so that the regression training set includes only data points labelled regression or bug_no_regression. Data points of type nobug are no longer added.
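A minimal sketch of the filtering described above, not the actual code in defect.py. The function name filter_regression_labels and the input shape (a dict of bug id to category string) are hypothetical. The mapping of regression to 1 and bug_no_regression to 0 is an assumption for illustration.

```python
from typing import Dict

# Categories kept for the regression model; anything else (e.g. "nobug")
# is dropped. The 1/0 mapping is an assumption made for this sketch.
KEPT_CATEGORIES = {"regression": 1, "bug_no_regression": 0}


def filter_regression_labels(categories: Dict[int, str]) -> Dict[int, int]:
    """Map bug id -> label, keeping only the two supported categories."""
    return {
        bug_id: KEPT_CATEGORIES[category]
        for bug_id, category in categories.items()
        if category in KEPT_CATEGORIES
    }


if __name__ == "__main__":
    sample = {1: "regression", 2: "bug_no_regression", 3: "nobug"}
    # Bug 3 is dropped because "nobug" is not a kept category.
    print(filter_regression_labels(sample))
```

With this shape, dropped bugs never reach the training set, rather than being silently counted as non-regressions.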
Training the model without changes
72486 non-regression bugs
Cross Validation scores: Accuracy: 0.9731263445549161 (+/- 0.0012810455820845609) Precision: 0.9560802008310938 (+/- 0.006503421458310747) Recall: 0.9316432362619518 (+/- 0.0042866900183067425)
Training the model after changes
71597 non-regression bugs (889 dropped)
Cross Validation scores: Accuracy: 0.9739072259525028 (+/- 0.0019480324611321944) Precision: 0.9561803892880535 (+/- 0.006928496874119621) Recall: 0.9358629670750973 (+/- 0.0045683573571298)
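Minor improvement in recall (about 0.004) and accuracy (under 0.001); precision is essentially unchanged. All three differences fall within the reported +/- ranges.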
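To put the two runs side by side, here is a small sketch that computes the change in each metric and checks whether it falls inside the larger of the two reported +/- values. The numbers are copied from the runs above. The helper name compare_runs is hypothetical, and the "within spread" check is only a rough heuristic, not a significance test.

```python
from typing import Dict, Tuple

# (mean, reported +/- value) for each metric, copied from the runs above.
BEFORE = {
    "accuracy": (0.9731263445549161, 0.0012810455820845609),
    "precision": (0.9560802008310938, 0.006503421458310747),
    "recall": (0.9316432362619518, 0.0042866900183067425),
}
AFTER = {
    "accuracy": (0.9739072259525028, 0.0019480324611321944),
    "precision": (0.9561803892880535, 0.006928496874119621),
    "recall": (0.9358629670750973, 0.0045683573571298),
}


def compare_runs(
    before: Dict[str, Tuple[float, float]],
    after: Dict[str, Tuple[float, float]],
) -> Dict[str, Tuple[float, bool]]:
    """Return metric -> (delta, delta is within the larger +/- value)."""
    result = {}
    for metric, (mean_before, spread_before) in before.items():
        mean_after, spread_after = after[metric]
        delta = mean_after - mean_before
        within = abs(delta) <= max(spread_before, spread_after)
        result[metric] = (delta, within)
    return result


if __name__ == "__main__":
    for metric, (delta, within) in compare_runs(BEFORE, AFTER).items():
        print(f"{metric}: delta={delta:+.6f} within_spread={within}")
```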
Should the categories task, enhancement, and feature also be removed from the training data for regression?
Please let me know if I have misunderstood the task.
@avinashselvam this is part of the request from #539. The other part is not to consider bugs with type "enhancement" or "task" as label 0 for the regression model.
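One way the second part could look, as a rough sketch only. The function name regression_label, the EXCLUDED_TYPES set, and the idea that each bug exposes a category string and a type string are assumptions for illustration and not the existing code. The sample type value "defect" is also an assumption.

```python
from typing import Optional

# Bug types that should not be used as label 0 for the regression model.
EXCLUDED_TYPES = {"enhancement", "task"}


def regression_label(category: str, bug_type: str) -> Optional[int]:
    """Return 1, 0, or None (None means: leave the bug out of training)."""
    if category == "regression":
        return 1
    if category == "bug_no_regression" and bug_type not in EXCLUDED_TYPES:
        return 0
    # Everything else, including enhancement/task non-regressions, is skipped.
    return None


if __name__ == "__main__":
    print(regression_label("regression", "defect"))
    print(regression_label("bug_no_regression", "defect"))
    print(regression_label("bug_no_regression", "task"))
```

Returning None instead of 0 keeps enhancements and tasks out of the negative class entirely, so the caller can skip them when building the training set.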
@avinashselvam are you still interested in working on this? If so, I will be glad to help.