bugbug Investigate bad SpamBug classifications

E.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1615701 was classified as spam with high confidence, but it isn't spam. The first step is to check the feature importances for this classification, so we can see why the model chose the way it did.

Feb 18 '20 12:02 marco-c

Can I work on this?

Mar 04 '20 12:03 gabbyprecious

Sure, feel free. @ayush-1506 might be interested in investigating too, in order to complete the spambug-related project, but since this is an investigation issue, multiple people can easily work on it.

Mar 05 '20 10:03 marco-c

I tried generating the confidence scores this weekend, but the process crashes with Killed: 9 before printing the confidence (which looks like an out-of-memory error). I then tried reducing the size of the dataset, but the reduced dataset contained only one label every time I tried, so that didn't work either. I will look into alternatives.
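One way to avoid ending up with a single label after shrinking an imbalanced dataset is to subsample each label separately. The following is a minimal sketch, not taken from the bugbug code base: the `stratified_subsample` helper and the toy data are hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict


def stratified_subsample(items, labels, fraction, seed=0):
    """Keep roughly `fraction` of the items of each label, and at least one per label."""
    rng = random.Random(seed)

    # Group the items by their label.
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    sampled_items, sampled_labels = [], []
    for label, group in by_label.items():
        # max(1, ...) guarantees that a rare label is never dropped entirely.
        keep = max(1, round(len(group) * fraction))
        for item in rng.sample(group, keep):
            sampled_items.append(item)
            sampled_labels.append(label)
    return sampled_items, sampled_labels


# Hypothetical, heavily imbalanced toy data: 95 non-spam bugs and 5 spam bugs.
bug_ids = list(range(100))
is_spam = [0] * 95 + [1] * 5

sub_ids, sub_labels = stratified_subsample(bug_ids, is_spam, fraction=0.2)
print(len(sub_ids), sorted(set(sub_labels)))
```

Sampling per label keeps both classes present in the reduced set, at the cost of slightly over-representing very rare labels.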

Mar 05 '20 11:03 ayush-1506

It's probably best to reduce the number of features considered by the model. The best bet is setting a min_df for the text_vectorizer, like we do in other models; you can try a few different values to see which ones don't decrease accuracy.
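To make the effect of min_df concrete, here is a minimal pure-Python sketch of the idea behind that parameter in scikit-learn's text vectorizers: terms that appear in fewer than min_df documents are dropped from the vocabulary. The `build_vocabulary` helper and the sample summaries are hypothetical, and this is not how bugbug itself builds its features.

```python
from collections import Counter


def build_vocabulary(documents, min_df=1):
    """Return the sorted terms that appear in at least `min_df` documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document (document frequency, not term frequency).
        doc_freq.update(set(doc.lower().split()))
    return sorted(term for term, df in doc_freq.items() if df >= min_df)


# Hypothetical bug summaries, for illustration only.
summaries = [
    "crash when opening settings",
    "crash on startup",
    "buy cheap watches online",
    "settings page crash",
]

# A higher min_df shrinks the vocabulary, and with it the feature matrix.
for min_df in (1, 2, 3):
    vocab = build_vocabulary(summaries, min_df=min_df)
    print(min_df, len(vocab), vocab)
```

On a real dataset, the same loop over a few min_df values would be paired with an accuracy measurement, so that the smallest vocabulary that does not hurt the model can be picked.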

Mar 05 '20 12:03 marco-c

This one appears to be open. Is it, @marco-c? If yes, I would like to work on it. Could you help me pinpoint where it is in the code base?

Oct 04 '20 14:10 Nikhil1O1

Hi, I am new to the community and would like to understand the issue and learn how to solve it. Can someone walk me through it?

Dec 23 '20 10:12 tarunjarvis5

This issue is not super-easy, as it requires some investigation and understanding of how the project works. You can start with this one if you want, but you'll need to play around with the code a bit. You can start by training the spambug model (run the scripts/trainer.py script) and classifying a bug with the spambug model (run the scripts/bug_classifier.py script). Then look at the source code of those scripts and try to understand how they work.

Dec 23 '20 14:12 marco-c

I'd like to work on this issue. Permit me to ask questions if I get stuck or find anything strange.

Mar 08 '23 18:03 husainridwan

I'd like to work on this issue.

You can work on any issue without asking, as long as there is no open PR linked to it (see #1092).

Permit me to ask you any question if I get stuck or find anything strange.

You are welcome!

Mar 08 '23 18:03 suhaibmujahid
