bugbug

Investigate bad SpamBug classifications

Open marco-c opened this issue 5 years ago • 15 comments

E.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1615701 was classified as spam with high confidence, but it isn't spam. The first step is to check the feature importances for this classification, so we can see why the model chose the way it did.

marco-c avatar Feb 18 '20 12:02 marco-c

Can I work on this?

gabbyprecious avatar Mar 04 '20 12:03 gabbyprecious

Sure, feel free. @ayush-1506 might be interested in investigating too, in order to complete the spambug-related project, but since this is an investigation issue, multiple people can easily work on it.

marco-c avatar Mar 05 '20 10:03 marco-c

I tried generating the confidence scores this weekend, but the process crashes with Killed: 9 before printing the confidence (which looks like an out-of-memory error). I then tried reducing the size of the dataset, but every reduced dataset I tried contained only one label, so that didn't work either. I will look into alternatives.
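One possible way around the single-label problem is to subsample each class separately, so that the rare class always survives the reduction. The snippet below is only a minimal sketch in plain Python: the `stratified_subsample` helper and the toy labels are hypothetical and are not part of bugbug.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0, min_per_class=1):
    """Return sorted indices of a subsample in which every class stays present."""
    rng = random.Random(seed)

    # Group example indices by their label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for indices in by_class.values():
        # Keep the same fraction of each class, but never fewer than
        # min_per_class examples and never more than the class contains.
        k = max(min_per_class, round(len(indices) * fraction))
        k = min(k, len(indices))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Hypothetical, heavily imbalanced labels: 1 = spam, 0 = not spam.
labels = [0] * 990 + [1] * 10
subset = stratified_subsample(labels, fraction=0.1)
print(len(subset), sorted({labels[i] for i in subset}))
```

Sampling per class rather than over the whole dataset trades a little bookkeeping for the guarantee that both labels remain available for training.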

ayush-1506 avatar Mar 05 '20 11:03 ayush-1506

It's probably best to reduce the number of features considered by the model. The best bet is to set a min_df for text_vectorizer, like we do in other models; you can try a few different values to see which ones don't decrease accuracy.
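To illustrate what a minimum document frequency does to the feature count, here is a small sketch in plain Python. It is not bugbug's vectorizer: the `build_vocabulary` helper and the sample texts are made up for illustration. scikit-learn's text vectorizers expose the same idea through their `min_df` parameter.

```python
from collections import Counter


def build_vocabulary(documents, min_df=1):
    """Keep only tokens that occur in at least `min_df` documents."""
    document_frequency = Counter()
    for text in documents:
        # Count each token once per document, not once per occurrence.
        document_frequency.update(set(text.lower().split()))
    return sorted(
        token for token, df in document_frequency.items() if df >= min_df
    )


# Made-up bug summaries, used only to show the effect of the threshold.
documents = [
    "crash when opening settings page",
    "crash on startup after update",
    "buy cheap watches online now",
    "settings page does not load",
]

for min_df in (1, 2, 3):
    print(min_df, len(build_vocabulary(documents, min_df)))
```

Raising the threshold drops rare tokens first, which shrinks the feature matrix and memory use. Whether accuracy holds has to be checked by retraining at each value.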

marco-c avatar Mar 05 '20 12:03 marco-c

This one appears to be open. Is it, @marco-c? If so, I would like to work on it. Could you help me pinpoint it in the codebase?

Nikhil1O1 avatar Oct 04 '20 14:10 Nikhil1O1

Hi, I am new to the community and would like to understand and learn how to solve this issue. Can someone walk me through it?

tarunjarvis5 avatar Dec 23 '20 10:12 tarunjarvis5

This issue is not super-easy, as it requires some investigation and understanding of how the project works. You can start with this one if you want, but you'll need to play around with the code a bit. You can start by training the spambug model (run the scripts/trainer.py script) and classifying a bug with the spambug model (run the scripts/bug_classifier.py script). Then look at the source code of those scripts and try to understand how they work.

marco-c avatar Dec 23 '20 14:12 marco-c

I'd like to work on this issue. Permit me to ask you any question if I get stuck or find anything strange.

husainridwan avatar Mar 08 '23 18:03 husainridwan

I'd like to work on this issue.

You can work on any issue without asking, as long as there is no open PR linked to it (see #1092).

Permit me to ask you any question if I get stuck or find anything strange.

You are welcome!

suhaibmujahid avatar Mar 08 '23 18:03 suhaibmujahid