bugbug
Investigate bad SpamBug classifications
For example, https://bugzilla.mozilla.org/show_bug.cgi?id=1615701 was classified as spam with high confidence, but it isn't spam. The first step is to check the feature importances for this classification, so we can see why the model made that decision.
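To illustrate what "feature importances for this classification" means, here is a minimal sketch for a simple logistic model, where each feature's contribution to one prediction is its weight times its value. Everything in it is an assumption made for illustration: the feature names, weights, and bias are hypothetical, the spambug model may not be linear, and bugbug may expose importances through a different mechanism.

```python
import math


def explain_linear_prediction(weights, bias, features, top_n=3):
    """Return (probability, top contributions) for one sample of a logistic model."""
    # Contribution of each feature to the logit: weight * feature value.
    contributions = {
        name: weights.get(name, 0.0) * value for name, value in features.items()
    }
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Rank by absolute contribution so strong negative signals show up too.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked[:top_n]


# Hypothetical weights and features, NOT taken from the real spambug model.
weights = {
    "has_url": 1.8,
    "title_length": -0.02,
    "reporter_is_new": 1.2,
    "has_attachment": -0.9,
}
bias = -1.5
bug_features = {
    "has_url": 1.0,
    "title_length": 40.0,
    "reporter_is_new": 1.0,
    "has_attachment": 0.0,
}

probability, top = explain_linear_prediction(weights, bias, bug_features)
print(f"spam probability: {probability:.2f}")
for name, contribution in top:
    print(f"{name}: {contribution:+.2f}")
```

In this toy case the ranking shows which signals pushed the bug toward "spam", which is the kind of output to look for when investigating a misclassified bug.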
Can I work on this?
Sure, feel free. @ayush-1506 might be interested to investigate too in order to complete the spambug-related project, but this issue can easily be worked on by multiple people since it's an investigation issue.
I tried generating the confidence scores this weekend, but the terminal crashed with Killed: 9 before printing the confidence (which seems to be an OOM error). I then tried reducing the size of the dataset, but every reduced dataset I tried contained only one label, so that didn't work either. I'll look into alternatives.
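A reduced dataset that ends up with a single label is a common result of truncating or randomly sampling heavily imbalanced data. One way around it is to subsample each label separately. The sketch below is a plain-Python illustration with hypothetical data; `stratified_subsample` is not a bugbug function, and it assumes the samples and labels fit in memory as lists.

```python
import random
from collections import defaultdict


def stratified_subsample(samples, labels, fraction, seed=0):
    """Keep roughly `fraction` of the samples of each label, at least one per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    kept_samples, kept_labels = [], []
    for label, group in by_label.items():
        # Never drop a label entirely, even if it is very rare.
        keep = max(1, round(len(group) * fraction))
        for sample in rng.sample(group, keep):
            kept_samples.append(sample)
            kept_labels.append(label)
    return kept_samples, kept_labels


# Hypothetical imbalanced data: 990 non-spam bugs (0) and 10 spam bugs (1).
samples = [f"bug {i}" for i in range(1000)]
labels = [0] * 990 + [1] * 10

small_samples, small_labels = stratified_subsample(samples, labels, fraction=0.1)
print(len(small_samples), sorted(set(small_labels)))
```

If the data is already loaded as arrays, scikit-learn's `train_test_split` with its `stratify` argument achieves the same effect.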
It's probably best to reduce the number of features considered by the model. The best bet is to set a min_df for text_vectorizer, like we do in other models; you can try a few different values to see which ones don't decrease accuracy.
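For anyone unfamiliar with `min_df`: in scikit-learn's text vectorizers, an integer `min_df` drops every term that appears in fewer than that many documents, which shrinks the feature matrix and therefore memory use. The sketch below reimplements that idea in plain Python on a few made-up bug titles to show the effect on vocabulary size. It is an illustration only: `build_vocabulary` is a hypothetical helper, and the whitespace tokenization is simpler than what a real vectorizer does.

```python
from collections import Counter


def build_vocabulary(documents, min_df=1):
    """Keep only terms that appear in at least `min_df` documents."""
    document_frequency = Counter()
    for doc in documents:
        # Count each term once per document, not once per occurrence.
        document_frequency.update(set(doc.lower().split()))
    return sorted(term for term, df in document_frequency.items() if df >= min_df)


# Made-up bug titles, for illustration only.
docs = [
    "crash on startup",
    "crash when opening settings",
    "buy cheap watches",
    "settings page crash",
]

for min_df in (1, 2, 3):
    vocabulary = build_vocabulary(docs, min_df)
    print(min_df, len(vocabulary), vocabulary)
```

The trade-off is visible even here: a higher threshold removes rare terms, some of which (such as the spam-like words in the third title) may be exactly the ones the classifier needs, so accuracy should be re-checked for each value tried.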
This one appears to be open. Is it? @marco-c, if so, I would like to work on it. Could you help me pinpoint it in the codebase?
Hi, I am new to the community and would like to understand and learn how to solve this issue. Can someone walk me through it?
This issue is not super-easy, as it requires some investigation and understanding of how the project works. You can start with this one if you want, but you'll need to play around with the code a bit. You can start by training the spambug model (run the scripts/trainer.py script) and classifying a bug with the spambug model (run the scripts/bug_classifier.py script). Then look at the source code of those scripts and try to understand how they work.
I'd like to work on this issue. Please allow me to ask questions if I get stuck or find anything strange.
I'd like to work on this issue.
You can work on any issue without asking, as long as there is no open PR linked to it (see #1092).
Please allow me to ask questions if I get stuck or find anything strange.
You are welcome!