Feat: Spam Detection Feature
Attempts to close https://github.com/comses/planning/issues/113
Squashed commits and resolved merge conflicts.
Summary
Adds management commands for machine-learning-based spam detection.
Features
Before running the commands, make sure spam_dataset.csv is located in the curator folder. spam_dataset.csv consists of the user_id and is_spam columns. The is_spam column contains 1 (spam) or 0 (ham).
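For reviewers who want to sanity-check a label file before running the commands, here is a minimal sketch of the expected layout. It uses only the standard library; the helper name `read_label_csv`, the sample rows, and the assumption that user_id values are integers are illustrative and not taken from this PR.

```python
import csv
import io

# Hypothetical sample rows; the real file is spam_dataset.csv in the curator folder.
SAMPLE_CSV = """user_id,is_spam
101,1
102,0
103,1
"""


def read_label_csv(csv_file):
    """Return {user_id: is_spam}, rejecting any label other than 0 or 1.

    Illustrative helper only; it is not part of this PR.
    """
    reader = csv.DictReader(csv_file)
    missing = {"user_id", "is_spam"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    labels = {}
    for row in reader:
        label = int(row["is_spam"])
        if label not in (0, 1):
            raise ValueError(
                f"is_spam must be 0 or 1, got {label} for user {row['user_id']}"
            )
        # Assumption: user_id is an integer primary key.
        labels[int(row["user_id"])] = label
    return labels


labels = read_label_csv(io.StringIO(SAMPLE_CSV))
print(labels)
```

To check the actual file, pass an open file handle instead of the in-memory sample, for example `read_label_csv(open("spam_dataset.csv", newline=""))` with the path adjusted to wherever the curator folder lives.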
- XGBoostClassifier() ... Uses XGBoost as the classifier. Takes a data frame with the columns "user_id" and "input_data". The "input_data" column holds a numerical vector produced by encoding the selected fields with an encoder.
- CountVectEncoder() ... Uses CountVectorizer as the encoder. Takes as input a selection of the following MemberProfile fields: "user_id", "labelled_by_curator", "first_name", "last_name", "is_active", "email", "affiliations", "bio", "research_interests".
- Run the following command to get a list of spam users.
./manage.py curator_spam_detection --predict
- Options
./manage.py curator_spam_detection --fit
./manage.py curator_spam_detection --get_model_metrics
./manage.py curator_spam_detection --load_labels
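To make the encoding step from the feature list concrete, the sketch below shows how selected text fields can be turned into the numerical "input_data" vector that the classifier consumes. It is a standard-library stand-in for a bag-of-words count encoder, not the PR's CountVectEncoder and not scikit-learn's CountVectorizer (whose default tokenization differs). The sample profiles and the helper names `tokenize`, `fit_vocabulary`, and `encode` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical profiles; only the field names come from the description above.
profiles = [
    {"user_id": 1, "bio": "Buy cheap watches online",
     "research_interests": "cheap watches"},
    {"user_id": 2, "bio": "Agent-based modeling of land use",
     "research_interests": "land use modeling"},
]
SELECTED_FIELDS = ["bio", "research_interests"]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_vocabulary(rows, fields):
    """Map every token seen in the selected fields to a column index."""
    tokens = {
        tok
        for row in rows
        for field in fields
        for tok in tokenize(row.get(field, ""))
    }
    return {tok: index for index, tok in enumerate(sorted(tokens))}


def encode(row, fields, vocabulary):
    """Return the token-count vector for one profile."""
    counts = Counter(
        tok for field in fields for tok in tokenize(row.get(field, ""))
    )
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:  # tokens unseen during fitting are ignored
            vector[vocabulary[tok]] = count
    return vector


vocabulary = fit_vocabulary(profiles, SELECTED_FIELDS)
encoded = [
    {"user_id": p["user_id"],
     "input_data": encode(p, SELECTED_FIELDS, vocabulary)}
    for p in profiles
]
for row in encoded:
    print(row)
```

Each resulting row has the "user_id" and "input_data" shape described for the classifier input. A count-based encoding keeps the features easy to inspect, at the cost of ignoring word order.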
Tests
Wrote 16 unit tests using Django's test framework.