Feat: Spam Detection Feature

Open aimura09 opened this issue 1 year ago • 0 comments

Attempts to close https://github.com/comses/planning/issues/113

Squashed commits and resolved merge conflicts.

Summary

Management commands for Machine Learning spam detection.

Features

Before running the commands, make sure spam_dataset.csv is located in the curator folder. spam_dataset.csv consists of the columns user_id and is_spam. The is_spam column contains 1 (spam) or 0 (ham).
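To make the expected layout of spam_dataset.csv concrete, here is a minimal sketch of a loader that checks the two columns described above. It is not code from this PR: the helper name load_spam_labels and the sample rows are hypothetical, and user_id is kept as a string because the PR does not state its type.

```python
import csv
import io


def load_spam_labels(csv_file):
    """Read rows with user_id and is_spam columns; return {user_id: is_spam}.

    Hypothetical helper for illustration only, not part of the PR.
    """
    reader = csv.DictReader(csv_file)
    missing = {"user_id", "is_spam"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    labels = {}
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        if row["is_spam"] not in ("0", "1"):
            raise ValueError(
                f"line {line_no}: is_spam must be 0 or 1, got {row['is_spam']!r}"
            )
        # user_id stays a string here; its real type is an assumption left open.
        labels[row["user_id"]] = int(row["is_spam"])
    return labels


# Made-up sample data in the documented format: 1 = spam, 0 = ham.
sample = io.StringIO("user_id,is_spam\n101,1\n102,0\n")
labels = load_spam_labels(sample)
```

In practice the file object would come from opening spam_dataset.csv in the curator folder rather than from an in-memory string.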

XGBoostClassifier() ... Uses XGBoost as the classifier. Takes a data frame with the columns "user_id" and "input_data". The "input_data" column holds a numerical vector produced by encoding the selected fields with an encoder.
CountVectEncoder() ... Uses CountVectorizer as the encoder. Takes as input selected fields from "user_id", "labelled_by_curator", "first_name", "last_name", "is_active", "email", "affiliations", "bio", and "research_interests" of the MemberProfiles.
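The encoding step can be illustrated with a small pure-Python bag-of-words sketch. It only approximates what a count vectorizer does conceptually (lowercase, split into word tokens, count per vocabulary entry) and is not the PR's CountVectEncoder or scikit-learn's CountVectorizer. The function names, the two sample profiles, and the choice of "bio" and "research_interests" as the selected fields are assumptions made for the example.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters (an assumption for this sketch).
TOKEN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    return TOKEN.findall(text.lower())


def fit_vocabulary(texts):
    """Map each distinct token to a column index, in alphabetical order."""
    vocab = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: i for i, tok in enumerate(vocab)}


def encode(text, vocabulary):
    """Return a count vector for text; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical profiles; real input would come from the MemberProfiles fields.
profiles = [
    {"user_id": 1, "bio": "Buy cheap pills, cheap deals", "research_interests": ""},
    {"user_id": 2, "bio": "Agent-based modeling researcher",
     "research_interests": "land use modeling"},
]
fields = ("bio", "research_interests")  # the "selected fields" for this example

texts = [" ".join(p[f] for f in fields) for p in profiles]
vocabulary = fit_vocabulary(texts)

# One row per user, mirroring the "user_id" / "input_data" layout described above.
rows = [
    {"user_id": p["user_id"], "input_data": encode(t, vocabulary)}
    for p, t in zip(profiles, texts)
]
```

Each entry in rows pairs a user_id with a fixed-length count vector, which is the shape of input the classifier description above expects.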

Run the following command to get a list of spam users:

./manage.py curator_spam_detection --predict

Other options:

./manage.py curator_spam_detection --fit
./manage.py curator_spam_detection --get_model_metrics
./manage.py curator_spam_detection --load_labels

Tests

Wrote 16 unit tests using Django's test framework.

Jan 12 '24 01:01 aimura09