Spam-detection
Implementation of an artificial neural network (ANN) for email spam detection using TensorFlow.
The idea is simple: given an email you have never seen before, determine whether or not it is spam.
It is simple, but very effective: I reached 99.6% accuracy.
The code was tested on Python 2.7.11 and should work on any Python 2.x release.
File descriptions:
The data comes from a Kaggle competition.
TR.tar.gz: contains the 2500 training email files, 1721 ham (labelled 1) and 779 spam (labelled 0).
spam-mail.tr.label: the associated training labels.
ExtractContent.py: extracts the subject and body of each email.
To run it in a Python environment:
1. Invoke the script with the command
./ExtractContent.py
2. Enter the source directory, where the source files are stored.
For example: C:\EMAILPro\CSDMC2010_SPAM\TEST
3. Enter the destination directory, where the extracted bodies should be written.
For example: C:\EMAILPro\CSDMC2010_SPAM\TEST_NEW
4. Done.
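The extraction step above can be sketched as follows. This is not the repository's ExtractContent.py; it is a minimal illustration under stated assumptions: it uses Python 3's standard `email` package (the repository itself targets Python 2.7), it treats only `text/plain` parts as the body, and it takes the two directories as command-line arguments instead of the interactive prompts described above.

```python
import email
import os
import sys


def decode_bytes(payload, charset):
    """Turn a decoded MIME payload into text, tolerating bad charset labels."""
    try:
        return payload.decode(charset or "latin-1", errors="replace")
    except LookupError:  # the message names a charset Python does not know
        return payload.decode("latin-1")


def extract_text(raw_bytes):
    """Return the Subject header plus every text/plain part of one .eml message."""
    msg = email.message_from_bytes(raw_bytes)
    subject = str(msg.get("Subject", ""))
    chunks = []
    # walk() yields the message itself and, for multipart mail, each sub-part.
    for part in msg.walk():
        # Assumption: only text/plain parts count as the body (HTML is skipped).
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is not None:
            chunks.append(decode_bytes(payload, part.get_content_charset()))
    return subject + "\n\n" + "\n".join(chunks)


def extract_dir(src_dir, dst_dir):
    """Write one extracted text file per .eml file found in src_dir."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".eml"):
            continue  # skip label files and anything else in the folder
        with open(os.path.join(src_dir, name), "rb") as f:
            text = extract_text(f.read())
        with open(os.path.join(dst_dir, name), "w", encoding="utf-8") as f:
            f.write(text)


# Assumption: paths come from the command line rather than interactive prompts.
if __name__ == "__main__" and len(sys.argv) == 3:
    extract_dir(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as extract_sketch.py, it would be run as `python3 extract_sketch.py C:\EMAILPro\CSDMC2010_SPAM\TEST C:\EMAILPro\CSDMC2010_SPAM\TEST_NEW`. Calling `get_payload(decode=True)` matters because many spam messages are base64-encoded, so reading the raw payload would feed encoded gibberish to the vectorizer.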
email_input.py: vectorizes the email text and outputs trainX.csv, trainY.csv, testX.csv, and testY.csv.
data.tar.gz: contains trainX.csv, trainY.csv, testX.csv, and testY.csv.
BagOfWords.p: contains all unique words in the data, saved for later use.
Spam detection.ipynb: IPython notebook that trains the model and fetches emails from your Gmail account to classify them.
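The vectorization and BagOfWords.p entries above describe a bag-of-words pipeline. The following is a minimal sketch of that idea, not a copy of email_input.py: the tokenization rule (lower-cased runs of letters), raw word counts as features, a pickled sorted word list as the vocabulary file, and one label per row in the Y files are all assumptions. It is written for Python 3.

```python
import csv
import pickle
import re


def tokenize(text):
    """Lower-case the text and keep runs of letters (an assumed tokenization rule)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Sorted list of every unique token in the training emails."""
    vocab = set()
    for text in texts:
        vocab.update(tokenize(text))
    return sorted(vocab)


def vectorize(text, index):
    """Bag-of-words count vector for one email; unknown words are ignored."""
    vector = [0] * len(index)
    for token in tokenize(text):
        position = index.get(token)
        if position is not None:
            vector[position] += 1
    return vector


def save_vocabulary(vocab, path):
    """Pickle the vocabulary so new emails can be vectorized the same way later."""
    with open(path, "wb") as f:
        pickle.dump(vocab, f)


def write_split(texts, labels, index, x_path, y_path):
    """Write one CSV row of word counts per email, and one label per row.

    Assumption: labels follow the README convention, ham = 1 and spam = 0.
    """
    with open(x_path, "w", newline="") as f:
        csv.writer(f).writerows(vectorize(t, index) for t in texts)
    with open(y_path, "w", newline="") as f:
        csv.writer(f).writerows([label] for label in labels)
```

Build the vocabulary from the training emails only, then call `write_split` once for the training set and once for the test set with the same `index`, so that column i means the same word in trainX.csv and testX.csv. Reloading the pickled vocabulary later is what lets a freshly fetched Gmail message be turned into a vector the trained model understands.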
Email format description:
The format of .eml files is defined in RFC 822; information on the more recent MIME (Multipurpose Internet Mail Extensions) standard can be found in RFC 2045 through RFC 2049.
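In practice, reading these files means undoing the encodings the MIME RFCs define. As an illustration (the sample message is made up), Python 3's standard `email` package can decode an RFC 2047 encoded-word in the Subject and the base64 and quoted-printable transfer encodings from RFC 2045:

```python
import email
from email.header import decode_header, make_header

# Hypothetical sample message: an RFC 2047 encoded Subject, plus one
# quoted-printable part and one base64 part (RFC 2045 transfer encodings).
RAW = (
    "From: sender@example.com\r\n"
    "Subject: =?utf-8?B?Q2Fmw6kgZGVhbHM=?=\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/alternative; boundary="B1"\r\n'
    "\r\n"
    "--B1\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "\r\n"
    "Caf=C3=A9 prices: 50% off today\r\n"
    "--B1\r\n"
    "Content-Type: text/html; charset=us-ascii\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n"
    "PGI+SGk8L2I+\r\n"
    "--B1--\r\n"
)


def describe(raw):
    """Return the decoded subject and a list of (content type, decoded text)."""
    msg = email.message_from_string(raw)
    # make_header(decode_header(...)) turns "=?utf-8?B?...?=" back into text.
    subject = str(make_header(decode_header(msg["Subject"])))
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "multipart":
            continue  # containers only hold other parts
        data = part.get_payload(decode=True)  # bytes with the transfer encoding removed
        charset = part.get_content_charset() or "us-ascii"
        parts.append((part.get_content_type(), data.decode(charset)))
    return subject, parts
```

Here `describe(RAW)` returns the subject "Café deals" and the parts `[("text/plain", "Café prices: 50% off today"), ("text/html", "<b>Hi</b>")]`. Without decoding, the classifier would see tokens like "Q2Fmw6kgZGVhbHM" instead of real words.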
NOTE:
The notebook explains how the model works and how to authenticate your Gmail account.