DeepClassificationBot
A deep learning powered bot capable of classifying images into user-specified categories
Classification Bot
Welcome to the Classification Bot codebase. Classification Bot aims to simplify the collection, extraction, and preprocessing of data, and to provide an end-to-end pipeline for training large deep neural networks on it.
The system is composed of scrapers, data extractors, preprocessors, deep neural network models built with Francois Chollet's Keras library, and an easy-to-use deployment module.
Installation
Make sure you have a GPU, as training is very compute-intensive.
- (OSX) Install gcc: brew install gcc
- Install CUDA toolkit 7.5
- Install cuDNN 4
- Install Theano: sudo pip install git+git://github.com/Theano/Theano.git
- Install OpenCV
- Install the HDF5 library (libhdf5-dev)
- Make sure you have Python 2.7.6 and virtualenv installed on your system
- Install the Python dependencies:
$ virtualenv --python=python2 --system-site-packages env
$ . env/bin/activate
$ pip install -r requirements.txt
Training and deploying
To download images
Use google_image_scraper.py to download images. It takes a .csv file with one category per line and downloads a number of images for each category.
The first line of the .csv file is ignored.
The number of images per category is configurable. We suggest a number between 200 and 1000:
$ python google_image_scraper.py -n 200 yourfilehere.csv
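A minimal yourfilehere.csv might look like this; the first line is a header that the scraper skips, and the category names below are just examples:

```csv
category
Gurren Lagann
Cowboy Bebop
Naruto
```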
Easy Mode:
(For users who already have a list of categories at hand):
- Create a .csv file with one category per line for everything you want the scraper to search for.
- Now let's download some images! Run
python google_image_scraper.py yourfilehere.csv
Hacker Mode:
(For users who know an online resource that lists their categories and want to fetch them, who have too many categories to type by hand, or who would simply rather code than copy and paste)
- Write a script that fetches your categories from Wikipedia or any other resource you like. For an example, look at examples/anime_names.py to see what we used to get our categories.
- Have your script create a .csv file with the categories you require.
- Then run
python google_image_scraper.py yourfilehere.csv
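A minimal sketch of such a script, assuming you already have your category names in a Python list (the list below is a hypothetical stand-in; examples/anime_names.py shows the real approach used for this project):

```python
import csv

def write_categories_csv(categories, path):
    """Write one category per line, with a header row the scraper ignores."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["category"])  # first line of the .csv is skipped
        for name in categories:
            writer.writerow([name])

# Usage: replace this hypothetical list with names fetched from Wikipedia
# or any other source, then feed the file to google_image_scraper.py.
# write_categories_csv(["Gurren Lagann", "Cowboy Bebop"], "yourfilehere.csv")
```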
To extract and preprocess data ready for training
- Once you have your data ready, run python train.py extract_data to get all of your data preprocessed and saved in HDF5 files.
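The general idea behind that step is to load each downloaded image, resize it to a fixed shape, and scale the pixel values before everything is written into HDF5. A minimal numpy-only sketch of the per-image part (the 128x128 target size is an assumption, not the project's actual setting, and the naive resize stands in for OpenCV's proper interpolation):

```python
import numpy as np

def preprocess_image(img, size=128):
    """Resize a uint8 HxWx3 image, scale to [0, 1], and move channels first."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows][:, cols]                 # crude nearest-neighbour resize
    scaled = resized.astype(np.float32) / 255.0  # pixel values into [0, 1]
    return scaled.transpose(2, 0, 1)             # HWC -> CHW for Theano-backed Keras
```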
To train your network
- Once all of the above are in place, you are ready to train your network. Run python train.py --run to load the data from the HDF5 files, or python train.py --run --extract_data to extract data and train in one step.
- If you want to continue training a model, you can: the weights are saved after each epoch. To resume training, simply run
python train.py --run --continue
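The resume behaviour can be illustrated with a small sketch: save state after every epoch, and on --continue load the latest checkpoint before training further. This only illustrates the pattern; train.py actually saves Keras weights to HDF5, and the pickle file and names here are stand-ins:

```python
import os
import pickle

CHECKPOINT = "weights.pkl"  # stand-in for the weight file written each epoch

def train(epochs, resume=False):
    """Toy loop showing the save-every-epoch / resume-on-demand pattern."""
    state = {"epoch": 0, "weights": [0.0]}
    if resume and os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            state = pickle.load(f)  # pick up where the last run stopped
    for _ in range(epochs):
        state["epoch"] += 1
        state["weights"] = [w + 0.1 for w in state["weights"]]  # fake update
        with open(CHECKPOINT, "wb") as f:
            pickle.dump(state, f)   # checkpoint after each epoch
    return state["epoch"]
```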
Deploying a model
- Once training has finished and a good model has been produced, you can deploy it.
- To deploy a model on a single image URL, use python deploy.py --URL [URL_LINK]
- To deploy a model on a folder full of images, use python deploy.py --image-folder path/to/folder
- To deploy a model on a single file, use python deploy.py --image-path path/to/file
Once deployed, the model returns the top 5 predictions for each image in a nicely formatted view, e.g.
Image Name: Tengen.Toppa.Gurren-Lagann.full.174481.jpg
Categories:
0. Gurren Lagann: 0.999914288521
1. Kill La Kill: 7.29278544895e-05
2. Naruto: 4.92283288622e-06
3. Redline: 2.71744352176e-06
4. Cowboy Bebop: 1.41406655985e-06
_________________________________________________
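Producing a view like the one above boils down to sorting the class probabilities and printing the five largest. A sketch, assuming the classifier returns one probability per category (the function names here are illustrative, not the actual deploy.py API):

```python
def top_predictions(probs, categories, n=5):
    """Pair each category with its probability and keep the n most likely."""
    ranked = sorted(zip(categories, probs), key=lambda cp: cp[1], reverse=True)
    return ranked[:n]

def format_predictions(image_name, ranked):
    """Render ranked (category, probability) pairs in the view shown above."""
    lines = ["Image Name: %s" % image_name, "Categories:"]
    for i, (category, prob) in enumerate(ranked):
        lines.append("%d. %s: %s" % (i, category, prob))
    return "\n".join(lines)
```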
Things for you to try
- Create your own classifiers
- Try different model architectures (hint: go to Google Scholar or arXiv and search for GoogLeNet, VGG-Net, AlexNet, ResNet, and follow the waves :) )
Twitter bot
deepanimebot/bot.py is a Twitter bot that provides an interface for querying the classifier.
Running the bot locally
Prerequisites
- A classifier
- A Twitter app registered under the bot account
- Consumer key and secret for that app
- Your access token and secret for that app
Copy bot.ini.example to bot.ini and fill in your consumer key/secret and access token/secret.
Run it
$ PYTHONPATH=. python deepanimebot/bot.py -c bot.ini --debug --classifier=local
python deepanimebot/bot.py --help will list all available command line options.
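The flags shown above follow a standard argparse pattern; here is a reduced sketch of them (the actual option set and help texts live in deepanimebot/bot.py, so treat these descriptions as assumptions):

```python
import argparse

def build_parser():
    """Sketch of the bot's core flags: config file, debug mode, classifier choice."""
    parser = argparse.ArgumentParser(description="Twitter classification bot")
    parser.add_argument("-c", "--config", default="bot.ini",
                        help="path to the bot's INI config file")
    parser.add_argument("--debug", action="store_true",
                        help="run in debug mode")
    parser.add_argument("--classifier", default="local",
                        help="which classifier backend to use, e.g. 'local'")
    return parser
```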
Web interface
deepanimebot/webapp.py is a Flask app for querying the classifier.
$ PYTHONPATH=. python deepanimebot/webapp.py
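A stripped-down sketch of such an app, assuming a classify(url) function that returns ranked (category, probability) pairs; the route name, parameter, and JSON shape here are illustrative, not the actual webapp.py API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify(url):
    """Stand-in for the real model; returns hypothetical ranked predictions."""
    return [("Gurren Lagann", 0.99), ("Cowboy Bebop", 0.01)]

@app.route("/classify")
def classify_route():
    url = request.args.get("url", "")
    if not url:
        return jsonify(error="missing url parameter"), 400
    predictions = classify(url)
    return jsonify(predictions=[{"category": c, "probability": p}
                                for c, p in predictions])
```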
Deploying to Google Cloud Platform
This repo comes with the necessary support files for deploying the Twitter bot and/or the web app to Google Cloud Platform.
Prerequisites
- A classifier
- Twitter app credentials (see above)
- Docker tools and an account on a docker registry
- Google Cloud SDK
- A Google Cloud Platform project
Building and registering your own Docker image
classificationbot/base:latest comes with all the dependencies installed.
If you've modified the code and added a new dependency,
make a new Docker image based on the dockerfiles in this repo.
This repo's base images are built with these commands:
$ docker build -t classificationbot/base:latest -f dockerfiles/base/Dockerfile .
$ docker push classificationbot/base:latest
$ docker build -t classificationbot/ci:latest -f dockerfiles/ci/Dockerfile .
$ docker push classificationbot/ci:latest
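If you only need to add a dependency, a derived image can start from the prebuilt base and install the extra package. The package name and the COPY/WORKDIR layout below are placeholders, not this repo's actual Dockerfile contents:

```dockerfile
# Extend the prebuilt base image rather than rebuilding every dependency
FROM classificationbot/base:latest

# Hypothetical extra dependency your modified code needs
RUN pip install some-new-dependency

# Placeholder layout: copy the code into the image
COPY . /app
WORKDIR /app
```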
Deploying
There are two options:
- (Not used anymore) Google Compute Engine, container-optimized instance, supervisord + tweepy: bot-standalone
- Google Container Engine, kubernetes, gunicorn + flask + tweepy: follow this gist
Special Thanks
Special thanks to Francois Chollet (fchollet) for building the superb Keras deep learning library. We couldn't have produced a project ready to be used by people without machine learning experience if it weren't for Keras's ease of use.
Special thanks to https://github.com/shuvronewscred/ for building the image scraper we adapted for our project. Original source code can be found at https://github.com/shuvronewscred/google-search-image-downloader