deeplab-training
Training your own Deeplab Model in Tensorflow
Training Deeplab on Your Own Dataset
TLDR: This tutorial covers how to set up Deeplab within Tensorflow to train your own machine learning model, with a focus on separating humans from the background of a photograph in order to perform background replacement.
If you'd rather watch this on YouTube, see the Deeplab training tutorial here, and the OpenCV visualization / background swapping tutorial here.
There are 3 parts to the tutorial. Feel free to skip to the section that is most relevant to you.
- Part 1 focuses on collecting a dataset.
- Part 2 focuses on training Deeplab on your dataset.
- Part 3 focuses on visualizing the results of the training, and performing background replacement using OpenCV.
Installation Process
Create a Python3 Environment with Pip
- With anaconda navigator (or conda) / pyenv / virtualenv, create an environment with python 3.7.4, and activate it:
/Users/[username]/.anaconda/navigator/a.tool ; exit;
- Anaconda has pip preinstalled. If not using anaconda, make sure you have pip installed. Follow these directions on a mac.
Clone the Deeplab Models Github Repo
Clone the official tensorflow models repo
You will only need the models/research/deeplab and models/research/slim directories. You can delete everything else.
Merge the files from the tutorial repo into the tensorflow models repo
Clone or download this repo, and put everything into the directory you just created for the tensorflow models repo, but don't overwrite anything except the input_preprocess.py file in the /deeplab/ directory, which has a small change.
For example, put models/research/eval-pqr.sh into the tensorflow models/research directory.
Install Tensorflow
- From the models/research/ directory, install tensorflow:
pip3 install --upgrade pip #need version 19 or higher
pip3 install tensorflow==1.15 #I had issues with tensorflow 2 on a mac
If you have a CUDA-compatible GPU, you can use tensorflow-gpu instead of tensorflow.
- Install Pillow - this library helps you process images (Python Image Library)
pip3 install Pillow #use this for a mac. Other systems or versions of python might use PIL
- Install other dependencies:
pip3 install tqdm numpy
More help on installing tensorflow here.
Make sure to follow the steps in the link to ensure that you can run model_test.py:
python3 deeplab/model_test.py
Pay special attention to this step:
# From tensorflow/models/research/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
This command has to be run each time you activate your python environment or open a new terminal window.
Also, especially if you are running multiple python environments, make sure that you always use python3 and pip3 for every command you run (instead of python and pip). This will save you lots of headaches.
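A quick way to confirm that the PYTHONPATH export took effect in the current session is to ask Python whether it can locate the packages. This is a minimal sketch and not part of the tutorial repo. It assumes that deeplab (from models/research) and nets (from models/research/slim) are the top-level packages that need to be found, and missing_packages is a hypothetical helper name.

```python
import importlib.util


def missing_packages(names=("deeplab", "nets")):
    """Return the top-level package names that Python cannot locate."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level package is not on sys.path.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not found on PYTHONPATH:", ", ".join(missing))
    else:
        print("deeplab and slim packages can be located.")
```

If either name is reported as missing, re-run the export command from the models/research directory in the same terminal session.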
Image Preparation Process
Notes
- Run all python commands with python3
- Run all pip commands with pip3
- Must run the following for each terminal session:
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
Dependencies
Python V3.7.4
Tensorflow 1.15
Making a dataset
You will need a consistent background image, and a large set of transparent (or masked) foreground images with photos of people. You'll want to composite each foreground image on to the background.
Make sure the background image is representative of the background image you will be using for real time photo replacement.
Make sure the foreground images represent the diversity of photos you will likely expect in a live scenario. For best results, consider things like:
- Proximity to camera
- Number of people in photos
- Race and gender
- Clothing styles (loose or tight, patterned, dark, light)
- Proximity to each other (touching, hugging, far apart both depthwise and horizontally)
- Poses (sideways, front facing, smiling, acting, etc)
- Props (people holding things, wearing hats or masks, etc)
- Lighting conditions (e.g. high contrast or shadowy, multiple light sources, indoor, outdoor)
Scraping Images
The utilities/scrapeImages.py file is useful in downloading images from google.
NOTE: this search does not limit search results to freely licensed files - it was only used for my internal testing, and you should be careful not to utilize any scraped images from any website without ensuring that you are adhering to their licensing and use guidelines.
You should first edit the scrapeImages.py file to use your desired query string. Look for:
url="https://www.google.co.in/search?q="+query+"&source=lnms&tbm=isch&tbs=isz:m,itp:photo,ic:trans,ift:png"
The tbs= param in this case does the following:
- isz:m - downloads medium sized images
- itp:photo - downloads only photo type images
- ic:trans - downloads only images with transparency
- ift:png - downloads only images of file type PNG
To use your own parameters, do an advanced google search for the type of images you want, and take a look at the query string in the URL bar of your browser for what tbs parameters it generates for you, and replace them here.
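To make the structure of that URL easier to see, here is a small sketch that assembles it from a list of tbs filters. It is not the code in scrapeImages.py: build_search_url is a hypothetical helper, and unlike the plain string concatenation in the script it URL-encodes the query (spaces become +).

```python
from urllib.parse import urlencode


def build_search_url(query, tbs_filters):
    """Assemble a Google image search URL from a query and a list of tbs filters."""
    params = {
        "q": query,
        "source": "lnms",
        "tbm": "isch",
        # Multiple tbs filters are joined with commas into a single parameter.
        "tbs": ",".join(tbs_filters),
    }
    # Keep ':' and ',' unescaped so the tbs value reads the same as in the browser.
    return "https://www.google.co.in/search?" + urlencode(params, safe=":,")


url = build_search_url("person standing", ["isz:m", "itp:photo", "ic:trans", "ift:png"])
print(url)
```

Swapping a filter then only means changing one entry in the list, for example replacing isz:m with whatever size filter your own advanced search generates.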
You then run the scraper as follows:
python3 utilities/scrapeImages.py --search "[your_search_term]" --num_images 100 --directory "/[Path]/[to]/[your]/[image]/[folder]"
Replace [your_search_term] with your query, and set the --directory flag to the folder where you want to save the images.
Creating Segmentation Images
You will need to create a new set of images that merges each transparent foreground image onto the consistent background.
You will also need to create a new set of images where the background is black, and the transparent foreground image matches the color you are trying to segment, in this case "Person" which is color rgb(192,128,128).
Both sets of images, the "regular" and "segmentation" images should have the same size, and match each other exactly in terms of the position and scale of the foreground subjects in relation to the background. See this example:
Regular Image

Segmentation Image

The photoshop actions section below has a set of useful actions for accomplishing this properly in photoshop.
You should make a directory within the models/research/deeplab/datasets directory. Call it whatever you like (in this case, we used PQR).
Within that folder, make another folder called JPEGImages and place all the "regular" images.
Photoshop Actions
If you know how to use photoshop actions, this repo contains a set of actions that will help convert and merge your photos. Go to window > actions in photoshop and choose load actions and load the glowbox.atn file.
To run these actions in batch, you'll want to go to File > Automate > Batch in photoshop, and select the desired action and the folder location of your foreground images. Destination should be None as the action contains a save command itself.
To edit any of the actions, you'll want to select the step of the action from the actions panel and double click it, modifying the desired parameters.
place_and_save:
This action:
- takes a loaded image
- resizes the canvas to the size of your background image and places that image as a layer
- moves your foreground image to the bottom center of the photo
- exports the image as a 60% quality jpeg.
Make sure to edit the action to specify the location of your background image, the canvas size matching your background image's size, and the desired export quality. Also, make sure the export location does not overwrite your transparent foreground images, because you'll need those to create your segmentation masks.
segment:
This action:
- Takes the transparent images and resizes the canvas to match the dimensions of your chosen background image.
- Aligns the transparent foreground image to the bottom center of the canvas.
- Makes a background and fills it with black (the segmentation color for background in deeplab)
- Makes a selection of the foreground and fills it with the proper color for the Person segmentation in deeplab: RGB(192,128,128)
- Exports the image as a 60% quality jpeg.
Make sure to edit the action to specify the desired color segmentation for your images, if you are not trying to identify people in your photos. You can see the deeplab (resnet) color segmentation scheme here.
convert_to_indexed:
You do not have to run this action if you used the segment action above.
However, if you already have images, this action just ensures that the color for the segmentation mask is exact, forcing a pink-ish color to the exact pixel values. Photoshop, for example, does some adjustment of colors on a normal save to match your screen's color profile. You can prevent having to run this action at all if saving from photoshop by ensuring that the convert to sRGB option in the save for web dialog is unchecked.
merge_segmentation:
This action does not need to be run until after all of your model training and image generation has been done. Essentially it is the last step, helping you to visualize how well your machine-learning generated masks actually mask off your subject. It does the following:
- Adds both the regular image and the segmentation masks as layers
- Selects the color range matching the segmentation layer
- Makes a mask around the regular image
You end up with 3 layers - one with the untouched photo, one with the segmentation mask, and one with your regular image masked off to show how well the background was removed and the subjects were isolated.
Convert your RGB segmentation images to indexed colors
To reduce the amount of data deeplab has to process for each image, we will convert each RGB color found in the segmentation images you made (e.g. RGB(192,128,128)) to a single indexed color value (e.g. 1). This will make processing a lot faster.
This repo includes a file in the deeplab/datasets/ directory called convert_rgb_to_index.py which will help you accomplish that.
Before running, make sure to edit the following:
# palette (color map) describes the (R, G, B): Label pair
palette = {(0, 0, 0) : 0 , #background
(192, 128, 128) : 1 #person
}
If you are not processing people, the palette should contain all of the segmentation colors you are trying to detect. In our case, since we are just looking for people, the palette contains black for the background as index 0, and pink for the foreground as index 1.
label_dir: this is the path (relative to the datasets directory where this file is contained) where your Segmentation Class images were saved. Make sure to change it if your file locations differ.
new_label_dir: this is the path where your newly generated images will be saved. You do not need to make this directory, it will be generated for you.
To run the script, from the datasets directory, run: python3 convert_rgb_to_index.py. You will need to make sure all of this file's dependencies are installed via pip:
pip3 install Pillow tqdm numpy
Once it runs, you should have a new folder SegmentationClassRaw (or whatever you called the new_label_dir folder). It should contain a list of .png images. They will all look black. This is normal. We converted the RGB values into single index values, so a standard image viewer won't understand this format.
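To illustrate what the conversion does to each pixel, here is a minimal NumPy sketch of the palette lookup. It is an illustration of the idea rather than the contents of convert_rgb_to_index.py; rgb_to_index is a hypothetical helper, and it assumes any color not in the palette should fall back to index 0 (background).

```python
import numpy as np

# Same palette as in the tutorial: (R, G, B) -> label index.
PALETTE = {(0, 0, 0): 0, (192, 128, 128): 1}


def rgb_to_index(rgb, palette=PALETTE):
    """Map an (H, W, 3) uint8 RGB array to an (H, W) uint8 array of label indices."""
    # Pixels that match no palette color keep the default index 0 (assumption).
    labels = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, index in palette.items():
        # A pixel matches only if all three channels equal the palette color.
        match = np.all(rgb == np.array(color, dtype=rgb.dtype), axis=-1)
        labels[match] = index
    return labels


# A 2x2 example image: black, pink / pink, black.
example = np.array(
    [[[0, 0, 0], [192, 128, 128]],
     [[192, 128, 128], [0, 0, 0]]],
    dtype=np.uint8,
)
print(rgb_to_index(example))
```

The largest value in the result is 1 out of a possible 255, which is why the saved .png files look black in a normal image viewer.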
Make a list of all your training and test images
Make another folder at the same level as JPEGImages called SegmentationClass (see the folder structure section below for a better sense of the entire folder structure you will be adding to deeplab). This folder will contain all your segmentation images.
Deciding how to divide up your train and validation set is up to you. Ideally you have at least 500 training images, and at least 100 test images. A good starting split might be a 10:1 ratio of training to test images.
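If you would rather generate the list files than write them by hand, the sketch below shows one way to do it. It rests on a few assumptions: the lists are named train.txt, val.txt and trainval.txt and live in an ImageSets folder, each line is an image name without its extension (the build script appends .jpg and .png itself), and all regular images are .jpg files. write_split_lists is a hypothetical helper, not part of the repo.

```python
import random
from pathlib import Path


def write_split_lists(image_dir, list_dir, val_every=11, seed=0):
    """Write train.txt, val.txt and trainval.txt; return (num_train, num_val)."""
    # Image names without extension, e.g. "img001" for "img001.jpg".
    names = sorted(p.stem for p in Path(image_dir).glob("*.jpg"))
    # Shuffle reproducibly so the validation set is not just the first files.
    random.Random(seed).shuffle(names)
    # One image in every val_every goes to validation: a 10:1 train-to-val ratio.
    val = sorted(names[::val_every])
    train = sorted(set(names) - set(val))
    list_dir = Path(list_dir)
    list_dir.mkdir(parents=True, exist_ok=True)
    for split, items in (("train", train), ("val", val), ("trainval", sorted(names))):
        (list_dir / f"{split}.txt").write_text("\n".join(items) + "\n")
    return len(train), len(val)
```

For example, write_split_lists("PQR/JPEGImages", "PQR/ImageSets") run from the datasets directory would write the three files and return the sizes of the training and validation sets.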
Generate the tfrecord folder
Tensorflow has a tfrecord format that makes storing training data much more efficient. We will need to generate this folder for our dataset. To do so, this repo has made a copy of the build_voc2012_data.py file which has been saved as a new file, (in our case build_pqr_data.py).
Edit the build_pqr_data.py file, and make sure there is a flag for each of our model's desired folders. In this case, look at ~line 80:
tf.app.flags.DEFINE_string('image_folder',
'./PQR/JPEGImages',
'Folder containing images.')
tf.app.flags.DEFINE_string(
'semantic_segmentation_folder',
'./PQR/SegmentationClassRaw',
'Folder containing semantic segmentation annotations.')
tf.app.flags.DEFINE_string(
'list_folder',
'./PQR/ImageSets',
'Folder containing lists for training and validation')
tf.app.flags.DEFINE_string(
'output_dir',
'./PQR/tfrecord',
'Path to save converted SSTable of TensorFlow examples.')
Make sure to change any of those directories to match where your files are located. In this instance, the tfrecord folder must already exist; the script will not make it for you. Also note that at around Line 119 I have hardcoded the input format to be .jpg:
image_filename = os.path.join(
#MH:
#FLAGS.image_folder, filenames[i] + '.' + FLAGS.image_format)
FLAGS.image_folder, filenames[i] + '.jpg')
#END MH
and the output images to be .png
#MH:
#filenames[i] + '.' + FLAGS.label_format)
filenames[i] + '.png')
#END MH
due to an issue I had with the script utilizing the label_format flag. You should change those extensions to match the extensions of your own images if they differ.
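Before running the build script, it can save time to create the tfrecord folder and check that every listed name has both an image and a label. The sketch below assumes the folder names used in the flags above (JPEGImages, SegmentationClassRaw, ImageSets, tfrecord) and the hardcoded .jpg and .png extensions; preflight is a hypothetical helper, not part of the repo.

```python
from pathlib import Path


def preflight(dataset_dir):
    """Create the tfrecord folder and list names that lack an image or a label."""
    root = Path(dataset_dir)
    # The build script expects this folder to exist already.
    (root / "tfrecord").mkdir(parents=True, exist_ok=True)
    problems = []
    for list_file in sorted((root / "ImageSets").glob("*.txt")):
        for line in list_file.read_text().splitlines():
            name = line.strip()
            if not name:
                continue  # skip blank lines
            if not (root / "JPEGImages" / f"{name}.jpg").is_file():
                problems.append(f"{list_file.name}: missing image {name}.jpg")
            if not (root / "SegmentationClassRaw" / f"{name}.png").is_file():
                problems.append(f"{list_file.name}: missing label {name}.png")
    return problems
```

Calling preflight("PQR") from the datasets directory returns an empty list when everything lines up, and otherwise one message per missing file.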
Now you can run the file (from the datasets directory):
python3 build_pqr_data.py
Once this is done, you will have a tfrecord directory filled with .tfrecord files.
Add the information about your dataset segmentation (TODO: check to make sure we still need this step...)
You'll need to provide tensorflow the list of how your dataset was divided up into training and test images.
In deprecated/segmentation_dataset.py, look for the following (~Line 114):
# MH
_PQR_INFORMATION = DatasetDescriptor(
splits_to_sizes={
'train': 487,
'val': 101,
'trainval': 588,
},
num_classes=2,
ignore_label=255,
)
_DATASETS_INFORMATION = {
'cityscapes': _CITYSCAPES_INFORMATION,
'pascal_voc_seg': _PASCAL_VOC_SEG_INFORMATION,
'ade20k': _ADE20K_INFORMATION,
'pqr': _PQR_INFORMATION,
}
# END MH
These splits should match the number of files in your training and test sets that you made earlier. For example, if train.txt has 487 lines, train is 487. Same with val and trainval. If you are trying to segment more than just the background and foreground, num_classes should match the number of segmentations you are targeting. ignore_label=255 just means you are ignoring anything in the segmentation that is white (used in some segmentations to create a clear space division between multiple segmentations).
Note that _DATASETS_INFORMATION also contains a reference to this new dataset descriptor we've added:
'pqr': _PQR_INFORMATION
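Rather than counting lines by hand, the numbers for splits_to_sizes can be read off the list files. This is a small sketch, assuming the three lists are named train.txt, val.txt and trainval.txt and contain one image name per line; split_sizes is a hypothetical helper.

```python
from pathlib import Path


def split_sizes(list_dir):
    """Count the non-empty lines in each split list file."""
    sizes = {}
    for split in ("train", "val", "trainval"):
        lines = (Path(list_dir) / f"{split}.txt").read_text().splitlines()
        # Ignore blank lines, such as a trailing newline at the end of the file.
        sizes[split] = sum(1 for line in lines if line.strip())
    return sizes
```

Running split_sizes("PQR/ImageSets") from the datasets directory returns a dictionary you can copy into the DatasetDescriptor.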
You're finally ready to train!
Training Process
Folder Structure
Make sure your folder structure from /datasets looks similar to this, if you followed all of the naming conventions in the above steps:
+ PQR
  + exp //contains exported files
    + train_on_trainval_set
      + eval //contains results of training evaluation
      + init_models //contains the deeplab pascal training set, which you need to download
      + train //contains training ckpt files
      + vis
        + segmentation_results //contains the generated segmentation masks
  + ImageSets
    train.txt
    trainval.txt
    val.txt
  + logs
  + tfrecord //holds your converted dataset
build_pqr_data.py //creates your tfrecord files
convert_rgb_to_index.py //turns rgb images into their segmentation indices
../../train-pqr.sh //holds the training script
../../eval-pqr.sh //holds the eval script
../../vis-pqr.sh //holds the visualization script
Download the Pascal Training Set
In order to make our training much faster we'll want to use a pre-trained model, in this case one trained on Pascal VOC2012. You can download it here. Extract it into the PQR/exp/train_on_trainval_set/init_models directory (the extracted folder should be named deeplabv3_pascal_train_aug).
Edit your training script
First, edit your train-pqr.sh script (in the models/research directory):
# Set up the working environment.
CURRENT_DIR=$(pwd)
WORK_DIR="${CURRENT_DIR}/deeplab"
DATASET_DIR="datasets"
# Set up the working directories.
PQR_FOLDER="PQR"
EXP_FOLDER="exp/train_on_trainval_set"
INIT_FOLDER="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/${EXP_FOLDER}/init_models"
TRAIN_LOGDIR="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/${EXP_FOLDER}/train"
DATASET="${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/tfrecord"
mkdir -p "${WORK_DIR}/${DATASET_DIR}/${PQR_FOLDER}/exp"
mkdir -p "${TRAIN_LOGDIR}"
NUM_ITERATIONS=9000
python3 "${WORK_DIR}"/train.py \
--logtostderr \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=1000,667 \
--train_batch_size=4 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--fine_tune_batch_norm=true \
--tf_initial_checkpoint="${INIT_FOLDER}/deeplabv3_pascal_train_aug/model.ckpt" \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${DATASET}"
Things you may want to change:
- Make sure all paths are correct (starting from the models/research folder as CURRENT_DIR).
- NUM_ITERATIONS - this is how long you want to train for. For me, on a Macbook Pro without GPU support, it took about 12 hours just to run 1000 iterations. You can expect GPU support to speed that up about 10X. At 1000 iterations, I still had a loss of about .17. I would recommend at least 3000 iterations. Some models can be as high as about 20000. You don't want to overtrain, but you're better off over-training than under-training.
- train_crop_size - this is the size of the images you are training on. Your training will go much faster on smaller images. 1000x667 is quite large and I'd have done better to reduce that size a bit before training. Also, you should make sure these dimensions match in all three scripts: train-pqr.sh, eval-pqr.sh, and vis-pqr.sh.
- The checkpoint files (.ckpt) are stored in your PQR_FOLDER and can be quite large (mine were 330 MB per file). However, periodically (in this case every 4 checkpoint files), the oldest checkpoint file will be deleted and the new one added - this should keep your hard drive from filling up too much. But in general, make sure you have plenty of hard drive space.
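As a rough planning aid, the timing figures quoted above can be turned into an estimate for a full run. This sketch assumes training time scales linearly with the number of iterations, and its defaults are the numbers from this tutorial (about 12 hours per 1000 iterations on CPU, roughly 10X faster with a GPU); your hardware, crop size, and batch size will change them. estimated_hours is a hypothetical helper.

```python
def estimated_hours(iterations, hours_per_1000=12.0, speedup=1.0):
    """Linear estimate of training time in hours."""
    return iterations / 1000 * hours_per_1000 / speedup


# NUM_ITERATIONS=9000 as in train-pqr.sh.
cpu_hours = estimated_hours(9000)
gpu_hours = estimated_hours(9000, speedup=10)
print(f"CPU: about {cpu_hours:.0f} hours, GPU: about {gpu_hours:.1f} hours")
```

By this estimate the 9000 iterations in the script come to roughly 108 hours on CPU, which is a good argument for a smaller crop size or a GPU.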
Start training:
You are finally ready to start training!
From the models/research directory, run sh train-pqr.sh
If you've set everything up properly, your machine should start training! This will take.a.long.time. You should be seeing something like this in your terminal:

Evaluation
Running eval-pqr.sh from the same directory will calculate the mean intersection over union (mIoU) score for your model. Essentially, this measures how much the actual mask and the prediction of your model overlap: the number of pixels they have in common, divided by the number of pixels covered by either one:

In my case, I got a score of ~.87 - which means essentially that the area shared by my prediction masks and my target masks was about 87% of their combined area. The higher the number here, the better the mask.
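To make the metric concrete, here is a minimal NumPy sketch of mean intersection over union for a single pair of index masks. It is an illustration only: mean_iou is a hypothetical helper, it assumes both masks hold label indices (0 for background, 1 for person), and the eval script computes its score over the whole validation set, so the numbers will not necessarily match per-image values from this sketch.

```python
import numpy as np


def mean_iou(target, prediction, num_classes=2):
    """Average the per-class intersection over union of two label masks."""
    ious = []
    for label in range(num_classes):
        in_target = target == label
        in_prediction = prediction == label
        union = np.logical_or(in_target, in_prediction).sum()
        if union == 0:
            continue  # class absent from both masks, nothing to score
        intersection = np.logical_and(in_target, in_prediction).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))


target = np.array([[0, 0], [1, 1]])
prediction = np.array([[0, 1], [1, 1]])
print(mean_iou(target, prediction))
```

In this example the background scores 1/2 and the person class scores 2/3, so one wrongly labeled pixel out of four already pulls the mean well below 1.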
Visualization
To visualize the actual output of your masks, run vis-pqr.sh from the models/research directory. These will output to your visualization directory you specified (in our case, models/research/deeplab/datasets/PQR/exp/train_on_trainval_set/vis/segmentation_results). You will see two separate images for each visualization: the "regular" image, and the "prediction" (or segmentation mask).
If you want to combine these two images, the merge_segmentation photoshop action can help.
I've also set this up as an automated process in openCV to take an image and its segmentation mask and automatically substitute in a background of your choosing.
Using OpenCV for background replacement
Install OpenCV
Follow these directions to install opencv on mac - but use version 4.1.2 instead of 4.0:
wget -O opencv.zip https://github.com/opencv/opencv/archive/4.1.2.zip
wget -O opencv_contrib.zip https://github.com/opencv/opencv_contrib/archive/4.1.2.zip
Give your virtual environment a name of cv, then workon cv.
Rename /usr/local/lib/python3.7/site-packages/cv2/python-3.7/cv2.cpython-37m-darwin.so to cv2.so
then cd /Users/[your_username]/.virtualenvs/cv/lib/python3.7/site-packages
then ln -s /usr/local/lib/python3.7/site-packages/cv2/python-3.7/cv2.so cv2.so
The cv Python virtual environment is entirely independent and sequestered from the default Python version on your system. Any Python packages in the global directories will not be available to the cv virtual environment. Similarly, any Python packages installed in site-packages of cv will not be available to the global install of Python.
Directory Structure
Navigate to the cv directory. You should have the following directory structure:
+input
+output
+masks
+bg
replacebg_dd.py
- /input - contains the images whose background you want to replace
- /masks - contains the segmentation masks that will separate the foreground from the background (people from everything else).
- /output - where the photos with the replaced background will be saved
- /bg - contains the background image that will be used as the replacement.
- replacebg_dd.py - the python script that utilizes opencv to handle background replacement.
Note: all files in the input and masks directories should have the same names, so that each image is matched with its mask when running the script.
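A quick check for mismatched names can catch problems before running the script. This sketch assumes that "same names" means identical file names including the extension; if your masks use a different extension, compare p.stem instead of p.name. unmatched_files is a hypothetical helper.

```python
from pathlib import Path


def unmatched_files(input_dir="input", mask_dir="masks"):
    """Return (inputs without a mask, masks without an input), both sorted."""
    inputs = {p.name for p in Path(input_dir).iterdir() if p.is_file()}
    masks = {p.name for p in Path(mask_dir).iterdir() if p.is_file()}
    return sorted(inputs - masks), sorted(masks - inputs)
```

Both lists come back empty when every image has a mask with the same name and vice versa.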
Using the replacebg.py script:
Before calling the script, check the following lines within the script:
input_dir = 'input/'
output_dir = 'output/'
mask_dir = 'masks/'
bg_dir = 'bg/'
bg_file = 'track.jpg'
These directories should match your directories relative to the replacebg.py script.
initial_threshold_val = 150 : Changing this value will change the black / white value above which the foreground is kept rather than the background.
Script Options
The python script is responsible for handling what pixels to keep from the source vs which to throw away, and can do some basic thresholding and blurring of the mask image to attempt to improve results.
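The core of the thresholding step can be sketched in a few lines of NumPy. This is not the code in the replacement script, which uses OpenCV and can also blur the mask. It assumes the mask is a single-channel grayscale array in which brighter pixels mark the foreground, that all three arrays share the same height and width, and the default of 150 simply mirrors initial_threshold_val. replace_background is a hypothetical helper.

```python
import numpy as np


def replace_background(image, mask, background, threshold=150):
    """Keep image pixels where mask > threshold, use background elsewhere.

    image, background: (H, W, 3) uint8 arrays; mask: (H, W) uint8 array.
    """
    keep = mask > threshold
    # Add a channel axis so the 2D mask broadcasts across the 3 color channels.
    return np.where(keep[..., None], image, background)


image = np.full((2, 2, 3), 10, dtype=np.uint8)        # stand-in source photo
background = np.full((2, 2, 3), 200, dtype=np.uint8)  # stand-in new background
mask = np.array([[0, 255], [150, 151]], dtype=np.uint8)
print(replace_background(image, mask, background)[..., 0])
```

Raising the threshold keeps fewer source pixels and so reveals more of the new background, which matches the behavior of the z key described in the keyboard commands.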
There are a few parameters you can pass the replacebg.py script:
- --image (i.e. replacebg.py --image 36) would show (but not save) the image numbered 36
- --generate (i.e. replacebg.py --generate 20) would save out the first 20 images
- --all (replacebg.py --all) would save out all images (provided you manually keep the num_inputs variable synched with however many files you have in your input directory)
- replacebg.py --start 20 would generate images between the 20th and num_inputs photos.
- replacebg.py --start 20 --end 30 would generate images between the 20th and 30th photos in the directory
Keyboard commands
When you run the script and it is displaying an image, you can use the following keyboard commands:
- z increases the threshold, tightening up on the subjects and revealing more of the substituted background
- x decreases the threshold, showing more of the source photo
- s saves the image out
- q quits the window and script execution
- i cycles to the next image in the sequence
NOTE:
This tutorial and repo were created through my difficulties installing and training deeplab, in the hopes that it would make things easier for others trying to do the same. Very little of the code is my own, and has been assembled from a variety of sources - all of which were extremely helpful, but none of which I was able to follow on their own in order to successfully train Deeplab. By combining various pieces of the following links, I was able to create a process that worked smoothly for me.
Links:
Analytics Vidhya - Semantic Segmentation: Introduction to the Deep Learning Technique Behind Google Pixel’s Camera!, Saurabh Pal
Installing Tensorflow - Official Documentation
Installing Deeplab - Official Documentation
Tensorflow-Deeplab-Resnet - Dr. Sleep
Free Code Camp - How to use DeepLab in TensorFlow for object segmentation using Deep Learning, Beeren Sahu
Dataset Utils - Gene Kogan - useful in scraping images for a dataset and creating randomly sized, scaled, and flipped images in order to increase the training set size.