Dubai Satellite Imagery Semantic Segmentation Using Deep Learning
Abstract
Semantic segmentation is the task of clustering parts of an image that belong to the same object class. It is a form of pixel-level prediction because each pixel in an image is classified into a category. In this project, I performed semantic segmentation on Dubai's satellite imagery dataset by applying transfer learning to a UNet CNN with an InceptionResNetV2 encoder. To artificially increase the amount of data and avoid overfitting, I applied data augmentation to the training set. The model achieved ~81% dice coefficient and ~86% accuracy on the validation set.
Tech Stack
The Jupyter Notebook can be accessed from here.
The pre-trained model weights can be accessed from here.
Dataset
Humans in the Loop has published an open access dataset annotated for a joint project with the Mohammed Bin Rashid Space Center in Dubai, the UAE. The dataset consists of aerial imagery of Dubai obtained by MBRSC satellites and annotated with pixel-wise semantic segmentation in 6 classes. The images were segmented by the trainees of the Roia Foundation in Syria.
Semantic Annotation
The images are densely labeled and contain the following 6 classes:
| Name | R | G | B |
|---|---|---|---|
| Building | 60 | 16 | 152 |
| Land | 132 | 41 | 246 |
| Road | 110 | 193 | 228 |
| Vegetation | 254 | 221 | 58 |
| Water | 226 | 169 | 41 |
| Unlabeled | 155 | 155 | 155 |
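For training, these RGB-encoded masks generally need to be converted into per-pixel class indices (or one-hot maps). Below is a minimal sketch of that conversion, assuming the masks are loaded as `(H, W, 3)` uint8 NumPy arrays; it illustrates the mapping rather than reproducing this repository's exact preprocessing code.

```python
import numpy as np

# RGB value of each class, as listed in the table above.
CLASS_COLORS = {
    "Building":   (60, 16, 152),
    "Land":       (132, 41, 246),
    "Road":       (110, 193, 228),
    "Vegetation": (254, 221, 58),
    "Water":      (226, 169, 41),
    "Unlabeled":  (155, 155, 155),
}

def rgb_to_class_index(mask_rgb):
    """Convert an (H, W, 3) RGB mask into an (H, W) array of class indices."""
    class_map = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for index, color in enumerate(CLASS_COLORS.values()):
        class_map[np.all(mask_rgb == color, axis=-1)] = index
    return class_map
```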
Sample Images & Masks
Technical Approach
Data Augmentation using Albumentations Library
Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection.
The dataset contains only 72 images (of varying resolutions), of which I used 56 (~78%) for the training set and the remaining 16 (~22%) for the validation set. Since this is a very small amount of data, I used data augmentation to artificially enlarge the training set and avoid overfitting, increasing it nine-fold. After augmentation, the training set contains 504 images (56 original + 448 augmented), while the validation set keeps its 16 original images.
Data augmentation is done using the following techniques (a minimal pipeline sketch follows the list):
- Random Cropping
- Horizontal Flipping
- Vertical Flipping
- Rotation
- Random Brightness & Contrast
- Contrast Limited Adaptive Histogram Equalization (CLAHE)
- Grid Distortion
- Optical Distortion
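To make the augmentation step concrete, here is a minimal Albumentations sketch covering the transforms above; the probabilities, crop size, and the `image`/`mask` variables are illustrative assumptions, not the exact settings used in this project.

```python
import albumentations as A

# `image` and `mask` are assumed to be uint8 NumPy arrays loaded elsewhere.
transform = A.Compose([
    A.RandomCrop(height=512, width=512),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=45, p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.CLAHE(p=0.3),
    A.GridDistortion(p=0.3),
    A.OpticalDistortion(p=0.3),
])

# Albumentations applies the same spatial transform to the image and its mask.
augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```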
Here are some sample augmented images and masks from the dataset:
InceptionResNetV2 Encoder based UNet Model
InceptionResNetV2 Architecture
Source: https://arxiv.org/pdf/1602.07261v2.pdf
UNet Architecture
Source: https://arxiv.org/pdf/1505.04597.pdf
InceptionResNetV2-UNet Architecture
- The InceptionResNetV2 model, pre-trained on the ImageNet dataset, is used as the encoder network.
- A decoder network is extended from the last layer of the pre-trained encoder, with skip connections concatenating the corresponding encoder feature maps to the decoder layers.
A detailed layout of the model is available here.
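For illustration, here is a sketch of how such an encoder-decoder can be wired up in Keras. The skip-connection layer names are placeholders that would need to be read off `encoder.summary()`, and the decoder widths and resizing steps are my assumptions rather than the exact architecture used in this repository.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionResNetV2

def decoder_block(x, skip, filters):
    """Upsample, align spatial size with the skip tensor, concatenate, then convolve."""
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    # InceptionResNetV2 uses 'valid' padding in its stem, so feature maps can be a few
    # pixels off; resize to the skip tensor's spatial size before concatenating.
    x = layers.Resizing(skip.shape[1], skip.shape[2])(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(512, 512, 3), num_classes=6):
    encoder = InceptionResNetV2(include_top=False, weights="imagenet",
                                input_shape=input_shape)
    # Placeholder skip layers -- pick four intermediate activations at increasing
    # depth by inspecting encoder.summary(); the names below are only illustrative.
    skip_names = ["activation", "activation_3", "activation_74", "activation_161"]
    skips = [encoder.get_layer(name).output for name in skip_names]

    x = encoder.output
    for skip, filters in zip(reversed(skips), [256, 128, 64, 32]):
        x = decoder_block(x, skip, filters)

    x = layers.Resizing(input_shape[0], input_shape[1])(x)  # back to full resolution
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(encoder.input, outputs)
```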
Hyper-Parameters
- Batch Size = 16
- Steps per Epoch = 32
- Validation Steps = 4
- Input Shape = (512, 512, 3)
- Initial Learning Rate = 0.0001 (with Exponential Decay LearningRateScheduler callback)
- Number of Epochs = 45 (with ModelCheckpoint & EarlyStopping callback)
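A sketch of how these hyper-parameters and callbacks might be wired together in Keras follows; the decay factor, checkpoint path, patience, and the `model`/generator objects are assumptions rather than the project's exact training script.

```python
import math
import tensorflow as tf

INITIAL_LR = 1e-4

def exponential_decay(epoch, lr):
    """Exponentially decay the learning rate each epoch (decay factor is an assumption)."""
    return INITIAL_LR * math.exp(-0.1 * epoch)

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(exponential_decay, verbose=1),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
]

# `model`, `train_generator` and `val_generator` are assumed to come from the
# earlier model-building and data-augmentation steps.
history = model.fit(train_generator,
                    steps_per_epoch=32,
                    validation_data=val_generator,
                    validation_steps=4,
                    epochs=45,
                    callbacks=callbacks)
```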
Results
Training Results
| Model | Epochs | Train Dice Coefficient | Train Accuracy | Train Loss | Val Dice Coefficient | Val Accuracy | Val Loss |
|---|---|---|---|---|---|---|---|
| InceptionResNetV2-UNet | 45 (best at 34th epoch) | 0.8525 | 0.9152 | 0.2561 | 0.8112 | 0.8573 | 0.4268 |
The model_training.csv file contains epoch-wise training details of the model.
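For reference, the dice coefficient reported above can be computed as a soft Dice over one-hot masks and softmax predictions; this is a generic sketch (the smoothing term is an assumption), not necessarily the repository's exact metric code.

```python
from tensorflow.keras import backend as K

def dice_coefficient(y_true, y_pred, smooth=1e-6):
    """Soft Dice coefficient over one-hot masks and softmax outputs."""
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    """Loss form of the coefficient, usable as a Keras training objective."""
    return 1.0 - dice_coefficient(y_true, y_pred)
```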
Visual Results
Predictions on Validation Set Images:
All predictions on the validation set are available in the predictions directory.
Activations (Outputs) Visualization
Activations/Outputs of some layers of the model:
Some more activation maps are available in the activations directory. The code for visualizing activations is in the get_activations.py file.
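As an illustration of how such activation maps can be produced (not the contents of get_activations.py), one can wrap the trained model so it exposes an intermediate layer's output and plot a few of its feature maps; `model`, the layer name, and `image` are assumed to come from the earlier steps.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

def plot_activations(model, layer_name, image, n_maps=8):
    """Plot the first few feature maps of one intermediate layer for a single image."""
    probe = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer(layer_name).output)
    feature_maps = probe.predict(image[np.newaxis, ...])[0]  # shape (H, W, C)
    for i in range(min(n_maps, feature_maps.shape[-1])):
        plt.subplot(2, 4, i + 1)
        plt.imshow(feature_maps[..., i], cmap="viridis")
        plt.axis("off")
    plt.tight_layout()
    plt.show()
```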
References
- Dataset- https://humansintheloop.org/resources/datasets/semantic-segmentation-dataset/
- C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” arXiv.org, 23-Aug-2016. [Online]. Available: https://arxiv.org/abs/1602.07261.
- O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” arXiv.org, 18-May-2015. [Online]. Available: https://arxiv.org/abs/1505.04597.