Dubai Satellite Imagery Semantic Segmentation Using Deep Learning
Abstract
Semantic segmentation is the task of clustering parts of an image that belong to the same object class. It is a form of pixel-level prediction because each pixel in an image is classified into a category. In this project, I performed semantic segmentation on Dubai's satellite imagery dataset by applying transfer learning to a UNet CNN with an InceptionResNetV2 encoder. To artificially increase the amount of data and avoid overfitting, I applied data augmentation to the training set. The model achieved ~81% dice coefficient and ~86% accuracy on the validation set.
Tech Stack
The Jupyter Notebook can be accessed from here.
The pre-trained model weights can be accessed from here.
Dataset
Humans in the Loop has published an open access dataset annotated for a joint project with the Mohammed Bin Rashid Space Center in Dubai, the UAE. The dataset consists of aerial imagery of Dubai obtained by MBRSC satellites and annotated with pixel-wise semantic segmentation in 6 classes. The images were segmented by the trainees of the Roia Foundation in Syria.
Semantic Annotation
The images are densely labeled and contain the following 6 classes:
| Name | R | G | B |
|---|---|---|---|
| Building | 60 | 16 | 152 |
| Land | 132 | 41 | 246 |
| Road | 110 | 193 | 228 |
| Vegetation | 254 | 221 | 58 |
| Water | 226 | 169 | 41 |
| Unlabeled | 155 | 155 | 155 |
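For training, these RGB-encoded masks generally need to be converted into per-pixel class indices (or one-hot maps). Below is a minimal sketch of that conversion, assuming the masks are loaded as `(H, W, 3)` uint8 NumPy arrays; it illustrates the mapping rather than reproducing this repository's exact preprocessing code.

```python
import numpy as np

# RGB value of each class, as listed in the table above.
CLASS_COLORS = {
    "Building":   (60, 16, 152),
    "Land":       (132, 41, 246),
    "Road":       (110, 193, 228),
    "Vegetation": (254, 221, 58),
    "Water":      (226, 169, 41),
    "Unlabeled":  (155, 155, 155),
}

def rgb_to_class_index(mask_rgb):
    """Convert an (H, W, 3) RGB mask into an (H, W) array of class indices."""
    class_map = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    for index, color in enumerate(CLASS_COLORS.values()):
        class_map[np.all(mask_rgb == color, axis=-1)] = index
    return class_map
```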
Sample Images & Masks
Technical Approach
Data Augmentation using Albumentations Library
Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection.
The dataset contains only 72 images (of varying resolutions), of which I used 56 (~78%) for the training set and the remaining 16 (~22%) for the validation set. Since this is a very small amount of data, I used data augmentation to artificially enlarge the training set and avoid overfitting, increasing it nine-fold. After augmentation, the training set contains 504 images (56 original + 448 augmented), while the validation set keeps its 16 original images.
Data augmentation is done using the following techniques (a minimal pipeline sketch follows the list):
- Random Cropping
- Horizontal Flipping
- Vertical Flipping
- Rotation
- Random Brightness & Contrast
- Contrast Limited Adaptive Histogram Equalization (CLAHE)
- Grid Distortion
- Optical Distortion
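To make the augmentation step concrete, here is a minimal Albumentations sketch covering the transforms above; the probabilities, crop size, and the `image`/`mask` variables are illustrative assumptions, not the exact settings used in this project.

```python
import albumentations as A

# `image` and `mask` are assumed to be uint8 NumPy arrays loaded elsewhere.
transform = A.Compose([
    A.RandomCrop(height=512, width=512),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.Rotate(limit=45, p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.CLAHE(p=0.3),
    A.GridDistortion(p=0.3),
    A.OpticalDistortion(p=0.3),
])

# Albumentations applies the same spatial transform to the image and its mask.
augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```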
Here are some sample augmented images and masks from the dataset:
InceptionResNetV2 Encoder based UNet Model
InceptionResNetV2 Architecture
Source: https://arxiv.org/pdf/1602.07261v2.pdf
UNet Architecture
Source: https://arxiv.org/pdf/1505.04597.pdf
InceptionResNetV2-UNet Architecture
- The InceptionResNetV2 model, pre-trained on the ImageNet dataset, is used as the encoder network.
- A decoder network is extended from the last layer of the pre-trained encoder, with skip connections concatenating the corresponding encoder feature maps to the decoder layers.
A detailed layout of the model is available here.
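For illustration, here is a sketch of how such an encoder-decoder can be wired up in Keras. The skip-connection layer names are placeholders that would need to be read off `encoder.summary()`, and the decoder widths and resizing steps are my assumptions rather than the exact architecture used in this repository.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionResNetV2

def decoder_block(x, skip, filters):
    """Upsample, align spatial size with the skip tensor, concatenate, then convolve."""
    x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
    # InceptionResNetV2 uses 'valid' padding in its stem, so feature maps can be a few
    # pixels off; resize to the skip tensor's spatial size before concatenating.
    x = layers.Resizing(skip.shape[1], skip.shape[2])(x)
    x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(512, 512, 3), num_classes=6):
    encoder = InceptionResNetV2(include_top=False, weights="imagenet",
                                input_shape=input_shape)
    # Placeholder skip layers -- pick four intermediate activations at increasing
    # depth by inspecting encoder.summary(); the names below are only illustrative.
    skip_names = ["activation", "activation_3", "activation_74", "activation_161"]
    skips = [encoder.get_layer(name).output for name in skip_names]

    x = encoder.output
    for skip, filters in zip(reversed(skips), [256, 128, 64, 32]):
        x = decoder_block(x, skip, filters)

    x = layers.Resizing(input_shape[0], input_shape[1])(x)  # back to full resolution
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(encoder.input, outputs)
```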
Hyper-Parameters
- Batch Size = 16
- Steps per Epoch = 32
- Validation Steps = 4
- Input Shape = (512, 512, 3)
- Initial Learning Rate = 0.0001 (with Exponential Decay LearningRateScheduler callback)
- Number of Epochs = 45 (with ModelCheckpoint & EarlyStopping callback)
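A sketch of how these hyper-parameters and callbacks might be wired together in Keras follows; the decay factor, checkpoint path, patience, and the `model`/generator objects are assumptions rather than the project's exact training script.

```python
import math
import tensorflow as tf

INITIAL_LR = 1e-4

def exponential_decay(epoch, lr):
    """Exponentially decay the learning rate each epoch (decay factor is an assumption)."""
    return INITIAL_LR * math.exp(-0.1 * epoch)

callbacks = [
    tf.keras.callbacks.LearningRateScheduler(exponential_decay, verbose=1),
    tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                       save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
]

# `model`, `train_generator` and `val_generator` are assumed to come from the
# earlier model-building and data-augmentation steps.
history = model.fit(train_generator,
                    steps_per_epoch=32,
                    validation_data=val_generator,
                    validation_steps=4,
                    epochs=45,
                    callbacks=callbacks)
```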
Results
Training Results
| Model | Epochs | Train Dice Coefficient | Train Accuracy | Train Loss | Val Dice Coefficient | Val Accuracy | Val Loss |
|---|---|---|---|---|---|---|---|
| InceptionResNetV2-UNet | 45 (best at 34th epoch) | 0.8525 | 0.9152 | 0.2561 | 0.8112 | 0.8573 | 0.4268 |
The model_training.csv file contains epoch-wise training details of the model.
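For reference, the dice coefficient reported above can be computed as a soft Dice over one-hot masks and softmax predictions; this is a generic sketch (the smoothing term is an assumption), not necessarily the repository's exact metric code.

```python
from tensorflow.keras import backend as K

def dice_coefficient(y_true, y_pred, smooth=1e-6):
    """Soft Dice coefficient over one-hot masks and softmax outputs."""
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    """Loss form of the coefficient, usable as a Keras training objective."""
    return 1.0 - dice_coefficient(y_true, y_pred)
```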
Visual Results
Predictions on Validation Set Images:
All predictions on the validation set are available in the predictions directory.
Activations (Outputs) Visualization
Activations/Outputs of some layers of the model:
Some more activation maps are available in the activations directory. The code for visualizing activations is in the get_activations.py file.
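As an illustration of how such activation maps can be produced (not the contents of get_activations.py), one can wrap the trained model so it exposes an intermediate layer's output and plot a few of its feature maps; `model`, the layer name, and `image` are assumed to come from the earlier steps.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

def plot_activations(model, layer_name, image, n_maps=8):
    """Plot the first few feature maps of one intermediate layer for a single image."""
    probe = tf.keras.Model(inputs=model.input,
                           outputs=model.get_layer(layer_name).output)
    feature_maps = probe.predict(image[np.newaxis, ...])[0]  # shape (H, W, C)
    for i in range(min(n_maps, feature_maps.shape[-1])):
        plt.subplot(2, 4, i + 1)
        plt.imshow(feature_maps[..., i], cmap="viridis")
        plt.axis("off")
    plt.tight_layout()
    plt.show()
```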
References
- Dataset- https://humansintheloop.org/resources/datasets/semantic-segmentation-dataset/
- C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” arXiv.org, 23-Aug-2016. [Online]. Available: https://arxiv.org/abs/1602.07261.
- O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” arXiv.org, 18-May-2015. [Online]. Available: https://arxiv.org/abs/1505.04597.