Mosaic Transform
Part of #6323
@abhi-glitchhg Just checking in with you to see if you got stuck anywhere. :) Let me know if you face any issues.
Hey @datumbox, thanks for checking on me! 🤗 I was a bit busy for a while.
I have gone through the Mosaic implementation and understand it.
I have a basic implementation locally. Hopefully, by this weekend, I will clean it up and update this PR. Thanks, Abhijit :)
Still WIP
First of all, I apologize for the inactivity on this PR. I'll be more regular from now on.
I used the Penn-Fudan Pedestrian dataset to check the implementation. Download the dataset
I tested the implementation with the following code. To create an image tensor of shape B*4*C*H*W, I used a for loop; there might be a more efficient way to do this.
import os

import numpy as np
import torch
from PIL import Image
from torchvision import utils
from torchvision.prototype import datapoints, transforms
from torchvision.prototype.transforms import functional as F

from references.detection.transforms import Mosaic


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        img = F.pil_to_tensor(img)
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]
        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]
        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)
        # image_id = torch.tensor([idx])
        # area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        # iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
        img = datapoints.Image(img)
        boxes = datapoints.BoundingBox(
            boxes, format=datapoints.BoundingBoxFormat.XYXY, spatial_size=F.get_spatial_size(img)
        )
        labels = datapoints.Label(labels)
        if self.transforms is not None:
            img, boxes, labels = self.transforms(img, boxes, labels)
        return img, boxes, labels

    def __len__(self):
        return len(self.imgs)


def collate_fn(batch):
    return tuple(zip(*batch))


# change the root parameter according to your directory structure
dataset = PennFudanDataset(root="./../PennFudanPed", transforms=transforms.Resize((350, 324)))
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=True, num_workers=1, collate_fn=collate_fn
)

B = 16  # batch size: number of 4-image groups passed to Mosaic
batched_images = []
batched_boxes = []
batched_labels = []
counter = 0
for batch in data_loader:
    image, boxes, labels = batch
    image = torch.stack(image)  # shape: 4*C*H*W
    boxes = list(boxes)
    labels = [*labels[0], *labels[1], *labels[2], *labels[3]]
    batched_images.append(image)
    batched_boxes.append(boxes)
    batched_labels.append(labels)
    counter += 1
    if counter >= B:
        break

batched_images = torch.stack(batched_images)  # shape: B*4*C*H*W
mosaic = Mosaic()
output = mosaic(batched_images, batched_boxes, batched_labels)
for i in range(B):
    # draw the mosaic's boxes on the i-th output image for visual inspection
    viz = utils.draw_bounding_boxes(F.to_image_tensor(output[0][i]), boxes=output[1][i])
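Regarding the for loop used to build the B*4*C*H*W tensor, here is a minimal, simplified sketch (not the implementation in this PR) of two steps. The first step groups a flat batch of `4 * B` equally sized images into that layout with a single `reshape`. This assumes the DataLoader is created with `batch_size=4 * B` and its image tuple is passed through `torch.stack`, and that all images share one size, as `transforms.Resize((350, 324))` ensures. The second step tiles one group of 4 images into a 2x2 canvas and shifts each image's XYXY boxes by its quadrant offset. The helper names `group_into_mosaic_sets` and `simple_mosaic` are hypothetical. Mosaic augmentation as used in YOLO-style detectors usually samples a random center point, crops the canvas, and clips or drops boxes. This sketch uses a fixed center and no cropping, so the actual `Mosaic` in `references/detection/transforms` may behave differently.

```python
from typing import List, Tuple

import torch


def group_into_mosaic_sets(images: torch.Tensor, group_size: int = 4) -> torch.Tensor:
    """Reshape a flat (N, C, H, W) batch into (N // group_size, group_size, C, H, W).

    Hypothetical helper: assumes every image has the same H x W (e.g. after Resize).
    """
    n, c, h, w = images.shape
    if n % group_size != 0:
        raise ValueError(f"batch of {n} images is not divisible by {group_size}")
    # One reshape replaces the Python loop that stacked groups one by one.
    return images.reshape(n // group_size, group_size, c, h, w)


def simple_mosaic(
    images: torch.Tensor, boxes: List[torch.Tensor]
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Tile a (4, C, H, W) group into a single (C, 2H, 2W) canvas.

    Hypothetical, simplified mosaic: fixed center, no random crop, no box clipping.
    boxes[k] is a (num_boxes_k, 4) XYXY tensor for image k.
    """
    num, c, h, w = images.shape
    if num != 4:
        raise ValueError(f"expected 4 images, got {num}")
    canvas = images.new_zeros((c, 2 * h, 2 * w))  # keeps dtype and device of the input
    shifted = []
    for k in range(4):
        # image 0 -> top-left, 1 -> top-right, 2 -> bottom-left, 3 -> bottom-right
        row, col = divmod(k, 2)
        canvas[:, row * h:(row + 1) * h, col * w:(col + 1) * w] = images[k]
        # XYXY boxes move by (x offset, y offset) of the quadrant
        offset = torch.tensor(
            [col * w, row * h, col * w, row * h], dtype=boxes[k].dtype, device=boxes[k].device
        )
        shifted.append(boxes[k] + offset)
    return canvas, torch.cat(shifted, dim=0)


if __name__ == "__main__":
    B, C, H, W = 2, 3, 350, 324
    flat = torch.rand(4 * B, C, H, W)  # stand-in for torch.stack(images) from a DataLoader
    groups = group_into_mosaic_sets(flat)  # shape (B, 4, C, H, W)
    boxes = [torch.tensor([[10.0, 20.0, 50.0, 80.0]]) for _ in range(4)]
    canvas, merged = simple_mosaic(groups[0], boxes)
    print(groups.shape, canvas.shape, merged.shape)
```

Because `reshape` returns a view when the stacked tensor is contiguous, the grouping step costs no extra copy. The 2x2 tiling only illustrates how box coordinates must follow their image into the canvas.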
Aah, we need to review this. Well, I will try my best to find time to review this 😄 and understand how it works :)
Yeah, sure! lmk if something is not clear.
Gentle ping for any updates.