
Mosaic Transform

Open • abhi-glitchhg opened this issue 2 years ago • 7 comments

Part of #6323

abhi-glitchhg • Sep 02 '22 20:09

@abhi-glitchhg Just checking in to see if you got stuck anywhere. :) Let me know if you face any issues.

datumbox • Sep 14 '22 14:09

Hey @datumbox, thanks for checking in! 🤗 I was a bit busy for some time.

I have gone through the mosaic implementation and understand it now.

I have a basic implementation locally. Hopefully, by this weekend, I will clean up and update this PR. Thanks, Abhijit :)

abhi-glitchhg • Sep 15 '22 17:09

Still WIP

abhi-glitchhg • Sep 18 '22 19:09

First of all, I apologize for the inactivity on this PR. I'll be more regular from now on.

I have used the Penn-Fudan Pedestrian dataset to check the implementation (download the dataset first) and tested it with the code below. To create the image tensor of shape B*4*C*H*W I have used a for loop; there might be a more efficient way to do this (a loop-free alternative is sketched after the script).

import os

import numpy as np
import torch
from PIL import Image

from torchvision import utils
from torchvision.prototype import transforms, datapoints
from torchvision.prototype.transforms import functional as F

from references.detection.transforms import Mosaic


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        img = F.pil_to_tensor(img)
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        #image_id = torch.tensor([idx])
        #area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        #iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        img = datapoints.Image(img)
        boxes = datapoints.BoundingBox(boxes, format=datapoints.BoundingBoxFormat.XYXY, spatial_size=F.get_spatial_size(img))
        labels = datapoints.Label(labels)
        if self.transforms is not None:
            img, boxes, labels = self.transforms(img, boxes, labels)

        return img, boxes, labels

    def __len__(self):
        return len(self.imgs)


def collate_fn(batch):
    return tuple(zip(*batch))

dataset = PennFudanDataset(root="./../PennFudanPed", transforms=transforms.Resize((350, 324)))  # change the root parameter according to your dir structure

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=4, shuffle=True, num_workers=1,
    collate_fn=collate_fn)

B = 16  # number of groups of four to collect (mosaic batch size)
counter = 0

batched_images = []
batched_boxes = []
batched_labels = []
for images, boxes, labels in data_loader:
    batched_images.append(torch.stack(images))  # 4 x C x H x W
    batched_boxes.append(list(boxes))
    # flatten the four per-image label tensors of this group into one list
    batched_labels.append([label for group in labels for label in group])
    counter += 1
    if counter >= B:  # collect exactly B groups
        break

batched_images = torch.stack(batched_images)  # B x 4 x C x H x W
mosaic = Mosaic()
output = mosaic(batched_images, batched_boxes, batched_labels)

for i in range(B):
    viz = utils.draw_bounding_boxes(F.to_image_tensor(output[0][i]), boxes=output[1][i])
    F.to_pil_image(viz).show()
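
On the loop above: here is one loop-free way to build the B*4*C*H*W tensor (a sketch, assuming all images share one size, which the Resize((350, 324)) above guarantees; `dataset` and `collate_fn` are the ones defined in the script):

import torch
from torch.utils.data import DataLoader

B = 16
# Ask the loader for 4*B samples at once and regroup with a single view().
loader = DataLoader(dataset, batch_size=4 * B, shuffle=True, collate_fn=collate_fn)
images, boxes, labels = next(iter(loader))  # tuples of length 4*B
batched_images = torch.stack(images).view(B, 4, *images[0].shape)  # B x 4 x C x H x W
# boxes/labels vary in length per image, so they stay as nested Python lists
batched_boxes = [list(boxes[4 * i : 4 * i + 4]) for i in range(B)]
batched_labels = [[l for g in labels[4 * i : 4 * i + 4] for l in g] for i in range(B)]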
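
For anyone reading along who hasn't seen the transform before, a rough sketch of the 2x2 mosaic idea itself (a simplified illustration assuming a fixed 2x2 layout with same-sized inputs; the actual Mosaic from references.detection.transforms may differ, e.g. by rescaling or jittering the center point): four images are tiled onto a 2H x 2W canvas, and each image's XYXY boxes are shifted by its tile offset.

import torch

def mosaic_2x2(images, boxes_per_image):
    # Tile four C x H x W images into one C x 2H x 2W image and shift each
    # image's XYXY boxes by the (dx, dy) offset of its quadrant.
    c, h, w = images[0].shape
    canvas = torch.zeros((c, 2 * h, 2 * w), dtype=images[0].dtype)
    offsets = [(0, 0), (0, w), (h, 0), (h, w)]  # (dy, dx) per quadrant
    out_boxes = []
    for img, boxes, (dy, dx) in zip(images, boxes_per_image, offsets):
        canvas[:, dy:dy + h, dx:dx + w] = img
        out_boxes.append(boxes + torch.tensor([dx, dy, dx, dy], dtype=boxes.dtype))
    return canvas, torch.cat(out_boxes)

Applied once per group of four, this is essentially what each 4 x C x H x W slice of the batch above turns into.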

abhi-glitchhg • Jan 26 '23 10:01

Aah, we need to review this. Well, I will try my best to find time to review this :smile: as well as to understand how this works :)

oke-aditya • Feb 11 '23 09:02

> Aah, we need to review this. Well, I will try my best to find time to review this 😄 as well as to understand how this works :)

Yeah, sure! Let me know if something is not clear.

abhi-glitchhg • Feb 14 '23 14:02

Gentle ping for any updates.

byronyi • May 31 '23 01:05