
VectorDataset: return bounding boxes and instance segmentation masks

Open adamjstewart opened this issue 11 months ago • 12 comments

Summary

Currently, VectorDataset.__getitem__ returns only a raster mask designed for semantic segmentation. We should extend VectorDataset to support:

raster output

  • [x] semantic segmentation mask
  • [ ] instance segmentation mask

vector output

  • [ ] object detection bounding boxes
  • [ ] keypoint detection
  • [ ] object polygons?

Rationale

Currently VectorDataset can only be used for semantic segmentation, but there are many other applications that store data in vector shapefiles.

Implementation

We need to decide whether __getitem__ should always return all of these outputs, or whether the dataset should gain a new parameter that selects which outputs are returned.
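To make the second option concrete, here is a minimal sketch of how a task parameter could map to the sample keys that __getitem__ must generate. The parameter name `tasks` and the key names are assumptions for illustration, not an agreed-upon API:

```python
# Hypothetical sketch only: names are not part of any agreed API.
VALID_TASKS = {'semantic', 'instance', 'detection'}


def select_outputs(tasks: set[str]) -> set[str]:
    """Map requested tasks to the sample keys that must be generated."""
    if not tasks <= VALID_TASKS:
        raise ValueError(f'Unknown tasks: {tasks - VALID_TASKS}')
    keys = set()
    if 'semantic' in tasks or 'instance' in tasks:
        keys.add('mask')
    if 'detection' in tasks or 'instance' in tasks:
        keys.update({'bbox_xyxy', 'label'})
    if 'instance' in tasks:
        keys.add('segmentation')
    return keys
```

This would let a purely semantic-segmentation user skip the box and polygon computation entirely.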

Alternatives

Could also make a new dataset class.

Additional information

No response

adamjstewart avatar Jan 08 '25 16:01 adamjstewart

I would like to work on this issue

BeritCheema avatar Feb 20 '25 11:02 BeritCheema

Please do!

adamjstewart avatar Feb 20 '25 11:02 adamjstewart

Where should I look first to start working on this issue?

amrirasyidi avatar Apr 17 '25 10:04 amrirasyidi

VectorDataset.__getitem__

adamjstewart avatar Apr 17 '25 10:04 adamjstewart

I'm also working on this, and I'm thinking of something like the following:

Use this helper function to convert geocoordinates to pixel coordinates:

import affine
import shapely
import shapely.affinity


def convert_poly_coords(
    geom: shapely.geometry.base.BaseGeometry,
    affine_obj: affine.Affine,
    inverse: bool = False,
) -> shapely.geometry.base.BaseGeometry:
    """Convert pixel coordinates to geocoordinates or vice versa,
    based on `affine_obj`.

    Args:
        geom: shapely geometry to convert
        affine_obj: affine.Affine object to use for the conversion
        inverse: if True, convert geocoordinates to pixel coordinates

    Returns:
        the input geometry with transformed coordinates
    """
    if inverse:
        affine_obj = ~affine_obj

    # shapely expects the 2D matrix in [a, b, d, e, xoff, yoff] order
    xformed_shape = shapely.affinity.affine_transform(
        geom,
        [
            affine_obj.a,
            affine_obj.b,
            affine_obj.d,
            affine_obj.e,
            affine_obj.xoff,
            affine_obj.yoff,
        ],
    )
    return xformed_shape
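As a dependency-free sanity check of the same affine math, the helper below inverts the 2D mapping by hand. The transform values (10 m pixels, top-left geocorner at (100, 200), negative y scale) are made-up example numbers, not anything from torchgeo:

```python
def geo_to_pixel(x, y, a=10.0, b=0.0, c=100.0, d=0.0, e=-10.0, f=200.0):
    """Invert the affine mapping x = a*col + b*row + c, y = d*col + e*row + f.

    Defaults mimic a north-up raster with 10 m pixels and top-left
    corner at (100, 200); these numbers are illustrative only.
    """
    det = a * e - b * d
    col = (e * (x - c) - b * (y - f)) / det
    row = (a * (y - f) - d * (x - c)) / det
    return col, row


# The raster's top-left geocorner lands on pixel (0, 0), and one pixel
# down-and-right lands on (1, 1).
assert geo_to_pixel(100.0, 200.0) == (0.0, 0.0)
assert geo_to_pixel(110.0, 190.0) == (1.0, 1.0)
```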

And then modify VectorDataset.__getitem__ with:

        if shapes:
            masks = rasterio.features.rasterize(
                shapes, out_shape=(round(height), round(width)), transform=transform
            )
            # convert shapes to pixel coordinates
            px_shapes = [convert_poly_coords(shapely.geometry.shape(s[0]), transform, inverse=True) 
                         for s in shapes]

            # corner coordinates in (x, y) order; shapely bounds are already
            # (minx, miny, maxx, maxy), and pixel y grows downwards, so
            # (min x, max y) is the bottom-left corner in image space
            boxes = [[[p.bounds[0], p.bounds[3]],
                      [p.bounds[2], p.bounds[3]],
                      [p.bounds[2], p.bounds[1]],
                      [p.bounds[0], p.bounds[1]]]
                      for p in px_shapes]

            # xmin, ymin, xmax, ymax format
            boxes_xyxy = [[p.bounds[0], p.bounds[1], p.bounds[2], p.bounds[3]]
                           for p in px_shapes]

            # xmin, ymin, width, height format
            boxes_xywh = [[p.bounds[0], p.bounds[1], p.bounds[2]-p.bounds[0], p.bounds[3]-p.bounds[1]]
                           for p in px_shapes]

            # Segmentation polygon is in COCO format, so [x0, y0, x1, y1, ...]
            segmentations = [list(sum(p.exterior.coords[:-1], ()))
                             for p in px_shapes]

            # Get labels
            labels = [s[1] for s in shapes]

        else:
            # If no features are found in this query, return an empty mask
            # with the default fill value and dtype used by rasterize
            masks = np.zeros((round(height), round(width)), dtype=np.uint8)
            boxes = []  
            boxes_xyxy = [] 
            boxes_xywh = []  
            segmentations = []  
            labels = [] 

        # Use array_to_tensor since rasterize may return uint16/uint32 arrays.
        masks = array_to_tensor(masks)
        boxes = array_to_tensor(np.array(boxes))
        boxes_xyxy = array_to_tensor(np.array(boxes_xyxy))
        boxes_xywh = array_to_tensor(np.array(boxes_xywh))
        # Polygons have varying vertex counts, so a single np.array would be
        # ragged; convert each polygon separately instead
        segmentations = [array_to_tensor(np.array(s)) for s in segmentations]
        labels = array_to_tensor(np.array(labels))

        masks = masks.to(self.dtype)
        boxes = boxes.to(self.dtype)
        boxes_xyxy = boxes_xyxy.to(self.dtype)
        boxes_xywh = boxes_xywh.to(self.dtype)
        segmentations = [s.to(self.dtype) for s in segmentations]
        labels = labels.to(self.dtype)

        sample = {'mask': masks, 
                  'bbox': boxes, 
                  'bbox_xyxy': boxes_xyxy, 
                  'bbox_xywh': boxes_xywh, 
                  'segmentation': segmentations, 
                  'crs': self.crs, 
                  'label': labels, 
                  'bounds': query}

Though I still have some problems with dataloaders and collate_fn in the cases where individual items in the batch have different number of objects, and for some reason ObjectDetectionTask fails with images without any annotations.

Any suggestions on how to move forward from here? The conversion from geocoordinates to pixel coordinates is straightforward, but how to form the sample dict so that downstream tasks don't break is not completely clear to me yet.

mayrajeo avatar May 27 '25 11:05 mayrajeo

This approach makes sense to me. We primarily care about bbox_xyxy since that's what our ObjectDetectionTask currently focuses on.

Though I still have some problems with dataloaders and collate_fn in the cases where individual items in the batch have different number of objects, and for some reason ObjectDetectionTask fails with images without any annotations.

Let's investigate and fix this in a different issue/PR and keep this discussion focused on VectorDataset changes.

adamjstewart avatar Jun 04 '25 08:06 adamjstewart

This approach makes sense to me. We primarily care about bbox_xyxy since that's what our ObjectDetectionTask currently focuses on.

How about the segmentation for instance segmentation tasks, is COCO-format polygon OK?

Though I still have some problems with dataloaders and collate_fn in the cases where individual items in the batch have different number of objects, and for some reason ObjectDetectionTask fails with images without any annotations.

Let's investigate and fix this in a different issue/PR and keep this discussion focused on VectorDataset changes.

Sounds good, the above conversion should work for both boxes and polygons. I haven't yet tested how much slower it is compared to sampling without it.

Also, the above converts the coordinates so that they can be fractional pixels, though that can be easily fixed by rounding if it's not a desired behavior.
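If integer coordinates are desired, rounding outward (floor the mins, ceil the maxes) keeps the box covering its geometry, unlike nearest rounding. A small sketch; the helper name is mine:

```python
import math


def snap_box(xmin, ymin, xmax, ymax):
    """Round a fractional pixel box outward so that the integer box
    still fully covers the original geometry."""
    return (math.floor(xmin), math.floor(ymin),
            math.ceil(xmax), math.ceil(ymax))


# A box with fractional edges grows, never shrinks.
assert snap_box(1.2, 3.7, 5.1, 8.0) == (1, 3, 6, 8)
```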

mayrajeo avatar Jun 04 '25 09:06 mayrajeo

For instance segmentation, our InstanceSegmentationTask trainer currently uses Mask R-CNN, so it should be in a compatible format. See how VHR-10 does it, we want to use the same syntax.

adamjstewart avatar Jun 04 '25 10:06 adamjstewart

Looks like it is in COCO format, so the sample returned by VectorDataset.__getitem__ should be like:

        sample = {
            'mask': masks, # segmentation mask, HxW Tensor for masks
            'bbox_xyxy': boxes_xyxy, # [N, 4] tensor containing bounding boxes
            'segmentation': segmentations, # [N, x] tensor containing COCO-format polygons
            'label': labels, # [N] tensor containing the labels corresponding to each bbox or segmentation
            'crs': self.crs, 
            'bounds': query,
        }

The downside to this, as mentioned, is that using stack_samples as the collate_fn with dataloaders breaks, since there is no guarantee that each item in a batch has the same number of objects to detect.

I'll open a PR for this soon, after I figure out what else breaks when VectorDataset suddenly returns three additional things.

mayrajeo avatar Jun 05 '25 12:06 mayrajeo

You can use torchgeo.datamodules.utils.collate_fn_detection for this instead.
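For intuition, a generic sketch of what a detection-style collate does (this is my own numpy illustration, not torchgeo's actual implementation, which works on torch tensors):

```python
import numpy as np


def detection_collate(batch, stack_keys=('image', 'mask')):
    """Collate sample dicts whose box/label counts differ per item.

    Fixed-shape entries are stacked into one array; variable-length
    entries are kept as plain per-sample lists, so a sample with two
    boxes and a sample with none can share a batch.
    """
    out = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        out[key] = np.stack(values) if key in stack_keys else values
    return out
```

The real collate in torchgeo.datamodules.utils follows the same idea of stacking images while leaving per-image annotations as lists.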

isaaccorley avatar Jun 05 '25 12:06 isaaccorley

You can use torchgeo.datamodules.utils.collate_fn_detection for this instead.

That works, yes. However, right now the boxes etc. are generated even if the user only wants to do semantic segmentation.

mayrajeo avatar Jun 05 '25 13:06 mayrajeo

This is because for instance segmentation you still need to pass boxes to the model. See the MaskRCNN docs here.
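For reference, torchvision's Mask R-CNN expects per-image targets with 'boxes' ([N, 4] in xyxy), 'labels' ([N]) and 'masks' ([N, H, W] binary). One way to get the [N, H, W] stack from the rasterize-based approach above would be to rasterize each polygon with a distinct integer id and then split; the helper below is my own sketch in numpy, not existing torchgeo code:

```python
import numpy as np


def instance_masks(id_mask):
    """Split an instance-id mask (0 = background, 1..N = object ids)
    into the [N, H, W] binary stack Mask R-CNN wants.

    Assumes each polygon was rasterized with a distinct id, e.g. by
    passing (geom, i + 1) pairs to rasterio.features.rasterize.
    """
    ids = np.unique(id_mask)
    ids = ids[ids != 0]
    return np.stack([(id_mask == i).astype(np.uint8) for i in ids])
```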

isaaccorley avatar Jun 05 '25 13:06 isaaccorley