VectorDataset: return bounding boxes and instance segmentation masks
Summary
Currently, VectorDataset.__getitem__ returns only a raster mask designed for semantic segmentation. We should extend VectorDataset to support:
raster output
- [x] semantic segmentation mask
- [ ] instance segmentation mask
vector output
- [ ] object detection bounding boxes
- [ ] keypoint detection
- [ ] object polygons?
Rationale
Currently, VectorDataset can only be used for semantic segmentation, but many other applications store their labels in vector shapefiles.
Implementation
We need to decide whether the dataset should always return all of these outputs, or gain a new parameter that controls which ones are returned.
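For illustration, the parameter-based option could look roughly like this; the tasks parameter, its values, and the placeholder bodies are all hypothetical, not an existing torchgeo API:

from typing import Any, Sequence

class VectorDataset:
    """Sketch of the parameter-based option.

    The ``tasks`` parameter, its values, and the placeholder bodies
    below are hypothetical; none of this exists in torchgeo today.
    """

    def __init__(self, tasks: Sequence[str] = ("mask",)) -> None:
        self.tasks = set(tasks)

    def __getitem__(self, query: Any) -> dict[str, Any]:
        sample: dict[str, Any] = {"bounds": query}
        if "mask" in self.tasks:
            sample["mask"] = ...  # rasterized semantic segmentation mask
        if "bbox_xyxy" in self.tasks:
            sample["bbox_xyxy"] = ...  # [N, 4] detection boxes
        if "segmentation" in self.tasks:
            sample["segmentation"] = ...  # COCO-format instance polygons
        return sample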
Alternatives
Could also make a new dataset class.
I would like to work on this issue
Please do!
Where should I look first to start working on this issue?
VectorDataset.__getitem__
I'm working on this too, and I'm thinking of something like the following:
Use this helper function to convert geocoordinates to pixel coordinates:
import affine
import shapely.affinity
from shapely.geometry.base import BaseGeometry


def convert_poly_coords(
    geom: BaseGeometry, affine_obj: affine.Affine, inverse: bool = False
) -> BaseGeometry:
    """Convert pixel coordinates to geocoordinates and vice versa,
    based on `affine_obj`.

    Args:
        geom: shapely geometry to convert
        affine_obj: affine.Affine object to use for the conversion
        inverse: if True, convert geocoordinates to pixel coordinates

    Returns:
        the input geometry transformed by `affine_obj` (or its inverse)
    """
    if inverse:
        affine_obj = ~affine_obj
    return shapely.affinity.affine_transform(
        geom,
        [
            affine_obj.a,
            affine_obj.b,
            affine_obj.d,
            affine_obj.e,
            affine_obj.xoff,
            affine_obj.yoff,
        ],
    )
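For example, a quick check of the helper with a made-up 10 m transform (all values here are arbitrary):

import affine
from shapely.geometry import box

# Made-up transform: 10 m pixels, top-left corner at (500000, 5000000)
transform = affine.Affine(10.0, 0.0, 500000.0, 0.0, -10.0, 5000000.0)

# A 100 m x 100 m polygon in geocoordinates
geom = box(500100.0, 4999800.0, 500200.0, 4999900.0)

# inverse=True converts geocoordinates to pixel coordinates
px_geom = convert_poly_coords(geom, transform, inverse=True)
print(px_geom.bounds)  # (10.0, 10.0, 20.0, 20.0)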
And then modify VectorDataset.__getitem__ with:
if shapes:
    masks = rasterio.features.rasterize(
        shapes, out_shape=(round(height), round(width)), transform=transform
    )
    # convert shapes from geocoordinates to pixel coordinates
    px_shapes = [
        convert_poly_coords(shapely.geometry.shape(s[0]), transform, inverse=True)
        for s in shapes
    ]
    # four corners of each axis-aligned bounding box in (x, y) order;
    # shapely's bounds is (xmin, ymin, xmax, ymax)
    boxes = [
        [
            [p.bounds[0], p.bounds[3]],
            [p.bounds[2], p.bounds[3]],
            [p.bounds[2], p.bounds[1]],
            [p.bounds[0], p.bounds[1]],
        ]
        for p in px_shapes
    ]
    # xmin, ymin, xmax, ymax format
    boxes_xyxy = [list(p.bounds) for p in px_shapes]
    # xmin, ymin, width, height format
    boxes_xywh = [
        [p.bounds[0], p.bounds[1], p.bounds[2] - p.bounds[0], p.bounds[3] - p.bounds[1]]
        for p in px_shapes
    ]
    # segmentation polygons in COCO format, i.e. [x0, y0, x1, y1, ...];
    # drop the closing vertex, which duplicates the first one
    segmentations = [list(sum(p.exterior.coords[:-1], ())) for p in px_shapes]
    # the label associated with each shape
    labels = [s[1] for s in shapes]
else:
    # If no features are found in this query, return an empty mask
    # with the default fill value and dtype used by rasterize
    masks = np.zeros((round(height), round(width)), dtype=np.uint8)
    boxes = []
    boxes_xyxy = []
    boxes_xywh = []
    segmentations = []
    labels = []

# Use array_to_tensor since rasterize may return uint16/uint32 arrays.
masks = array_to_tensor(masks)
boxes = array_to_tensor(np.array(boxes))
boxes_xyxy = array_to_tensor(np.array(boxes_xyxy))
boxes_xywh = array_to_tensor(np.array(boxes_xywh))
# note: np.array(segmentations) only works if all polygons happen to
# have the same number of vertices; ragged polygons need another container
segmentations = array_to_tensor(np.array(segmentations))
labels = array_to_tensor(np.array(labels))

masks = masks.to(self.dtype)
boxes = boxes.to(self.dtype)
boxes_xyxy = boxes_xyxy.to(self.dtype)
boxes_xywh = boxes_xywh.to(self.dtype)
segmentations = segmentations.to(self.dtype)
labels = labels.to(self.dtype)

sample = {
    'mask': masks,
    'bbox': boxes,
    'bbox_xyxy': boxes_xyxy,
    'bbox_xywh': boxes_xywh,
    'segmentation': segmentations,
    'crs': self.crs,
    'label': labels,
    'bounds': query,
}
Though I still have some problems with dataloaders and collate_fn in cases where individual items in the batch have different numbers of objects, and for some reason ObjectDetectionTask fails on images without any annotations.
Any suggestions on how to move forward from here? The conversion from geocoordinates to pixel coordinates is straightforward, but it's not yet clear how to form the sample dict so that downstream tasks don't break.
This approach makes sense to me. We primarily care about bbox_xyxy since that's what our ObjectDetectionTask currently focuses on.
> Though I still have some problems with dataloaders and collate_fn in cases where individual items in the batch have different numbers of objects, and for some reason ObjectDetectionTask fails on images without any annotations.
Let's investigate and fix this in a different issue/PR and keep this discussion focused on VectorDataset changes.
> This approach makes sense to me. We primarily care about bbox_xyxy since that's what our ObjectDetectionTask currently focuses on.
What about the segmentations for instance segmentation tasks: is a COCO-format polygon OK?
> Though I still have some problems with dataloaders and collate_fn in cases where individual items in the batch have different numbers of objects, and for some reason ObjectDetectionTask fails on images without any annotations.

> Let's investigate and fix this in a different issue/PR and keep this discussion focused on VectorDataset changes.
Sounds good, the above conversion should work for both boxes and polygons. I haven't yet tested how much slower it is compared to sampling without it.
Also, the above converts coordinates so that they can be fractional pixels, though that is easily fixed by rounding if it's not the desired behavior.
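For example, one way to snap a converted geometry to whole pixels (a small sketch with arbitrary values, using shapely.ops.transform):

import numpy as np
import shapely.ops
from shapely.geometry import box

# Snap a fractional-pixel geometry to whole pixels
px_geom = box(10.3, 10.7, 19.6, 20.2)
px_geom_int = shapely.ops.transform(
    lambda x, y: (np.round(x), np.round(y)), px_geom
)
print(px_geom_int.bounds)  # (10.0, 11.0, 20.0, 20.0)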
For instance segmentation, our InstanceSegmentationTask trainer currently uses Mask R-CNN, so it should be in a compatible format. See how VHR-10 does it; we want to use the same syntax.
Looks like it is in COCO format, so the sample returned by VectorDataset.__getitem__ should look like:
sample = {
    'mask': masks,                  # [H, W] semantic segmentation mask
    'bbox_xyxy': boxes_xyxy,        # [N, 4] tensor of bounding boxes
    'segmentation': segmentations,  # [N, x] tensor of COCO-format polygons
    'label': labels,                # [N] tensor with the label for each bbox or segmentation
    'crs': self.crs,
    'bounds': query,
}
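For reference, here is what a simple square looks like as a COCO-style flat polygon, matching the conversion above:

from shapely.geometry import box

# A square from (10, 10) to (20, 20) as a COCO-style flat polygon;
# the closing vertex (a duplicate of the first) is dropped
poly = box(10.0, 10.0, 20.0, 20.0)
coco_poly = list(sum(poly.exterior.coords[:-1], ()))
print(coco_poly)  # [20.0, 10.0, 20.0, 20.0, 10.0, 20.0, 10.0, 10.0]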
The downside, as mentioned, is that using stack_samples as the collate_fn with dataloaders breaks, since there is no guarantee that each item in a batch has the same number of objects to detect.
I'll open a PR for this soon, after I figure out what else breaks when VectorDataset suddenly returns three additional things.
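For context, the usual workaround is a collate function that keeps variable-length fields as lists and only stacks tensors with fixed shapes. A minimal sketch (not the actual torchgeo implementation, and the key set here is an assumption):

import torch

# Sketch of a detection-style collate_fn: variable-length fields stay
# as lists; fixed-shape tensors are stacked into a batch dimension.
VARIABLE_KEYS = {"bbox_xyxy", "segmentation", "label"}  # assumed key names

def collate_detection(batch: list[dict]) -> dict:
    out = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        if key in VARIABLE_KEYS or not isinstance(values[0], torch.Tensor):
            # e.g. [N_i, 4] box tensors with varying N_i, or crs/bounds metadata
            out[key] = values
        else:
            # e.g. masks, which all share the same H x W shape
            out[key] = torch.stack(values)
    return out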
You can use torchgeo.datamodules.utils.collate_fn_detection for this instead.
> You can use torchgeo.datamodules.utils.collate_fn_detection for this instead.
That works, yes. However, right now the boxes etc. are generated even when the user only wants to do semantic segmentation.
This is because for instance segmentation you still need to pass boxes to the model. See the MaskRCNN docs here.
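For concreteness, torchvision's Mask R-CNN expects per-image target dicts containing boxes, labels, and binary masks during training; a minimal self-contained sketch with a recent torchvision (shapes and values are arbitrary):

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# weights=None / weights_backbone=None avoid downloading pretrained weights
model = maskrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=2)
model.train()

images = [torch.rand(3, 256, 256)]
targets = [
    {
        "boxes": torch.tensor([[10.0, 10.0, 20.0, 20.0]]),    # [N, 4] xyxy, required
        "labels": torch.tensor([1], dtype=torch.int64),        # [N]
        "masks": torch.zeros(1, 256, 256, dtype=torch.uint8),  # [N, H, W] binary masks
    }
]
targets[0]["masks"][0, 10:20, 10:20] = 1  # fill the box region
loss_dict = model(images, targets)  # dict of classification/box/mask losses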