
VectorShapesDataset for loading geometries from vector files

Open weiji14 opened this issue 2 years ago • 0 comments

The current VectorDataset in torchgeo v0.2.0 returns an image mask, but people might want the actual geometries instead (e.g. for object detection tasks that use bounding boxes).

This PR moves the geometry loading logic in VectorDataset (handled by fiona) into a _load_shapes method. A new VectorShapesDataset class is then created (subclassed from this modified VectorDataset), which returns a sample like so:

sample = {
    "shapes": shapes,  # the polygon geometries
    "crs": self.crs,  # Coordinate reference system
    "bbox": query,  # Original bounding box query
}

Note that the geometries returned are raw coordinate sequences like [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)] (in the case of a polygon), so users would need to write their own code to convert them into a bounding box tuple like (minx, miny, maxx, maxy) or (x, y, width, height). I've got some code to do this, but want to know whether this VectorShapesDataset should stay generic or output those bounding boxes directly.
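
For illustration, here is a minimal sketch of what such a conversion could look like. It is not the code mentioned above and not part of torchgeo: the helper names `ring_to_bbox` and `bbox_to_xywh` are hypothetical. It assumes each shape is a flat sequence of (x, y) tuples as in the polygon example, and that in the (x, y, width, height) form, (x, y) is the minimum corner.

```python
from typing import Iterable, Tuple

Coord = Tuple[float, float]
BBox = Tuple[float, float, float, float]


def ring_to_bbox(ring: Iterable[Coord]) -> BBox:
    """Return (minx, miny, maxx, maxy) for a sequence of (x, y) vertices.

    Hypothetical helper; assumes a non-empty, flat list of 2D coordinates.
    """
    xs, ys = zip(*ring)  # split vertices into x values and y values
    return (min(xs), min(ys), max(xs), max(ys))


def bbox_to_xywh(bbox: BBox) -> BBox:
    """Convert (minx, miny, maxx, maxy) to (x, y, width, height).

    Assumes (x, y) refers to the minimum corner; other conventions exist.
    """
    minx, miny, maxx, maxy = bbox
    return (minx, miny, maxx - minx, maxy - miny)


# The unit-square polygon from the example above
ring = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
bbox = ring_to_bbox(ring)
xywh = bbox_to_xywh(bbox)
```

Nested geometries (polygons with holes, multipolygons) would need to be flattened to their exterior coordinates first, which this sketch does not attempt.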

Happy to add more tests and/or change this draft implementation. I've just been working on an object detection project that has the bounding box labels in a shapefile/geopackage, and thought it might be useful to have this in torchgeo :smiley:

May help with the object detection related feature requests at #442 and #454.

weiji14, Mar 11 '22 03:03