torchgeo
torchgeo copied to clipboard
Improvements to RandomBatchGeoSampler
I propose two improvements to RandomBatchGeoSampler
:
- The
size
parameter should be defined in pixels, not in CRS units. The output of a DataLoader using this sampler will be fed directly into a CNN or similar in most cases, which expect a certain defined input size in pixels. To accomplish this, the user has to perform the conversion of CRS to pixels manually like this
tile_size_pix = 256
sampler_size = tile_size_pix * input_ds.res
sampler = RandomBatchGeoSampler(input_ds, size=sampler_size, ...)
which is inconvenient.
- The
length
parameter should have a meaningful default value. Currently, it is described as "length (int) – number of samples per epoch". Since an epoch is defined as processing an entire dataset exactly once, a meaningful default value could be calculated by using the dataset area and the sampler tile area
sampler_len = int(get_bbox_area(input_ds.bounds)/(sampler_size*sampler_size))
sampler = RandomBatchGeoSampler(input_ds, size=sampler_size, batch_size=8, length=sampler_len)
with
# This function would be nice to have in the BoundingBox class
def get_bbox_area(box):
return (box.maxx - box.minx) * (box.maxy - box.miny)
Let me know if this makes sense.
Thanks @tritolol! Both changes make sense to me.
@RitwikGupta has something similar to your first change here https://github.com/microsoft/torchgeo/pull/294.
@adamjstewart is adding a volume attribute to BoundingBox, adding area in the same PR would make sense to me.
With regards to point 1. by @tritolol, the actual output size of the sample mask is not guaranteed to be the actual number of pixels set by tile_size_pix
(see #674). I am however assuming that this is a bug, and that it will be fixed soon