torchgeo
Chesapeake RasterDatasets will not be able to be used "as is" in modeling
Issue
The Chesapeake state-specific family of RasterDatasets consists of land cover masks with values like "11" for water and "22" for impervious structures. If you create an IntersectionDataset with some imagery layer, you cannot use the result directly in modeling, because PyTorch's cross-entropy loss expects mask values to be class indices in [0, num_classes - 1]. You would first need to write a transform to re-map the values to this range.
Another thing I just found is that by default the dataset will be instantiated with both the 2013 and 2018 layers if you pass the root directory (i.e. what you would do if you used download=True). If you use a RandomGeoSampler then sometimes you will get 2013 patches and sometimes you will get 2018 patches. If you've already downloaded it and do ds = ChesapeakeDE(paths="data/ChesapeakeDE/de_lulc_2013_2022-Edition.tif") then you'll just get a single layer.
This seems to come up a lot. Especially in our land cover datasets, you almost always want to be able to select a subset of classes and then map them to ordinal numbers. CDL and NLCD do this, and I want to do the same for many others.
Instead of hard-coding this in every dataset, should we create a standard Kornia transform to do this? I think it should be easy to do.
I think you can simply do something like:
class_val_to_idx = np.array([
    0,
    0,
    ...
    1,  # index 11 (raw value 11) should map to 1
    2,  # index 12 (raw value 12) should map to 2
    ...
])
then mask = class_val_to_idx[mask]
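For reference, a runnable version of the lookup-table idea above might look like the following. It is only a sketch: the 256-entry table assumes uint8 masks, the `raw_to_idx` dictionary is a hypothetical name, and only the 11 -> 1 and 12 -> 2 entries come from the snippet; every other raw value falls back to 0 here.

```python
import numpy as np

# Hypothetical mapping from raw mask values to contiguous training indices.
# Only 11 -> 1 and 12 -> 2 come from the snippet above; all else maps to 0.
raw_to_idx = {11: 1, 12: 2}

# Assumption: masks are uint8, so 256 entries cover every possible raw value.
class_val_to_idx = np.zeros(256, dtype=np.int64)
for raw_value, idx in raw_to_idx.items():
    class_val_to_idx[raw_value] = idx

# Toy mask standing in for a real patch.
mask = np.array([[11, 12], [22, 11]], dtype=np.uint8)

# Integer-array indexing applies the table to every pixel at once.
remapped = class_val_to_idx[mask]
```

Building the table from a dictionary avoids writing out every zero by hand, and the result keeps the shape of the input mask.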
> you almost always want to be able to select a subset of classes
Why would you only want to re-map a subset of classes?
> I think you can simply do something like:
Correct, but we could formalize this in a more user-friendly transform.
> Why would you only want to re-map a subset of classes?
Not remap a subset, select a subset. So instead of training on all 256 CDL classes, you pick the 10 most common classes and only use those.
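To make the "select a subset" idea concrete, here is a rough, framework-agnostic sketch of what such a transform could do. `SelectClasses` and its parameters are hypothetical names, not an existing torchgeo or Kornia API, and sending every unselected class to 0 as background is an assumption made for this example.

```python
import numpy as np


class SelectClasses:
    """Hypothetical transform: keep chosen raw class values, send the rest to 0."""

    def __init__(self, classes, max_value=255):
        # classes: raw values to keep, in the order they should be numbered.
        # Assumption: index 0 is reserved for all classes that are not selected.
        self.lut = np.zeros(max_value + 1, dtype=np.int64)
        for idx, raw_value in enumerate(classes, start=1):
            self.lut[raw_value] = idx

    def __call__(self, mask):
        # Look up every pixel's raw value in the table.
        return self.lut[mask]


# Toy example: keep raw values 1 and 5 out of a larger label set.
select = SelectClasses(classes=[1, 5])
mask = np.array([[1, 5, 36], [176, 5, 1]], dtype=np.uint8)
remapped = select(mask)
```

With this layout the model would be trained with len(classes) + 1 outputs, the extra one being the background index.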