torchgeo
Add NAIPCluster dataset and datamodule
Dataset of NAIP imagery sampled randomly from the NAIP archive, with masks generated on the fly by KMeans clustering of the pixels.


How is this dataset different than our existing NAIP dataset with a new sampler? Basically, I'm wondering if this should be a sampler instead.
- It creates masks by clustering the inputs on the fly (this is also non-trivial if you want the clustering to be a function of a window of pixels vs. a single pixel)
- The NAIP imagery is pre-sampled, so you don't have to download a bunch of NAIP tiles, and you get a reproducible set of patches to work with
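To make the "window of pixels vs. a single pixel" point concrete, here is a minimal NumPy sketch of how per-pixel features could be built before clustering. The function name `pixel_features`, the `(C, H, W)` layout, and the reflect padding at the edges are assumptions for illustration, not what the PR actually does.

```python
import numpy as np


def pixel_features(image: np.ndarray, window: int = 1) -> np.ndarray:
    """Return one feature vector per pixel for an image of shape (C, H, W).

    With window=1 each pixel is described by its own C band values.
    With an odd window > 1 each pixel is described by all band values in the
    window x window neighbourhood around it (edges are reflect-padded).
    """
    if window < 1 or window % 2 != 1:
        raise ValueError("window must be a positive odd integer")
    c, h, w = image.shape
    pad = window // 2
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    # Sliding windows over the two spatial axes: shape (C, H, W, window, window)
    views = np.lib.stride_tricks.sliding_window_view(
        padded, (window, window), axis=(1, 2)
    )
    # Put the pixel axes first, then flatten everything else per pixel:
    # shape (H * W, C * window * window)
    return views.transpose(1, 2, 0, 3, 4).reshape(h * w, c * window * window)


# Hypothetical 4-band patch standing in for a NAIP chip
image = np.arange(4 * 5 * 6, dtype=float).reshape(4, 5, 6)
single = pixel_features(image, window=1)    # one row of 4 values per pixel
windowed = pixel_features(image, window=3)  # one row of 4 * 3 * 3 values per pixel
```

The windowed variant multiplies the feature dimension by `window ** 2`, which is part of why doing this on the fly is more involved than clustering raw pixel values.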
Couldn't that be done as a transform? Then it could be combined with any dataset, not just NAIP.
If we make this a VisionDataset then we're throwing away all geospatial metadata.
(see my edit above)
Hmm, the transform would need to take the cluster model you want to use as input, so that would be a little cumbersome. Roughly you'd have to do:
# sample a bunch of NAIP imagery
# train a cluster model (using the more complicated logic for including windows if necessary)
# create a transform with that cluster model as input
# create another NAIP dataset with that transform
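A rough, self-contained sketch of those four steps, under some loud assumptions: random arrays stand in for sampled NAIP patches, a small NumPy Lloyd's iteration (`fit_kmeans`) stands in for whatever KMeans implementation would really be used, and `ClusterMask` is a hypothetical transform name. The only thing borrowed from torchgeo is the convention that a sample is a dict with `"image"` and `"mask"` keys; clustering here is per pixel, without windows.

```python
import numpy as np


def fit_kmeans(features: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's algorithm on (N, D) features; returns (k, D) cluster centers."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every center: shape (N, k)
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return centers


class ClusterMask:
    """Hypothetical transform: adds a 'mask' of nearest-center labels to a sample."""

    def __init__(self, centers: np.ndarray) -> None:
        self.centers = centers

    def __call__(self, sample: dict) -> dict:
        image = sample["image"]  # assumed shape (C, H, W)
        c, h, w = image.shape
        feats = image.reshape(c, -1).T  # (H * W, C), one row per pixel
        dists = ((feats[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        out = dict(sample)
        out["mask"] = dists.argmin(axis=1).reshape(h, w)
        return out


# 1. sample a bunch of imagery (random stand-ins for 4-band NAIP patches)
rng = np.random.default_rng(0)
patches = rng.random((8, 4, 32, 32))
# 2. train a cluster model on the pooled pixels
pixels = patches.transpose(0, 2, 3, 1).reshape(-1, 4)
centers = fit_kmeans(pixels, k=5)
# 3. create a transform with that cluster model as input
transform = ClusterMask(centers)
# 4. apply the transform to samples from a second dataset
sample = transform({"image": rng.random((4, 32, 32))})
```

The awkward part is visible in steps 1 to 3: the transform cannot be constructed until a separate pass over sampled imagery has produced the cluster model.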
It isn't urgent that we figure this out (or crucial that this be in torchgeo) -- I'll be using this dataset in my own experiments though so I wanted a branch somewhere.