
Add NAIPCluster dataset and datamodule

Open calebrob6 opened this issue 3 years ago • 6 comments

A dataset of NAIP imagery sampled randomly from the NAIP archive, with masks generated on the fly by KMeans clustering of the pixels.

calebrob6 avatar Mar 01 '22 19:03 calebrob6

How is this dataset different than our existing NAIP dataset with a new sampler? Basically, I'm wondering if this should be a sampler instead.

adamjstewart avatar Mar 01 '22 19:03 adamjstewart

  • It creates masks by clustering the inputs on the fly (this is also non-trivial if you want the clustering to be a function of a window of pixels rather than a single pixel)
  • The NAIP imagery is pre-sampled, so you don't have to download a bunch of NAIP tiles, and you get a reproducible set of patches to work with
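
To illustrate the "window of pixels" point, here is a minimal sketch of how per-pixel features could be built from a neighborhood instead of a single pixel. It is not the code from the branch under discussion: the function name `window_features`, the `(C, H, W)` array layout, and the reflect padding at the edges are all assumptions made for the example. The resulting feature matrix could then be passed to any clustering model, such as KMeans.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_features(image: np.ndarray, window: int = 3) -> np.ndarray:
    """Build one feature vector per pixel from its window x window neighborhood.

    Hypothetical helper for illustration only.
    image: array of shape (C, H, W).
    Returns an array of shape (H * W, C * window * window).
    """
    if window % 2 != 1:
        raise ValueError("window must be odd so that it has a center pixel")
    pad = window // 2
    # Reflect-pad the spatial dimensions so edge pixels get a full neighborhood.
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    # views has shape (C, H, W, window, window).
    views = sliding_window_view(padded, (window, window), axis=(1, 2))
    c, h, w = image.shape
    # Put the pixel position first, then flatten channel and window dimensions.
    return views.transpose(1, 2, 0, 3, 4).reshape(h * w, c * window * window)


# A random 4-band patch stands in for real imagery.
patch = np.random.default_rng(0).random((4, 16, 16))
features = window_features(patch, window=3)
print(features.shape)  # (256, 36)
```

With `window=1` this reduces to plain per-pixel clustering features, so the same code path covers both cases mentioned above.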

calebrob6 avatar Mar 01 '22 19:03 calebrob6

Couldn't that be done as a transform? Then it could be combined with any dataset, not just NAIP.

If we make this a VisionDataset then we're throwing away all geospatial metadata.

adamjstewart avatar Mar 01 '22 19:03 adamjstewart

(see my edit above)

Hmm, the transform would need to take the cluster model you want to use as input, so that would be a little cumbersome. Roughly, you'd have to do:

# sample a bunch of NAIP imagery
# train a cluster model (using the more complicated logic for including windows if necessary)
# create a transform with that cluster model as input
# create another NAIP dataset with that transform
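
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from the branch: random 4-band arrays stand in for sampled NAIP patches, a tiny NumPy k-means (`fit_kmeans`) stands in for a library implementation such as scikit-learn's `KMeans`, and the `ClusterMask` class and the `{"image": ...}` sample dictionary are hypothetical names chosen for the example. It clusters single pixels; window features would replace the `pixels` matrix.

```python
import numpy as np


def fit_kmeans(pixels: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means on an (N, C) matrix; returns (k, C) cluster centers."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Squared distance from every pixel to every center, shape (N, k).
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


class ClusterMask:
    """Hypothetical transform: adds a 'mask' of cluster labels to a sample dict."""

    def __init__(self, centers: np.ndarray) -> None:
        self.centers = centers

    def __call__(self, sample: dict) -> dict:
        image = sample["image"]  # assumed layout: (C, H, W)
        c, h, w = image.shape
        pixels = image.reshape(c, -1).T  # (H * W, C)
        dists = ((pixels[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        sample["mask"] = dists.argmin(axis=1).reshape(h, w)
        return sample


# 1. Sample a bunch of imagery (random arrays stand in for NAIP patches).
rng = np.random.default_rng(0)
patches = rng.random((8, 4, 32, 32))  # (N, C, H, W)

# 2. Train a cluster model on the pooled pixels.
k = 5
pixels = patches.transpose(0, 2, 3, 1).reshape(-1, 4)
centers = fit_kmeans(pixels, k)

# 3. Create a transform with that cluster model as input.
transform = ClusterMask(centers)

# 4. Apply the transform to each sample, as a second dataset would.
samples = [transform({"image": p}) for p in patches]
print(samples[0]["mask"].shape)  # (32, 32)
```

The sketch makes the awkward part visible: the transform cannot be constructed until a first pass over the imagery has produced the fitted cluster centers.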

calebrob6 avatar Mar 01 '22 19:03 calebrob6

It isn't urgent that we figure this out (or crucial that this be in torchgeo). I'll be using this dataset in my own experiments, though, so I wanted a branch somewhere.

calebrob6 avatar Mar 01 '22 19:03 calebrob6