torchgeo
torchgeo copied to clipboard
Inconsistencies between handling of bands in datasets
I am currently working on adding plotting methods based on #253. For datasets that contain Sentinel multispectral imagery, I have two questions.
- In the CV4AKenyaCropType, the band names can be chosen, when creating a dataset by passing a tuple of strings. In the SEN12MS dataset, the bands can be chose with integers in a list. In ZueriCrop dataset it is not possible to specify bands. Should there not be a standardized way across all datasets?
- Given that people can choose different bands, how should the plot method be handled. For example, when RGB bands are not chosen? Or another way of handling plotting for multispectral datasets?
- Yes, this should be standardized some way
- We could either have plotting fail, or have it still work as long as there are at least 3 bands. @calebrob6 thoughts?
Great questions!
- Agree with Adam
- I've been raising a ValueError when the RGB bands aren't present (https://github.com/microsoft/torchgeo/blob/main/torchgeo/datasets/seco.py#L260, https://github.com/microsoft/torchgeo/blob/plotting/torchgeo/datasets/benin_cashews.py#L440). I wanted the plot methods to be an easy way for someone to see what the data looks like / gently introduce people who aren't familiar with satellite imagery to the fact that it doesn't come nicely normalized (i.e. I don't want users to get stuck trying to visualize S2 patches). I don't want the plot methods to try to solve every case of visualizing the data.
- If you have a preference for a standardized way of doing this, I can change all the datasets accordingly.
- Given a standardized way of choosing bands, I can implement the plotting methods for those datasets like the examples you gave.
- I like list or tuple of string band names
- Great!
I think both of these can be done at the same time
An alternative could be to plot a false color image from a user defined subset of the bands in the image (basically a user defines indices like [0, 1, 2]) as an argument to plot
. This is how it's done in torchgeo.datasets.IDTReeS
when plotting the hyperspectral image.
Another issue I noticed is that So2Sat
doesn't allow you to select bands but So2SatDataModule
does. The issue with this is that the So2Sat.plot
method assumes that the sample will always have all bands. We should copy the So2SatDataModule
band selection to So2Sat
and update So2Sat.plot
to handle this.
Relating to band selection: as more datasets arise in torchgeo, could band selection become standardized to the STAC specification common_names? This could also become a standard way of doing band selection from STAC items when some datasets point to STAC items.
FYI, I'm currently trying this "band selection interface" out with a dataset implemented torchgeo's way. I think it has some potential.
Related to, and partially addressed by #687.
As I mentioned in #394, I'm starting to think that it doesn't make much sense to allow band selection for curated benchmark datasets, only for uncurated raster datasets. What do you think?
Why not? E.g. experiments that compare how multispectral information compare to only rgb information on benchmark datasets are quite interesting, I think.
Hmm, that's fair, someone might want to do a more in-depth ablation study on which bands actually matter. I guess I'm not opposed if someone wants it, but I wouldn't go changing everything if we don't need it, especially if it doesn't speed up data loading times.
I think I want to rename the So2Sat bands anyway, will submit a PR to do that.