torchgeo Inconsistencies between handling of bands in datasets

I am currently working on adding plotting methods based on #253. For datasets that contain Sentinel multispectral imagery, I have two questions.

In the CV4AKenyaCropType, the band names can be chosen, when creating a dataset by passing a tuple of strings. In the SEN12MS dataset, the bands can be chose with integers in a list. In ZueriCrop dataset it is not possible to specify bands. Should there not be a standardized way across all datasets?
Given that people can choose different bands, how should the plot method be handled. For example, when RGB bands are not chosen? Or another way of handling plotting for multispectral datasets?

Dec 15 '21 18:12 nilsleh

Yes, this should be standardized some way
We could either have plotting fail, or have it still work as long as there are at least 3 bands. @calebrob6 thoughts?

Dec 15 '21 18:12 adamjstewart

Great questions!

Agree with Adam
I've been raising a ValueError when the RGB bands aren't present (https://github.com/microsoft/torchgeo/blob/main/torchgeo/datasets/seco.py#L260, https://github.com/microsoft/torchgeo/blob/plotting/torchgeo/datasets/benin_cashews.py#L440). I wanted the plot methods to be an easy way for someone to see what the data looks like / gently introduce people who aren't familiar with satellite imagery to the fact that it doesn't come nicely normalized (i.e. I don't want users to get stuck trying to visualize S2 patches). I don't want the plot methods to try to solve every case of visualizing the data.

Dec 15 '21 18:12 calebrob6

If you have a preference for a standardized way of doing this, I can change all the datasets accordingly.
Given a standardized way of choosing bands, I can implement the plotting methods for those datasets like the examples you gave.

Dec 15 '21 18:12 nilsleh

I like list or tuple of string band names
Great!

I think both of these can be done at the same time

Dec 15 '21 18:12 calebrob6

An alternative could be to plot a false color image from a user defined subset of the bands in the image (basically a user defines indices like [0, 1, 2]) as an argument to plot. This is how it's done in torchgeo.datasets.IDTReeS when plotting the hyperspectral image.

Dec 15 '21 18:12 isaaccorley

Another issue I noticed is that So2Sat doesn't allow you to select bands but So2SatDataModule does. The issue with this is that the So2Sat.plot method assumes that the sample will always have all bands. We should copy the So2SatDataModule band selection to So2Sat and update So2Sat.plot to handle this.

Dec 31 '21 19:12 adamjstewart

Relating to band selection: as more datasets arise in torchgeo, could band selection become standardized to the STAC specification common_names? This could also become a standard way of doing band selection from STAC items when some datasets point to STAC items.

FYI, I'm currently trying this "band selection interface" out with a dataset implemented torchgeo's way. I think it has some potential.

Feb 23 '22 20:02 remtav

Related to, and partially addressed by #687.

As I mentioned in #394, I'm starting to think that it doesn't make much sense to allow band selection for curated benchmark datasets, only for uncurated raster datasets. What do you think?

Aug 21 '22 20:08 adamjstewart

Why not? E.g. experiments that compare how multispectral information compare to only rgb information on benchmark datasets are quite interesting, I think.

Aug 21 '22 20:08 calebrob6

Hmm, that's fair, someone might want to do a more in-depth ablation study on which bands actually matter. I guess I'm not opposed if someone wants it, but I wouldn't go changing everything if we don't need it, especially if it doesn't speed up data loading times.

I think I want to rename the So2Sat bands anyway, will submit a PR to do that.

Aug 21 '22 23:08 adamjstewart

torchgeo torchgeo copied to clipboard

Inconsistencies between handling of bands in datasets

torchgeo
torchgeo copied to clipboard