torchgeo
Sentinel2 Dataset behavior
Description
I have a Sentinel 2 scene with the following files (e.g. in ./test_scene/):
T36KVU_20210513T073609_B01_60m.tif
T36KVU_20210513T073609_B02_10m.tif
T36KVU_20210513T073609_B03_10m.tif
T36KVU_20210513T073609_B04_10m.tif
T36KVU_20210513T073609_B05_20m.tif
T36KVU_20210513T073609_B06_20m.tif
T36KVU_20210513T073609_B07_20m.tif
T36KVU_20210513T073609_B08_10m.tif
T36KVU_20210513T073609_B09_60m.tif
T36KVU_20210513T073609_B11_20m.tif
T36KVU_20210513T073609_B12_20m.tif
T36KVU_20210513T073609_B8A_20m.tif
T36KVU_20210513T073609_TCI_10m.tif
I would expect any of the following to work:
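from torchgeo.datasets import Sentinel2

# Single 60m band
ds = Sentinel2(
    "test_scene/",
    bands=["B01"],
)
# Bands with different native resolutions (60m and 10m)
ds = Sentinel2(
    "test_scene/",
    bands=["B01", "B02"],
)
# Arbitrary target resolution
ds = Sentinel2(
    "test_scene/",
    bands=["B01", "B02"],
    res=37,
)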
However, the filename_glob and filename_regex are set up in such a way that none of the above is recognized as a valid Sentinel 2 scene.
Steps to reproduce
See above.
Version
0.6.0.dev0
Further:
ds = Sentinel2(
    "test_scene/",
    bands=["B01", "B02"],
    res=60,
)
will not throw an error at construction, but indexing it with ds[ds.bounds] will.
@estherrolf for visibility
This was specifically broken by https://github.com/microsoft/torchgeo/pull/754/files#diff-79277b084e67f13f6469cba19e6eadb93ce6c6479cef26161a0c847b75705a81
Basically, depending on where you download your data from, you either get:
1. All bands in 10m resolution (resampled, no suffix)
2. All bands in native resolution (10m, 20m, or 60m)
3. All bands in all resolutions (10m, 20m, and 60m)
Layouts 1, 2, and 3 are all somewhat contradictory. We could easily support each of these on its own, but supporting all 3 in combination is hard:
A. Remove resolution from the regex (only supports 1)
B. Replace resolution with a wildcard (only supports 1 and 2)
C. Include 10m in the regex (only supports 3)
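To make the trade-off between options A, B, and C concrete, here is a rough sketch that checks each option against the three download layouts. The regexes are hypothetical stand-ins written for this comment, not torchgeo's actual filename_regex, and the sample filenames are made up from the scene above. I'm treating a layout as "supported" when every requested band matches exactly one file.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; NOT torchgeo's actual filename_regex.
PREFIX = r"^T\d{2}[A-Z]{3}_\d{8}T\d{6}_(?P<band>B[018][\dA])"
PATTERNS = {
    "A": re.compile(PREFIX + r"\.tif$"),             # no resolution in the name
    "B": re.compile(PREFIX + r"(?:_\d+m)?\.tif$"),   # any resolution, or none
    "C": re.compile(PREFIX + r"_10m\.tif$"),         # 10m hard-coded
}

STEM = "T36KVU_20210513T073609"
BANDS = ["B01", "B02"]
# Made-up file listings for the three layouts described above.
LAYOUTS = {
    1: [f"{STEM}_{b}.tif" for b in BANDS],
    2: [f"{STEM}_B01_60m.tif", f"{STEM}_B02_10m.tif"],
    3: [f"{STEM}_{b}_{r}m.tif" for b in BANDS for r in (10, 20, 60)],
}


def supports(pattern, filenames):
    """Return True if every band in BANDS matches exactly one file."""
    counts = Counter()
    for name in filenames:
        m = pattern.match(name)
        if m:
            counts[m.group("band")] += 1
    return all(counts[b] == 1 for b in BANDS)


def supported_layouts():
    """Map each option (A, B, C) to the layouts it handles unambiguously."""
    return {
        key: [n for n, files in LAYOUTS.items() if supports(pattern, files)]
        for key, pattern in PATTERNS.items()
    }


print(supported_layouts())  # {'A': [1], 'B': [1, 2], 'C': [3]}
```

Option B does match files in layout 3, but it matches three files per band, which is where the ambiguity comes from.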
In order to prioritize the highest resolution, maybe we could sort the glob results lexicographically and choose the first one only? But that feels really sloppy and could probably break for more complicated hypothetical datasets.
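A slightly less sloppy variant of that idea might be to parse the resolution out of the filename and compare it numerically instead of relying on string order. This is only a sketch under a few assumptions: the regex and the pick_finest helper are hypothetical (nothing like this exists in torchgeo as far as this thread shows), all files belong to a single scene, and a missing suffix means 10m as in layout 1.

```python
import re

# Hypothetical pattern for illustration; the resolution suffix is optional.
PATTERN = re.compile(
    r"^T\d{2}[A-Z]{3}_\d{8}T\d{6}_(?P<band>B[018][\dA])(?:_(?P<res>\d+)m)?\.tif$"
)


def pick_finest(filenames):
    """Return {band: filename}, keeping the finest-resolution file per band.

    Assumes all filenames come from one scene (same tile and date).
    """
    best = {}
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            continue  # skip non-band files such as TCI
        # Assumption: no suffix means the file was resampled to 10m.
        res = int(m.group("res")) if m.group("res") else 10
        band = m.group("band")
        if band not in best or res < best[band][0]:
            best[band] = (res, name)
    return {band: name for band, (_res, name) in best.items()}


files = [
    "T36KVU_20210513T073609_B01_60m.tif",
    "T36KVU_20210513T073609_B02_20m.tif",
    "T36KVU_20210513T073609_B02_10m.tif",
    "T36KVU_20210513T073609_TCI_10m.tif",
]
print(pick_finest(files))
```

Lexicographic order happens to work for 10m/20m/60m, but it would rank a hypothetical "5m" file after "10m", which is one way the sorting approach could break for other datasets. Comparing integers avoids that particular failure, though it still hard-codes a preference for the finest resolution.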
I think this is strange behavior as one of the points of RasterDataset is that it can resample/align different layers to the same resolution.