torchgeo
UnionDataset of two IntersectionDataset fails
Description
I am trying to take the union of two IntersectionDatasets for scene diversity (WRS2 16/30 and WRS2 38/37). Each IntersectionDataset takes the intersection of two datasets, one with cloudy imagery and one with cloud-free imagery. Iterating over a DataLoader built on the union (code under "Steps to reproduce") raises an IndexError.
The error message is:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[13], line 1
----> 1 for inputs,labels,masks in loader:
      2     break

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torch/utils/data/dataloader.py:631, in _BaseDataLoaderIter.__next__(self)
    628 if self._sampler_iter is None:
    629     # TODO(https://github.com/pytorch/pytorch/issues/76750)
    630     self._reset()  # type: ignore[call-arg]
--> 631 data = self._next_data()
    632 self._num_yielded += 1
    633 if self._dataset_kind == _DatasetKind.Iterable and \
    634         self._IterableDataset_len_called is not None and \
    635         self._num_yielded > self._IterableDataset_len_called:

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torch/utils/data/dataloader.py:675, in _SingleProcessDataLoaderIter._next_data(self)
    673 def _next_data(self):
    674     index = self._next_index()  # may raise StopIteration
--> 675     data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    676     if self._pin_memory:
    677         data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:51, in _MapDatasetFetcher.fetch(self, possibly_batched_index)
     49         data = self.dataset.__getitems__(possibly_batched_index)
     50     else:
---> 51         data = [self.dataset[idx] for idx in possibly_batched_index]
     52 else:
     53     data = self.dataset[possibly_batched_index]

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py:51, in <listcomp>(.0)
     49         data = self.dataset.__getitems__(possibly_batched_index)
     50     else:
---> 51         data = [self.dataset[idx] for idx in possibly_batched_index]
     52 else:
     53     data = self.dataset[possibly_batched_index]

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torchgeo/datasets/geo.py:1005, in UnionDataset.__getitem__(self, query)
   1003 for ds in self.datasets:
   1004     if list(ds.index.intersection(tuple(query))):
-> 1005         samples.append(ds[query])
   1007 sample = self.collate_fn(samples)
   1009 if self.transforms is not None:

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torchgeo/datasets/geo.py:881, in IntersectionDataset.__getitem__(self, query)
    876     raise IndexError(
    877         f"query: {query} not found in index with bounds: {self.bounds}"
    878     )
    880 # All datasets are guaranteed to have a valid query
--> 881 samples = [ds[query] for ds in self.datasets]
    883 sample = self.collate_fn(samples)
    885 if self.transforms is not None:

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torchgeo/datasets/geo.py:881, in <listcomp>(.0)
    876     raise IndexError(
    877         f"query: {query} not found in index with bounds: {self.bounds}"
    878     )
    880 # All datasets are guaranteed to have a valid query
--> 881 samples = [ds[query] for ds in self.datasets]
    883 sample = self.collate_fn(samples)
    885 if self.transforms is not None:

File ~/anaconda3/envs/euphrates/lib/python3.10/site-packages/torchgeo/datasets/geo.py:405, in RasterDataset.__getitem__(self, query)
    402 filepaths = cast(List[str], [hit.object for hit in hits])
    404 if not filepaths:
--> 405     raise IndexError(
    406         f"query: {query} not found in index with bounds: {self.bounds}"
    407     )
    409 if self.separate_files:
    410     data_list: List[Tensor] = []

IndexError: query: BoundingBox(minx=-3351139.6764310533, maxx=-3335779.6764310533, miny=4378501.30259927, maxy=4393861.30259927, mint=0.0, maxt=9.223372036854776e+18) not found in index with bounds: BoundingBox(minx=657285.0, maxx=888015.0, miny=3558885.0, maxy=3789615.0, mint=0.0, maxt=9.223372036854776e+18)
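As a quick sanity check on the two boxes in this message, here is a minimal sketch in plain Python. `Box` and `overlaps` are hypothetical helpers written for illustration, not torchgeo APIs, and the time bounds are left out because they are identical in both boxes. The boxes turn out to be disjoint on both axes, which would be consistent with the query and the leaf dataset's index being expressed in different coordinate reference systems, although nothing in this report confirms that.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Plain stand-in for a spatial bounding box (not torchgeo's BoundingBox)."""

    minx: float
    maxx: float
    miny: float
    maxy: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the two boxes share any area or touch at an edge."""
    return (
        a.minx <= b.maxx
        and a.maxx >= b.minx
        and a.miny <= b.maxy
        and a.maxy >= b.miny
    )


# Spatial values copied from the IndexError message above.
query = Box(-3351139.6764310533, -3335779.6764310533, 4378501.30259927, 4393861.30259927)
index_bounds = Box(657285.0, 888015.0, 3558885.0, 3789615.0)

print(overlaps(query, index_bounds))  # False: the boxes do not intersect
# Side lengths of the query box, in the units of its coordinate system.
print(query.maxx - query.minx, query.maxy - query.miny)
```

The query is roughly 15360 units wide, which matches a 512-pixel patch at 30 units per pixel, so the sampler appears to be producing a correctly sized box that simply lies far outside the bounds reported by the raster dataset.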
Steps to reproduce
import os
import torch
import torchgeo
import collections
import cv2
import matplotlib.pyplot as plt
import numpy as np
from torch.utils.data import DataLoader
from torchgeo.datasets import Landsat8, stack_samples,BoundingBox,IntersectionDataset, UnionDataset
from torchgeo.samplers import RandomGeoSampler,RandomBatchGeoSampler
from torchvision.transforms import Compose
from typing import *
PROJ_DIR = "/data/euphrates"
# root_path = f"{PROJ_DIR}/datasets/landsat/L8_16_30_center_trimmed"
root_path = "<home_dir>/datasets/landsat/summer_trimmed/"
desert_path = "<home_dir>/datasets/landsat/desert_summer_trimmed"
shadow_path = os.path.join(root_path, "shadow")
clean_path = os.path.join(root_path, "clean")
desert_shadow_path = os.path.join(desert_path,"shadow")
desert_clean_path = os.path.join(desert_path,"clean")
bands = ["B4","B3","B2","QA_PIXEL"]
Landsat8.filename_regex = r".*_(?P<band>[A-Z0-9_]+)\."
Landsat8.rgb_bands = ["B4","B3","B2"]
import sys
sys.path.append("<home_dir>/Euphrates/src/euphrates_utils/")
from data_utils import pick_images,filter_batchdict
from data_utils import *
from transforms import TransformImageBands,TransformQABand
transform_stack = Compose([
    TransformImageBands(normalization_func=TransformImageBands.peak_val_normalize, num_bands=3, joint_batch=True),
    TransformQABand(num_bands=3, get_clouds=True, get_shadows=True, joint_batch=True),
])
upstate_dataset = IntersectionDataset(
    Landsat8(shadow_path, bands=bands),
    Landsat8(clean_path, bands=bands),
    transforms=transform_stack,
)
arizona_dataset = IntersectionDataset(
    Landsat8(desert_shadow_path, bands=bands),
    Landsat8(desert_clean_path, bands=bands),
    transforms=transform_stack,
)
dataset = UnionDataset(upstate_dataset, arizona_dataset)
sampler = RandomBatchGeoSampler(dataset, size=(512, 512), batch_size=16, length=1600)
loader = DataLoader(dataset, batch_sampler=sampler, collate_fn=filter_batchdict)
for inputs, labels, masks in loader:
    break
Version
0.5.0.dev0
You haven't shared enough code to reproduce the issue:
> python3 test.py
Traceback (most recent call last):
  File "/Users/Adam/torchgeo/test.py", line 37, in <module>
    from data_utils import pick_images,filter_batchdict
ModuleNotFoundError: No module named 'data_utils'
Anything you can do to make an MRE helps me solve the issue.
You can safely ignore those imports, as they don't relate to the problem, and use stack_samples instead. Here is what I found, however: the issue may not even be about UnionDataset. Using a DataLoader on the Arizona dataset alone still fails. For WRS2 38/37, the IntersectionDataset raises the following error: https://pastebin.com/V3QsZWQC
For your reference, here's what the file system for the dataset looks like: https://pastebin.com/D9hQTM0W
Previously, such issues arose from mint and maxt, but now that I have relaxed the regex condition, I am not sure why this is happening. Is that enough detail?
@TolgaAktas are you still encountering this issue? Do you have an MRE I can use to reproduce the issue?