MONAI
MONAI Dataset objects incompatible with `torch.utils.data.BatchSampler`
Describe the bug
The default collate function fails when a `BatchSampler` is passed to the `DataLoader`, because a dataset indexed with a list of indices or a slice returns a `Subset` instance from `__getitem__`. This does not appear to be the protocol PyTorch expects.
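To make the failure mode concrete, here is a torch-free sketch of the interaction as it appears from the traceback. `ToySubset`, `ToyDataset`, and `fetch` are hypothetical stand-ins, not MONAI or PyTorch classes; `fetch` only mimics the way the loader looks up one dataset entry per item produced by the sampler before collating.

```python
class ToySubset:
    """Hypothetical stand-in for torch.utils.data.Subset: a lazy view, not loaded data."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)


class ToyDataset:
    """Hypothetical stand-in for a dataset whose __getitem__ accepts lists of indices."""

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform

    def __getitem__(self, index):
        if isinstance(index, (list, tuple)):
            # A sequence of indices yields a view object instead of loaded items.
            return ToySubset(self, index)
        return self.transform(self.data[index])


def fetch(dataset, sampler_items):
    # Assumed model of the loader: one dataset lookup per item from the sampler.
    return [dataset[item] for item in sampler_items]


ds = ToyDataset(["a", "b", "c"], transform=ord)

# An ordinary sampler yields ints, so the collate step receives loaded items.
plain = fetch(ds, [0, 1])

# A BatchSampler passed as `sampler=` yields lists such as [0, 0], so each "item"
# is itself a list and the collate step receives a view it cannot handle.
nested = fetch(ds, [[0, 0]])
```

In this model `plain` holds transformed values, while `nested` holds a single view object, which matches the one-element list containing a `Subset` reported in the error message below. The transform is never called in the second case, which would also explain why nothing is printed.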
To Reproduce
Run the following:
import torch
from monai.data import Dataset, DataLoader
from monai.transforms import Compose
def load_data(i):
    print(i)
    return torch.full((1,2,2),ord(i))
data=["a","b","c"]
sampler=torch.utils.data.BatchSampler([0,0,1,1,2,2],batch_size=2,drop_last=False)
ds=Dataset(data,Compose([load_data]))
dl=DataLoader(dataset=ds,sampler=sampler)
print(list(dl))
Output:
> E: unsupported type in collate [<torch.utils.data.dataset.Subset object at 0x7fc4fe751f70>].
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/workspace/monai/MONAI_mine/monai/data/utils.py in list_data_collate(batch)
409 else:
--> 410 ret = default_collate(data)
411 if isinstance(ret, MetaObj) and all(isinstance(d, MetaObj) for d in data):
~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
85
---> 86 raise TypeError(default_collate_err_msg_format.format(elem_type))
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'torch.utils.data.dataset.Subset'>
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
/tmp/ipykernel_411606/696157708.py in <module>
10 ds=Dataset(data,Compose([load_data]))
11 dl=DataLoader(dataset=ds,sampler=sampler)
---> 12 print(list(dl))
~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/dataloader.py in __next__(self)
519 if self._sampler_iter is None:
520 self._reset()
--> 521 data = self._next_data()
522 self._num_yielded += 1
523 if self._dataset_kind == _DatasetKind.Iterable and \
~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/dataloader.py in _next_data(self)
559 def _next_data(self):
560 index = self._next_index() # may raise StopIteration
--> 561 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
562 if self._pin_memory:
563 data = _utils.pin_memory.pin_memory(data)
~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
50 else:
51 data = self.dataset[possibly_batched_index]
---> 52 return self.collate_fn(data)
~/workspace/monai/MONAI_mine/monai/data/utils.py in list_data_collate(batch)
436 )
437 _ = dev_collate(data)
--> 438 raise TypeError(re_str) from re
439
440
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'torch.utils.data.dataset.Subset'>
Expected behavior
`Dataset` should be compatible with `DataLoader` and produce the desired batches. The function used in place of a transform should also print the raw data "a", "b", "c".
Environment
================================
Printing MONAI config...
================================
MONAI version: 0.9.0rc1+19.gfe5fb747.dirty
Numpy version: 1.21.2
Pytorch version: 1.10.2
MONAI flags: HAS_EXT = True, USE_COMPILED = False
MONAI rev id: fe5fb7475ec84cc5dddd1194b82be706fdf1c9a5
MONAI __file__: /home/localek10/workspace/monai/MONAI_mine/monai/__init__.py
Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 3.2.1
scikit-image version: 0.18.3
Pillow version: 8.4.0
Tensorboard version: 2.6.0
gdown version: 4.2.1
TorchVision version: 0.11.3
tqdm version: 4.62.3
lmdb version: 1.2.1
psutil version: 5.8.0
pandas version: 1.3.5
einops version: 0.4.0
transformers version: 4.14.1
mlflow version: 1.23.1
pynrrd version: NOT INSTALLED or UNKNOWN VERSION.
For details about installing the optional dependencies, please visit:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 20.04.3 LTS
Platform: Linux-5.4.0-96-generic-x86_64-with-glibc2.31
Processor: x86_64
Machine: x86_64
Python version: 3.9.7
Process name: python
Command: ['/home/localek10/miniconda3/envs/monai/bin/python', '-m', 'ipykernel_launcher', '-f', '/home/localek10/.local/share/jupyter/runtime/kernel-d0f602db-d30c-4cde-8eff-31b94f28fb4f.json']
Open files: [popenfile(path='/home/localek10/nohup.out', fd=40, position=3147127, mode='a', flags=558081), popenfile(path='/home/localek10/nohup.out', fd=43, position=3147127, mode='a', flags=558081), popenfile(path='/home/localek10/.ipython/profile_default/history.sqlite', fd=47, position=132070400, mode='r+', flags=688130), popenfile(path='/home/localek10/.ipython/profile_default/history.sqlite', fd=49, position=132389888, mode='r+', flags=688130), popenfile(path='/home/localek10/.ipython/profile_default/history.sqlite-journal', fd=64, position=12, mode='r+', flags=688130), popenfile(path='/home/localek10/workspace/BMEISWorkshops/15_CPP/cufile.log', fd=68, position=221, mode='a', flags=33793)]
Num physical CPUs: 6
Num logical CPUs: 12
Num usable CPUs: 12
CPU usage (%): [1.1, 1.0, 0.9, 1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 1.0, 1.5, 0.9]
CPU freq. (MHz): 1495
Load avg. in last 1, 5, 15 mins (%): [0.9, 0.8, 0.8]
Disk usage (%): 71.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.3
Available memory (GB): 23.4
Used memory (GB): 7.2
================================
Printing GPU config...
================================
Num GPUs: 2
Has CUDA: True
CUDA version: 11.3
cuDNN enabled: True
cuDNN version: 8200
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: NVIDIA TITAN X (Pascal)
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 28
GPU 0 Total memory (GB): 11.9
GPU 0 CUDA capability (maj.min): 6.1
GPU 1 Name: NVIDIA GeForce GTX 980
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 16
GPU 1 Total memory (GB): 3.9
GPU 1 CUDA capability (maj.min): 5.2
Additional context
What should be done: change `Dataset` to return lists of results rather than `Subset` instances, or change the collate functions?
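As a rough illustration of the first option, the sketch below shows a dataset whose `__getitem__` loads and returns a list of results when given a sequence of indices or a slice. `ListReturningDataset` is a hypothetical, torch-free class written only for this example; it is not MONAI's `Dataset` and makes no claim about how a real fix would be implemented.

```python
from collections.abc import Sequence


class ListReturningDataset:
    """Hypothetical dataset: a sequence of indices loads and returns a list of items."""

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Expand the slice into explicit indices.
            index = range(*index.indices(len(self)))
        if isinstance(index, Sequence) and not isinstance(index, str):
            # Load every requested item eagerly and return plain results.
            return [self.transform(self.data[i]) for i in index]
        return self.transform(self.data[index])


ds = ListReturningDataset(["a", "b", "c"], transform=ord)

# The index lists a BatchSampler over [0, 0, 1, 1, 2, 2] with batch_size=2 would yield.
batches = [ds[idx] for idx in ([0, 0], [1, 1], [2, 2])]
```

The trade-off is that slicing no longer yields a dataset-like view, so any code that relies on `ds[a:b]` being a dataset would need to change. PyTorch's `DataLoader` also accepts a `batch_sampler=` argument for samplers that yield lists of indices, which may be the intended route for a `BatchSampler` and would leave both `Dataset` and the collate functions untouched.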