
MONAI Dataset objects incompatible with `torch.utils.data.BatchSampler`

Open ericspod opened this issue 3 years ago • 0 comments

Describe the bug

The default collate function fails when a BatchSampler is passed to the DataLoader, because a dataset indexed with a list of indices or a slice returns a Subset instance from __getitem__. This doesn't appear to be the protocol PyTorch expects.
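To make the failure mode concrete without needing MONAI or PyTorch installed, here is a minimal stdlib-only sketch of the behaviour described above. `ToyDataset`, `ToySubset` and `toy_collate` are hypothetical stand-ins written for illustration; they are not the MONAI or PyTorch implementations and only assume that list or slice indexing yields a lazy view rather than loaded items.

```python
# Hypothetical stand-ins for illustration only; NOT MONAI or PyTorch code.

class ToySubset:
    """Lazy view over a dataset, similar in spirit to torch.utils.data.Subset."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        # Items are only loaded (and transformed) when the view is indexed.
        return self.dataset[self.indices[i]]


class ToyDataset:
    """Dataset that applies a transform for int indices and returns a view otherwise."""

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return ToySubset(self, range(*index.indices(len(self))))
        if isinstance(index, (list, tuple)):
            return ToySubset(self, index)
        return self.transform(self.data[index])


def toy_collate(batch):
    """Accept only plain numbers, like a collate function that rejects unknown types."""
    for elem in batch:
        if not isinstance(elem, (int, float)):
            raise TypeError(f"unsupported type in collate: {type(elem).__name__}")
    return list(batch)


ds = ToyDataset(["a", "b", "c"], ord)
single = ds[0]      # an int index returns the transformed item
view = ds[[0, 0]]   # a list index returns a lazy view; nothing is loaded yet

try:
    toy_collate([view])  # the view itself reaches the collate function
except TypeError as err:
    print(err)
```

In this sketch the collate function only ever sees the view object, never the items it refers to, which is the same shape of problem as in the traceback below.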

To Reproduce

Run the following:

import torch
from monai.data import Dataset, DataLoader
from monai.transforms import Compose

def load_data(i):
    print(i)
    return torch.full((1,2,2),ord(i))

data=["a","b","c"]
sampler=torch.utils.data.BatchSampler([0,0,1,1,2,2],batch_size=2,drop_last=False)
ds=Dataset(data,Compose([load_data]))
dl=DataLoader(dataset=ds,sampler=sampler)
print(list(dl))

Output:

> E: unsupported type in collate [<torch.utils.data.dataset.Subset object at 0x7fc4fe751f70>].

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/workspace/monai/MONAI_mine/monai/data/utils.py in list_data_collate(batch)
    409         else:
--> 410             ret = default_collate(data)
    411             if isinstance(ret, MetaObj) and all(isinstance(d, MetaObj) for d in data):

~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     85 
---> 86     raise TypeError(default_collate_err_msg_format.format(elem_type))

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'torch.utils.data.dataset.Subset'>

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_411606/696157708.py in <module>
     10 ds=Dataset(data,Compose([load_data]))
     11 dl=DataLoader(dataset=ds,sampler=sampler)
---> 12 print(list(dl))

~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    559     def _next_data(self):
    560         index = self._next_index()  # may raise StopIteration
--> 561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:
    563             data = _utils.pin_memory.pin_memory(data)

~/miniconda3/envs/monai/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     50         else:
     51             data = self.dataset[possibly_batched_index]
---> 52         return self.collate_fn(data)

~/workspace/monai/MONAI_mine/monai/data/utils.py in list_data_collate(batch)
    436             )
    437         _ = dev_collate(data)
--> 438         raise TypeError(re_str) from re
    439 
    440 

TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'torch.utils.data.dataset.Subset'>

Expected behavior

Dataset should be compatible with DataLoader and produce the desired batches. The function used in place of a transform should also print the raw data "a", "b", "c".
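As a rough illustration of the batches I would expect, here is a stdlib-only sketch in which every index in a batch is resolved to a loaded item before collation. The helper `batched` and the use of plain integers instead of tensors are assumptions made to keep the example self-contained; this is not how DataLoader is implemented.

```python
def batched(indices, batch_size):
    """Yield consecutive index lists, like a batch sampler with drop_last=False."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


seen = []  # records every raw item handed to the loading function


def load_data(item):
    # Stand-in for the transform in the reproduction: record the raw item,
    # then return a value derived from it (an int here rather than a tensor).
    seen.append(item)
    return ord(item)


data = ["a", "b", "c"]

# Resolve each index individually, so the loading function always sees raw items.
batches = [
    [load_data(data[i]) for i in index_batch]
    for index_batch in batched([0, 0, 1, 1, 2, 2], batch_size=2)
]
```

Here each batch holds two loaded items and the loading function is called with the raw strings, once per index.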

Environment

================================
Printing MONAI config...
================================
MONAI version: 0.9.0rc1+19.gfe5fb747.dirty
Numpy version: 1.21.2
Pytorch version: 1.10.2
MONAI flags: HAS_EXT = True, USE_COMPILED = False
MONAI rev id: fe5fb7475ec84cc5dddd1194b82be706fdf1c9a5
MONAI __file__: /home/localek10/workspace/monai/MONAI_mine/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.8
Nibabel version: 3.2.1
scikit-image version: 0.18.3
Pillow version: 8.4.0
Tensorboard version: 2.6.0
gdown version: 4.2.1
TorchVision version: 0.11.3
tqdm version: 4.62.3
lmdb version: 1.2.1
psutil version: 5.8.0
pandas version: 1.3.5
einops version: 0.4.0
transformers version: 4.14.1
mlflow version: 1.23.1
pynrrd version: NOT INSTALLED or UNKNOWN VERSION.

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
System: Linux
Linux version: Ubuntu 20.04.3 LTS
Platform: Linux-5.4.0-96-generic-x86_64-with-glibc2.31
Processor: x86_64
Machine: x86_64
Python version: 3.9.7
Process name: python
Command: ['/home/localek10/miniconda3/envs/monai/bin/python', '-m', 'ipykernel_launcher', '-f', '/home/localek10/.local/share/jupyter/runtime/kernel-d0f602db-d30c-4cde-8eff-31b94f28fb4f.json']
Open files: [popenfile(path='/home/localek10/nohup.out', fd=40, position=3147127, mode='a', flags=558081), popenfile(path='/home/localek10/nohup.out', fd=43, position=3147127, mode='a', flags=558081), popenfile(path='/home/localek10/.ipython/profile_default/history.sqlite', fd=47, position=132070400, mode='r+', flags=688130), popenfile(path='/home/localek10/.ipython/profile_default/history.sqlite', fd=49, position=132389888, mode='r+', flags=688130), popenfile(path='/home/localek10/.ipython/profile_default/history.sqlite-journal', fd=64, position=12, mode='r+', flags=688130), popenfile(path='/home/localek10/workspace/BMEISWorkshops/15_CPP/cufile.log', fd=68, position=221, mode='a', flags=33793)]
Num physical CPUs: 6
Num logical CPUs: 12
Num usable CPUs: 12
CPU usage (%): [1.1, 1.0, 0.9, 1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 1.0, 1.5, 0.9]
CPU freq. (MHz): 1495
Load avg. in last 1, 5, 15 mins (%): [0.9, 0.8, 0.8]
Disk usage (%): 71.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.3
Available memory (GB): 23.4
Used memory (GB): 7.2

================================
Printing GPU config...
================================
Num GPUs: 2
Has CUDA: True
CUDA version: 11.3
cuDNN enabled: True
cuDNN version: 8200
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']
GPU 0 Name: NVIDIA TITAN X (Pascal)
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 28
GPU 0 Total memory (GB): 11.9
GPU 0 CUDA capability (maj.min): 6.1
GPU 1 Name: NVIDIA GeForce GTX 980
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 16
GPU 1 Total memory (GB): 3.9
GPU 1 CUDA capability (maj.min): 5.2

Additional context

What should be done: change Dataset to return lists of results rather than Subset instances, or change the collate functions?
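The two options can be sketched side by side with stdlib-only stand-ins. All names here (`LazyView`, `ViewDataset`, `ListDataset`, `expanding_collate`) are hypothetical and exist only for this illustration; neither class is MONAI's Dataset, and the real fix may look quite different.

```python
class LazyView:
    """Hypothetical stand-in for a Subset-like lazy view."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.dataset[self.indices[i]]


class ViewDataset:
    """Behaviour as described in this issue: list indices return a lazy view."""

    def __init__(self, data, transform):
        self.data = data
        self.transform = transform

    def __getitem__(self, index):
        if isinstance(index, (list, tuple)):
            return LazyView(self, index)
        return self.transform(self.data[index])


class ListDataset(ViewDataset):
    """Option 1: list indices return a list of loaded, transformed items."""

    def __getitem__(self, index):
        if isinstance(index, (list, tuple)):
            return [self[i] for i in index]
        return super().__getitem__(index)


def expanding_collate(batch, inner=list):
    """Option 2: expand lazy views into their items, then call the inner collate."""
    expanded = []
    for elem in batch:
        if isinstance(elem, LazyView):
            expanded.extend(elem[i] for i in range(len(elem)))
        else:
            expanded.append(elem)
    return inner(expanded)


data = ["a", "b", "c"]
option_1 = ListDataset(data, ord)[[0, 0]]
option_2 = expanding_collate([ViewDataset(data, ord)[[1, 1]]])
```

Option 1 changes what indexing returns, so any code that relies on getting a view from a list index would be affected. Option 2 leaves the dataset alone but makes every collate function responsible for recognising views.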

ericspod, May 27 '22 14:05