
KeyError:(<function SafeDataset.__getitem__ at ...>)

Open minushuang opened this issue 6 years ago • 2 comments

Hi, I tried to create a dataset from a CSV file using SafeDataset, but it failed with KeyError: (<function SafeDataset.__getitem__ at ...>). Details:

Traceback (most recent call last):
  File "test.py", line 77, in <module>
    main()
  File "test.py", line 59, in main
    for batch in test_loader:
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/nonechucks/utils.py", line 49, in __call__
    res = cache[key]
KeyError: (<function SafeDataset.__getitem__ at 0x7f81b186b950>, (0,), frozenset())

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/nonechucks/utils.py", line 51, in __call__
    res = cache[key] = self.func(*args, **kw)
  File "/home/hotel_ai/python3/lib/python3.5/site-packages/nonechucks/dataset.py", line 96, in __getitem__
    raise IndexError
IndexError
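A note on the two stacked tracebacks above: the KeyError is not itself the failure. nonechucks memoizes SafeDataset.__getitem__, and judging from the traceback (utils.py lines 49 and 51), a cache miss is implemented by catching a KeyError and then calling the real function, which is where the IndexError is raised. The pattern is roughly the following sketch (not the actual nonechucks source):

```python
def memoize(func):
    """Cache results keyed by (function, args, kwargs).

    A cache miss surfaces internally as a KeyError before the
    wrapped function is invoked, which is why the KeyError shows
    up first in the traceback above.
    """
    cache = {}

    def wrapper(*args, **kw):
        key = (func, args, frozenset(kw.items()))
        try:
            res = cache[key]                      # raises KeyError on a miss
        except KeyError:
            res = cache[key] = func(*args, **kw)  # the real __getitem__ runs here
        return res

    return wrapper
```

So the error worth investigating is the IndexError from dataset.py, which SafeDataset appears to raise when an index cannot be served.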

And here is my code:

test_set = ImageSet('./test.csv', test_transforms)
test_set = nc.SafeDataset(test_set)

ImageSet code (it opens each image from an HTTP source):

import io
import urllib.request

from PIL import Image
import torch.utils.data as data


class ImageSet(data.Dataset):
    """Reads image paths from a CSV file and fetches each image over HTTP."""

    def __init__(self, data_txt, data_transforms):
        data_list = []
        with open(data_txt, "r") as f:
            # Skip the CSV header row; column 1 holds the image path.
            for line in f.readlines()[1:]:
                tmp = line.strip().split(',')
                data_list.append(tmp[1])
        self.data_list = data_list
        self.transforms = data_transforms

    def __getitem__(self, index):
        url_prefix = 'this is a http-url-prefix such as: http://images.baidu.com/'

        data_path = self.data_list[index]

        file0 = urllib.request.urlopen(url_prefix + data_path)
        image_file0 = io.BytesIO(file0.read())
        data = Image.open(image_file0)
        if data.mode != 'RGB':
            data = data.convert("RGB")

        data = self.transforms(data)

        return data, data_path

    def __len__(self):
        return len(self.data_list)

minushuang avatar Sep 03 '19 12:09 minushuang

Sorry, my fault. I was using the default DataLoader in my code; replacing it with SafeDataLoader solved the problem. But I have another question: the performance does not seem very good in my case, about 215 seconds for 5000 images with ResNet50.
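On throughput: since __getitem__ downloads every image over HTTP, much of those 215 seconds is likely network latency rather than model time. One possible mitigation, sketched below with a hypothetical `fetch_cached` helper using only the standard library, is to cache downloads on disk so that repeated passes over the data read locally instead of re-fetching:

```python
import hashlib
import io
import os
import urllib.request


def fetch_cached(url, cache_dir="./img_cache"):
    """Return the bytes at `url` as a BytesIO, downloading at most once.

    The first call stores the payload under a hash of the URL; later
    calls (e.g. subsequent epochs) read the local copy instead.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, hashlib.sha1(url.encode()).hexdigest())
    if not os.path.exists(path):
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())
    with open(path, "rb") as f:
        return io.BytesIO(f.read())
```

Combining a cache like this with several DataLoader workers also lets the remaining first-epoch downloads overlap.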

minushuang avatar Sep 03 '19 12:09 minushuang

Can you show me how you are initializing your (Safe)DataLoader? It would be helpful to see whether you are using multiple workers, etc.

msamogh avatar Sep 05 '19 11:09 msamogh