PyTorch's IterableDataset

Open jungerm2 opened this issue 4 years ago • 2 comments

Hello, I've been using this (excellent) library for a while, and I just stumbled upon a new feature in PyTorch. It seems that PyTorch now has an IterableDataset class that is meant to solve the exact issues this library was trying to solve.

Is this correct? I feel like nonechucks does more than the class can, but it seems to me that safe data loading and transforms-as-filters can both be done with it (provided one is careful with multi-worker data loading).

jungerm2 avatar Apr 18 '20 16:04 jungerm2

Could you give an example (or link) demonstrating how IterableDataset could be used to handle bad samples?

sammlapp avatar Apr 22 '21 18:04 sammlapp

You could just not return (or rather, not yield) the sample if it fails some check, i.e., in the __iter__ method:

def __iter__(self):
    # Yield only the samples that pass the check; bad ones are skipped.
    for sample in self.samples:
        if self.is_valid(sample):
            yield sample

That's the rough idea at least!
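
For anyone who wants something to run, here is a slightly fuller sketch of the same idea. The class name `FilteredDataset` and the `None` check in `is_valid` are placeholders of mine, not anything from nonechucks or PyTorch. The striding over `get_worker_info()` is one possible way to handle the multi-worker caveat mentioned above, since each DataLoader worker gets its own copy of an iterable dataset. The import is guarded only so the snippet also runs where PyTorch isn't installed.

```python
try:
    from torch.utils.data import IterableDataset, get_worker_info
except ImportError:
    # Fallback so the sketch still runs without PyTorch installed.
    IterableDataset = object

    def get_worker_info():
        return None


class FilteredDataset(IterableDataset):
    """Hypothetical dataset that silently skips samples failing a check."""

    def __init__(self, samples):
        self.samples = samples

    def is_valid(self, sample):
        # Placeholder check (an assumption): treat None as a bad sample.
        return sample is not None

    def __iter__(self):
        info = get_worker_info()
        # In the main process info is None, so iterate over everything.
        # Inside a worker, stride over the samples so that the workers
        # do not all yield the same items.
        start, step = (0, 1) if info is None else (info.id, info.num_workers)
        for sample in self.samples[start::step]:
            if self.is_valid(sample):
                yield sample


dataset = FilteredDataset([1, None, 2, None, 3])
print(list(dataset))
```

The dataset can then be passed to a DataLoader as usual. Since the number of surviving samples isn't known up front, there is no `__len__`, and the last batch may be smaller than the others.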

jungerm2 avatar Apr 30 '21 16:04 jungerm2