nonechucks
Should SafeDataset drop __getitem__ and inherit from IterableDataset?
I took a quick look under the hood of this library because I need to handle None values in my own dataset, and I suspect it is trying to do something impossible.
Looking at https://github.com/msamogh/nonechucks/blob/master/nonechucks/dataset.py#L87-L96, I am under the impression that __getitem__ will return the same value for multiple indices. For example, if the sample at index 2 is None, then dataset[2] == dataset[3].
Surely that doesn't make sense for a well-behaved map-style dataset?
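To make the concern concrete, here is a minimal sketch of the skip-forward behavior I think I am seeing. This is a toy wrapper written for illustration, not the actual nonechucks code; the class name `SkipForwardDataset` and the sample data are made up.

```python
class SkipForwardDataset:
    """Toy wrapper (hypothetical): on a None sample, fall through to the next valid one."""

    def __init__(self, dataset):
        self.dataset = dataset

    def __len__(self):
        # Length still counts the invalid entries.
        return len(self.dataset)

    def __getitem__(self, idx):
        # Walk forward from idx until a non-None sample is found.
        while idx < len(self.dataset):
            sample = self.dataset[idx]
            if sample is not None:
                return sample
            idx += 1
        raise IndexError("no valid sample at or after this index")


raw = ["a", "b", None, "d", "e"]
wrapped = SkipForwardDataset(raw)

# Indices 2 and 3 both resolve to the same underlying sample.
print(wrapped[2], wrapped[3])
```

With this kind of lookup, a sampler that draws every index once would see the sample after each None twice, which is the part that seems wrong for a map-style dataset.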
Alternatively, indices could be remapped via a Dict[int, int] that maps each valid position to its index in the underlying dataset, which would keep random access.
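A rough sketch of what I mean by remapping, assuming it is acceptable to scan the whole dataset once up front. `RemappedDataset` and `index_map` are names I made up, and the class is plain Python rather than a `torch.utils.data.Dataset` subclass so the example has no dependencies.

```python
from typing import Dict


class RemappedDataset:
    """Hypothetical wrapper: exposes only the non-None samples, densely indexed."""

    def __init__(self, dataset):
        self.dataset = dataset
        # Maps each exposed index to the index of a valid sample underneath.
        self.index_map: Dict[int, int] = {}
        for raw_idx in range(len(dataset)):
            if dataset[raw_idx] is not None:
                self.index_map[len(self.index_map)] = raw_idx

    def __len__(self):
        # Only valid samples are counted.
        return len(self.index_map)

    def __getitem__(self, idx):
        if idx not in self.index_map:
            raise IndexError(idx)
        return self.dataset[self.index_map[idx]]


raw = ["a", "b", None, "d", "e"]
remapped = RemappedDataset(raw)

# Every exposed index now refers to a distinct valid sample.
print(len(remapped), [remapped[i] for i in range(len(remapped))])
```

The obvious cost is that building the map loads every sample once, which may be expensive for large datasets, and samples that fail only intermittently would not be caught by a one-time scan.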
Yes, this is not the behavior I expected but is indeed what happens.