
[RFC] Iterating over MapDataPipes

Open SvenDS9 opened this issue 2 years ago • 0 comments

🚀 The feature

For context, please read https://github.com/pytorch/data/issues/795 first. Iterating over MapDataPipes is currently inconsistent, and we should find a way to resolve this.

Motivation, pitch

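from torchdata.datapipes.iter import IterableWrapper

# Keys are 0..9, so they coincide with integer indices starting at 0.
source_dp = IterableWrapper([(i, i) for i in range(10)])
map_dp = source_dp.to_map_datapipe()
print(list(map_dp))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]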

works as expected but

source_dp = IterableWrapper([(i+1, i) for i in range(10)])
map_dp = source_dp.to_map_datapipe()
print(list(map_dp))
> []

does not: the result is an empty list, even though the datapipe holds ten entries (keys 1 through 10).

Alternatives

Do we want to override __iter__ for MapDataPipes? We already have to_iter_datapipe. If yes:

  • Should we return the keys or the values?
  • Should we add a warning that this is not the intended way?
  • ...
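To make the first two questions concrete, here is a minimal sketch of one possible answer: iterate over keys (as a dict does) and warn on use. `KeyIterablePipe` is a hypothetical stand-in written in plain Python, not a torchdata class, and the warning text is only an illustration.

```python
import warnings


class KeyIterablePipe:
    """Hypothetical map-style container; not part of torchdata."""

    def __init__(self, pairs):
        self._map = dict(pairs)

    def __getitem__(self, key):
        return self._map[key]

    def __len__(self):
        return len(self._map)

    def __iter__(self):
        # Assumption: iteration yields keys, mirroring dict semantics,
        # and warns that direct iteration is not the intended usage.
        warnings.warn(
            "Iterating over a map-style pipe yields its keys; "
            "convert it to an iterable pipe to iterate over values.",
            UserWarning,
            stacklevel=2,
        )
        return iter(self._map)


pipe = KeyIterablePipe((i + 1, i) for i in range(3))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    keys = list(pipe)

print(keys)                    # [1, 2, 3]
print([pipe[k] for k in keys])  # [0, 1, 2]
```

With an explicit `__iter__`, the result no longer depends on whether the keys happen to be consecutive integers starting at 0. Yielding values instead would only require returning `iter(self._map.values())`.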

Additional context

This post explains how Python iterates over classes that define __getitem__ but not __iter__: https://stackoverflow.com/questions/68244987/how-do-dunder-methods-getitem-and-len-provide-iteration
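The behaviour of the two snippets under "Motivation, pitch" can be reproduced without torchdata. The sketch below uses a hypothetical `DictBackedPipe` class and assumes that a missing key surfaces as `IndexError`, which is what the legacy iteration protocol interprets as the end of the sequence.

```python
class DictBackedPipe:
    """Hypothetical stand-in for a map-style datapipe; not the torchdata class."""

    def __init__(self, pairs):
        self._map = dict(pairs)

    def __len__(self):
        return len(self._map)

    def __getitem__(self, key):
        try:
            return self._map[key]
        except KeyError:
            # Assumption: a missing key is reported as IndexError.
            raise IndexError(f"Index {key} is invalid.") from None


# No __iter__ is defined, so iter() falls back to calling
# __getitem__(0), __getitem__(1), ... until IndexError is raised.
zero_based = DictBackedPipe((i, i) for i in range(10))
one_based = DictBackedPipe((i + 1, i) for i in range(10))

zero_based_items = list(zero_based)
one_based_items = list(one_based)

print(zero_based_items)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(one_based_items)   # []
print(len(one_based))    # 10
```

In the second case the very first lookup, `__getitem__(0)`, fails because the keys start at 1, so iteration stops immediately even though `len()` still reports ten entries.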

SvenDS9, Feb 28 '23 10:02