[RFC] Iterating over MapDataPipes
🚀 The feature
For context, please read https://github.com/pytorch/data/issues/795 first. Iterating over MapDataPipes is currently inconsistent, and we should find a way to resolve this.
Motivation, pitch
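from torchdata.datapipes.iter import IterableWrapper

# Keys are 0..9, so they coincide with sequence-style indices.
source_dp = IterableWrapper([(i, i) for i in range(10)])
map_dp = source_dp.to_map_datapipe()
print(list(map_dp))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]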
works as expected, but
source_dp = IterableWrapper([(i+1, i) for i in range(10)])
map_dp = source_dp.to_map_datapipe()
print(list(map_dp))
> []
does not: no item has key 0, so the very first lookup fails and iteration ends before yielding anything.
Alternatives
Do we want to override `__iter__` for MapDataPipes? We already have `to_iter_datapipe`. If yes:
- Should we return the keys or the values?
- Should we add a warning that this is not the intended way?
- ...
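
To make the questions above concrete, here is a minimal sketch of one possible answer: `__iter__` returns the values and emits a warning. `IterableMap` is a hypothetical, dict-backed stand-in written for illustration only. It is not the torchdata `MapDataPipe` API, and both the choice of values over keys and the warning text are assumptions.

```python
import warnings


class IterableMap:
    """Hypothetical map-style container; not the torchdata MapDataPipe API."""

    def __init__(self, pairs):
        self._map = dict(pairs)

    def __getitem__(self, key):
        return self._map[key]

    def __len__(self):
        return len(self._map)

    def __iter__(self):
        # Assumption: iterate over values (in insertion order) and warn the caller.
        warnings.warn(
            "Iterating a map-style container directly; consider an explicit conversion.",
            UserWarning,
            stacklevel=2,
        )
        return iter(self._map.values())


# Keys start at 1, the case that currently yields an empty list.
one_based = IterableMap((i + 1, i) for i in range(10))

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    values = list(one_based)

print(values)       # all ten values, regardless of the keys
print(len(caught))  # number of warnings recorded during iteration
```

With an explicit `__iter__`, the result no longer depends on whether the keys happen to be consecutive integers starting at 0. Returning keys instead would only require swapping `.values()` for `.keys()`.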
Additional context
This post explains how Python iterates over classes that define `__getitem__` but not `__iter__`:
https://stackoverflow.com/questions/68244987/how-do-dunder-methods-getitem-and-len-provide-iteration
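
The fallback behaviour can be reproduced without torchdata. The sketch below uses a hypothetical `DictBacked` class that defines only `__getitem__` and, as an assumption about how a dict-backed map might behave, translates a missing key into `IndexError`. Python then calls `__getitem__` with 0, 1, 2, ... and stops at the first `IndexError`.

```python
class DictBacked:
    """Hypothetical stand-in: defines __getitem__ only, no __iter__."""

    def __init__(self, pairs):
        self._map = dict(pairs)

    def __getitem__(self, index):
        try:
            return self._map[index]
        except KeyError:
            # IndexError is the signal that ends the legacy iteration protocol;
            # a KeyError would propagate to the caller instead.
            raise IndexError(f"Index {index} is invalid.") from None


zero_based = DictBacked((i, i) for i in range(10))
one_based = DictBacked((i + 1, i) for i in range(10))

zero_based_items = list(zero_based)  # lookups 0..9 succeed, lookup 10 ends the loop
one_based_items = list(one_based)    # lookup 0 fails, so nothing is yielded

print(zero_based_items)
print(one_based_items)
```

The second container holds ten items, yet iterating over it produces an empty list, which matches the inconsistency described in this issue.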