lazy_dataset
Add TileDataset
TileDataset should be more efficient than concatenating the input dataset with itself when the number of repetitions is large (in my case, in the thousands).
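To illustrate why tiling can be cheaper than concatenation, here is a minimal sketch in plain Python. `TiledView` is a hypothetical stand-in, not the TileDataset from this PR: it keeps a single reference to the input and maps every index back into it with a modulo, so the cost does not grow with the number of repetitions.

```python
class TiledView:
    """Hypothetical illustration only; not lazy_dataset's TileDataset."""

    def __init__(self, base, reps):
        self.base = base  # single reference, nothing is copied
        self.reps = reps

    def __len__(self):
        return len(self.base) * self.reps

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError(f'Only integer indices are supported, got {index!r}')
        total = len(self)
        if index < 0:
            index += total  # support negative indices like a list
        if not 0 <= index < total:
            raise IndexError(index)
        # Map the global index back into the single input.
        return self.base[index % len(self.base)]

    def __iter__(self):
        for _ in range(self.reps):
            yield from self.base


base = [1, 2, 3]
tiled = TiledView(base, 1000)   # stores one input and one integer
concatenated = base * 1000      # materializes 3000 entries
```

Both variants yield the same items in the same order as long as the input is ordered; only the bookkeeping differs.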
Does the following code work?
import lazy_dataset
ds = lazy_dataset.new([1, 2, 3])
ds = ds.shuffle(reshuffle=True)
ds = ds.tile(4).catch()
list(ds)
We have to handle the combination of a non-ordered dataset (e.g. reshuffle), tile, copy(freeze) and indexing. We use copy(freeze) and indexing too often (e.g. in prefetch) to introduce a breaking change. The result should be the same as for the combination of a non-ordered dataset (e.g. reshuffle), tile and iter.
I see two solutions:
- Use TileDataset only when the input is ordered.
- Convert a TileDataset to the old type (multiple datasets with concat).
- In this case, we should modify the repr/str so that the user recognizes this.
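If I read the concern correctly, the difference between the two behaviours can be sketched in plain Python as follows. All names here (`Reshuffled`, `freeze`, and the two helper functions) are hypothetical and only mimic the idea of reshuffle and copy(freeze); they are not lazy_dataset's API. Freezing the input once and indexing with a modulo repeats one order in every repetition, while freezing each repetition separately (the old concat behaviour) draws a new order per repetition, which matches what iterating over a tiled reshuffle dataset would give.

```python
import random


class Reshuffled:
    """Hypothetical non-ordered dataset: every iteration yields a new order."""

    def __init__(self, items, seed=0):
        self.items = list(items)
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.items)

    def __iter__(self):
        order = self.items[:]
        self.rng.shuffle(order)
        return iter(order)

    def freeze(self):
        # Draw one order and fix it, so that the result can be indexed.
        return list(self)


def tile_freeze_once(dataset, reps):
    # Modulo indexing on a single frozen copy: every repetition has the same
    # order. This is only equivalent to iteration when the input is ordered.
    frozen = dataset.freeze()
    return [frozen[i % len(frozen)] for i in range(reps * len(frozen))]


def tile_freeze_per_repetition(dataset, reps):
    # Old behaviour (concat of several datasets): each repetition is frozen
    # separately and therefore gets its own order.
    result = []
    for _ in range(reps):
        result.extend(dataset.freeze())
    return result


same_order = tile_freeze_once(Reshuffled([1, 2, 3]), 4)
own_order = tile_freeze_per_repetition(Reshuffled([1, 2, 3]), 4)
```

With the first option from the list, the modulo variant would only be used for ordered inputs, where both functions agree; with the second option, a non-ordered input would fall back to the per-repetition variant.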