Add TileDataset

Open alexanderwerning opened this issue 1 year ago • 1 comments

TileDataset should be more efficient than concatenating the input dataset for large repetitions (in my case in the 1000s)
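To illustrate the idea, here is a minimal sketch of a tiled view that maps indices with a modulo instead of storing one entry per repetition. `TiledView` is a hypothetical stand-in written in plain Python; it is not lazy_dataset code and only assumes the input supports `len` and integer indexing.

```python
class TiledView:
    """Hypothetical stand-in for the proposed TileDataset (not lazy_dataset code)."""

    def __init__(self, items, reps):
        self.items = items  # a single reference, however large reps is
        self.reps = reps

    def __len__(self):
        return len(self.items) * self.reps

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError(f"only integer indices are supported, got {index!r}")
        n = len(self)
        if index < 0:
            index += n  # support negative indices like a list
        if not 0 <= index < n:
            raise IndexError(index)
        # Map the index of the tiled view back into the input.
        return self.items[index % len(self.items)]

    def __iter__(self):
        for _ in range(self.reps):
            yield from self.items


tiled = TiledView([1, 2, 3], reps=1000)
print(len(tiled), tiled[3], tiled[-1])
```

Memory and construction cost stay constant in `reps`, whereas a concatenation keeps one entry per repetition.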

alexanderwerning avatar Nov 15 '23 18:11 alexanderwerning

Does the following code work?

import lazy_dataset
ds = lazy_dataset.new([1, 2, 3])
ds = ds.shuffle(reshuffle=True)
ds = ds.tile(4).catch()
list(ds)

We have to handle the combination of a non-ordered dataset (e.g. reshuffle), tile, copy(freeze) and indexing. We use copy(freeze) and indexing too often (e.g. in prefetch) to introduce a breaking change. The result should be the same as for the combination of a non-ordered dataset (e.g. reshuffle), tile and iter.

I see two solutions:

  • Use TileDataset only when the input is ordered.
  • Convert a TileDataset to the old type (multiple datasets with concat).
    • In this case, we should modify the repr/str so that the user recognizes this.
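The two options could be combined roughly as follows. This is only a sketch in plain Python: `ModuloTile`, `ConcatTile`, `tile` and the `ordered` flag are hypothetical names for illustration, not lazy_dataset API.

```python
class ModuloTile:
    """Hypothetical efficient tile for ordered (indexable) inputs."""

    def __init__(self, items, reps):
        self.items = items
        self.reps = reps

    def __len__(self):
        return len(self.items) * self.reps

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        return self.items[index % len(self.items)]

    def __repr__(self):
        return f"ModuloTile(reps={self.reps})"


class ConcatTile:
    """Hypothetical fallback that mimics the old concatenation behaviour."""

    def __init__(self, items, reps):
        self.parts = [items] * reps  # one entry per repetition, as before

    def __iter__(self):
        for part in self.parts:
            yield from part

    def __repr__(self):
        # The repr states that the fallback was used.
        return f"ConcatTile(reps={len(self.parts)}, fallback for non-ordered input)"


def tile(items, reps, ordered):
    # Option 1: the efficient tile only for ordered inputs.
    # Option 2: otherwise fall back to the old concat-style type.
    return ModuloTile(items, reps) if ordered else ConcatTile(items, reps)


print(tile([1, 2, 3], 4, ordered=True))
print(tile([1, 2, 3], 4, ordered=False))
```

With this split, existing code that iterates over a non-ordered input keeps its old behaviour, and the repr shows which variant was chosen.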

boeddeker avatar Nov 15 '23 19:11 boeddeker