MLUtils.jl

Port collated batch view from DataLoaders.jl

Open lorenzoh opened this issue 3 years ago • 5 comments

This is the first part of porting functionality from DataLoaders.jl (ref #22).

This includes porting

  • BatchViewCollated from batchview.jl
    • Notes: the BatchDim machinery can probably be dropped if we can assume last dim = batch dim
  • collate from [https://github.com/lorenzoh/DataLoaders.jl/blob/master/src/collate.jl]

batchview.jl also includes an unimported batchsize helper. Do we want to have a proper batchsize?
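For illustration, if the last dimension is assumed to be the batch dimension (as suggested in the note above), a batchsize helper could be as small as the sketch below. The name `batchsize_sketch` is hypothetical and is not the helper from batchview.jl or anything in MLUtils.jl.

```julia
# Hypothetical helper, not the DataLoaders.jl or MLUtils.jl implementation.
# Assumption: the last dimension of an array is the batch dimension.
batchsize_sketch(x::AbstractArray) = size(x, ndims(x))

# For tuples and named tuples, every field must agree on the batch size.
function batchsize_sketch(xs::Union{Tuple,NamedTuple})
    sizes = map(batchsize_sketch, values(xs))
    all(==(first(sizes)), sizes) ||
        throw(DimensionMismatch("fields disagree on batch size: $sizes"))
    return first(sizes)
end
```

With this convention no separate BatchDim type is needed, since the batch dimension is always `ndims(x)`.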

lorenzoh, Feb 01 '22 09:02

I think collate could be entirely replaced by batch (one of the reasons that motivated me to do #27)

CarloLucibello, Feb 01 '22 11:02

Also, do we need something like collate? Why don't we just call getobs(data, idxs)?

CarloLucibello, Feb 01 '22 11:02

Sometimes you want a batch view that gives you a vector of observations (e.g. different-sized images), and sometimes you'll want to have them as a single array (or recursively collate observations in tuples/dicts...). For this, having something like BatchViewCollated is essential since it allows buffered loading of observations with getobs!.
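To make the distinction concrete, here is a minimal sketch of recursive collation and of a buffered, in-place variant. It uses only Base Julia and assumes the last dimension is the batch dimension. The names `collate_sketch` and `collate_sketch!` are hypothetical and are not the DataLoaders.jl or MLUtils.jl API.

```julia
# Hypothetical sketch; names are illustrative, not the DataLoaders.jl or MLUtils.jl API.
# Assumption: observations are collated along a new last (batch) dimension.

# Dispatch on the first observation to decide how to collate the vector.
collate_sketch(obs::AbstractVector) = _collate(first(obs), obs)

# Equally sized arrays: concatenate along a new trailing dimension.
_collate(::AbstractArray, obs) = cat(obs...; dims = ndims(first(obs)) + 1)

# Scalars (e.g. labels): a plain vector is already the collated form.
_collate(::Number, obs) = collect(obs)

# Tuples and named tuples: collate field by field, recursively.
_collate(::Union{Tuple,NamedTuple}, obs) =
    map((fields...) -> collate_sketch(collect(fields)), obs...)

# Buffered variant: write each observation into a preallocated batch array
# instead of allocating a new one for every batch.
function collate_sketch!(buf::AbstractArray, obs::AbstractVector)
    for (i, o) in enumerate(obs)
        selectdim(buf, ndims(buf), i) .= o
    end
    return buf
end
```

Different-sized observations cannot be concatenated this way, which is the case where a plain vector of observations is the better result. The in-place variant only shows why a collated view pairs well with buffered loading: the batch array can be reused across iterations.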

lorenzoh, Feb 01 '22 12:02

@lorenzoh is this still necessary?

CarloLucibello, Jun 28 '22 03:06

Yup 👍 This is necessary for porting the buffered batch view from DataLoaders.jl

lorenzoh, Jun 28 '22 16:06