DataLoaders.jl
Why no custom collate?
I find them pretty handy in pytorch
Don't see a reason why there can't be. We'll just need to update `BatchViewCollated` to accept a user collate function.
As Kyle pointed out, this will not be quite as straightforward if we want to support in-place data loading for custom collate functions. Below is a sketch of a possible solution, depending on the use case for it.
Currently a batch is recursively defined as either:
- an `AbstractArray` with one dimension being the batch size
- a `Tuple` of batches
- a `NamedTuple` of batches
- a `Dict` of batches
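To make the recursive definition concrete, here is a minimal sketch of a collate function that follows it. `collate_sketch` is a hypothetical helper written for this comment, not the package's implementation, and it assumes arrays are stacked along a new trailing dimension that serves as the batch dimension.

```julia
# Hypothetical helper, not part of DataLoaders.jl: collate a vector of
# observations into a batch, following the recursive definition above.

# Arrays: stack along a new trailing dimension, which becomes the batch dimension.
function collate_sketch(obs::AbstractVector{<:AbstractArray{T,N}}) where {T,N}
    return cat(obs...; dims = N + 1)
end

# Tuples: collate each position separately.
function collate_sketch(obs::AbstractVector{<:Tuple})
    n = length(first(obs))
    return ntuple(i -> collate_sketch([o[i] for o in obs]), n)
end

# NamedTuples: collate each field separately.
function collate_sketch(obs::AbstractVector{<:NamedTuple})
    ks = keys(first(obs))
    return NamedTuple{ks}(map(k -> collate_sketch([o[k] for o in obs]), ks))
end

# Dicts: collate the values stored under each key.
function collate_sketch(obs::AbstractVector{<:AbstractDict})
    return Dict(k => collate_sketch([o[k] for o in obs]) for k in keys(first(obs)))
end

# Four observations, each a (2x2 image, 1-element target) tuple.
samples = [(rand(Float32, 2, 2), Float32[i]) for i in 1:4]
batch = collate_sketch(samples)   # a Tuple of two array batches
```

Each container method just recurses into its elements, so nested combinations (say, a `NamedTuple` holding a `Tuple` of arrays) fall out of dispatch without extra code.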
Importantly, `getobs!` is a property of the data container, not the `BatchViewCollated`. Let's say we have a data container `DC` with observations of type `O` so we have: `getobs(::DC, idx)::O` and `getobs!(::O, ::DC, idx)::O`.
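As an illustration of that contract, here is a small sketch with a toy container. `ToyContainer` is made up for this example, and the `getobs` / `getobs!` functions are defined locally as stand-ins; in real code you would extend the functions from the data container interface the package builds on rather than defining new ones.

```julia
# Toy data container; `ToyContainer`, `getobs`, and `getobs!` here are local
# stand-ins for illustration, not the package's own definitions.
struct ToyContainer
    data::Matrix{Float32}   # each column is one observation
end

# Allocating version: returns a fresh observation (a Vector{Float32}).
getobs(dc::ToyContainer, idx::Integer) = dc.data[:, idx]

# In-place version: overwrites `buf` with observation `idx` and returns it.
function getobs!(buf::AbstractVector{Float32}, dc::ToyContainer, idx::Integer)
    copyto!(buf, view(dc.data, :, idx))
    return buf
end

dc = ToyContainer(Float32[1 2 3; 4 5 6])
buf = getobs(dc, 1)      # allocate a buffer once
getobs!(buf, dc, 3)      # reuse the same memory for another observation
```

The point is that only the container knows how to write an observation into existing memory, which is why a custom collate function alone is not enough for buffered loading.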
The question is what you want to achieve through a custom `collatefn`. If you want to return custom data types as batches, then the following would work:
- have a custom `collatefn` that returns batches of type `B`
- define a method of `DataLoaders.obsslices(::B, ::DataLoaders.BatchDim)` that returns an iterator over views of type `O`. For example, if `O` is an array type, then it should return array views.
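A rough sketch of how those two pieces could fit together follows. Everything in it is a hypothetical stand-in: `FeatureBatch` plays the role of `B`, and `collatefn_sketch` and `obsslices_sketch` are local functions, so this does not extend the real `DataLoaders.obsslices` and leaves out the `BatchDim` argument.

```julia
# All names here are hypothetical local stand-ins. In the package, the method
# to extend would be DataLoaders.obsslices(::B, ::DataLoaders.BatchDim).
struct FeatureBatch            # plays the role of the custom batch type B
    features::Matrix{Float32}  # one observation per column
end

# Custom collate function: builds a FeatureBatch from individual observations.
collatefn_sketch(obs::AbstractVector{<:AbstractVector{Float32}}) =
    FeatureBatch(reduce(hcat, obs))

# Iterator over per-observation views into the batch buffer. Because these are
# views, writing into one of them mutates the batch itself.
obsslices_sketch(b::FeatureBatch) = eachcol(b.features)

batch = collatefn_sketch([Float32[1, 2], Float32[3, 4], Float32[5, 6]])

# Simulate in-place loading: overwrite every slot with a new observation.
newobs = [Float32[10, 20], Float32[30, 40], Float32[50, 60]]
for (slot, o) in zip(obsslices_sketch(batch), newobs)
    copyto!(slot, o)
end
```

After the loop, `batch.features` holds the new observations without a new `FeatureBatch` being allocated, which is what buffered loading needs from a custom batch type.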
Of course, if we don't want to support buffering together with custom collate functions (as is the case in PyTorch if I'm not mistaken), we could simply make the `buffered` and `collatefn` arguments on `DataLoader` mutually exclusive.
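For that simpler route, the check could look roughly like the sketch below. `loader_options_sketch` is a hypothetical function invented here; only the keyword names `buffered` and `collatefn` come from the discussion, and it assumes `collatefn = nothing` means "use the default collate".

```julia
# Hypothetical constructor-style function; the keyword names follow the
# discussion above, the rest is illustrative only.
function loader_options_sketch(; buffered::Bool = false, collatefn = nothing)
    if buffered && collatefn !== nothing
        throw(ArgumentError(
            "`buffered = true` cannot be combined with a custom `collatefn`"))
    end
    return (buffered = buffered, collatefn = collatefn)
end

# A custom collate function is accepted as long as buffering is off.
opts = loader_options_sketch(collatefn = xs -> reduce(hcat, xs))
```

Failing early with an `ArgumentError` keeps the restriction visible to the user instead of silently ignoring one of the two options.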