MLUtils.jl

`batchsize=Inf` or something?

Open mcabbott opened this issue 2 years ago • 4 comments

It would be nice if you could construct a `DataLoader` that yields one maximal-size batch, without knowing the size of the inputs.

This would mean that a function which loads some data, pre-processes it, and then returns a DataLoader could easily be used to return the full dataset, in the identical format, as long as it passes the keyword batchsize along.

This could be `batchsize=0`, since `-1` already does something special, although unfortunately `0` is not an error right now.
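
The pattern described above can be sketched roughly as follows. This is only an illustration: `make_loader` and its toy data are hypothetical, and the "`0` means everything" convention is handled inside the function itself, since `DataLoader` does not currently treat `0` that way.

```julia
using MLUtils: DataLoader, numobs

# Hypothetical loader function; `X` and `y` stand in for real, pre-processed data.
function make_loader(; batchsize=2)
    X = Float32[1 2 3; 4 5 6]   # 3 observations (columns) with 2 features each
    y = [1, 2, 3]
    data = (X, y)
    # Assumed convention, not MLUtils behaviour: 0 requests one full-size batch.
    bs = batchsize == 0 ? numobs(data) : batchsize
    return DataLoader(data; batchsize=bs)
end

full = make_loader(batchsize=0)   # same format as a mini-batch loader
x, labels = first(full)           # the single batch holds every observation
```

With a built-in sentinel, the `bs = ...` line would not be needed and the function could simply forward `batchsize`.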

mcabbott avatar Feb 11 '23 02:02 mcabbott

What is the use case for this over doing something like `DataLoader(data; batchsize=numobs(data))`? Is it that you don't want a `DataLoader` returned, but rather a `BatchView(mapobs(f, data); batchsize=numobs(data))`?

lorenzoh avatar Feb 11 '23 08:02 lorenzoh

The use case is functions like this one, which load data and make two `DataLoader`s with the specified batch size:

https://github.com/FluxML/model-zoo/blob/52420da6fcadf30ae2e190fc77669fe1d255ff10/vision/conv_mnist/conv_mnist.jl#L71-L84

mcabbott avatar Feb 11 '23 10:02 mcabbott

Ah I see! That makes sense when creating multiple DataLoaders 👍

lorenzoh avatar Feb 11 '23 13:02 lorenzoh

You could almost use `typemax(Int)` for this purpose, apart from this warning:

```julia
julia> DataLoader([1 2 3; 4 5 6]; batchsize=99, partial=false) |> collect
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
1-element Vector{Matrix{Int64}}:
 [1 2 3; 4 5 6]
```
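
One way to sidestep the warning in the meantime might be to clamp the requested batch size before it reaches `DataLoader`. A minimal sketch, assuming a hypothetical helper `effective_batchsize` that is not part of MLUtils:

```julia
# Hypothetical helper (not part of MLUtils): turn a requested batch size into
# a concrete one, given the number of observations `nobs`.
function effective_batchsize(batchsize::Real, nobs::Integer)
    batchsize > 0 || throw(ArgumentError("expected a positive batch size, got $batchsize"))
    # `Inf`, `typemax(Int)`, or any other over-large value becomes one full-size batch.
    return batchsize >= nobs ? Int(nobs) : Int(batchsize)
end

effective_batchsize(Inf, 3)            # 3
effective_batchsize(typemax(Int), 3)   # 3
effective_batchsize(2, 3)              # 2
```

A caller could then write something like `DataLoader(data; batchsize=effective_batchsize(Inf, numobs(data)))`, which never asks for more observations than exist.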

mcabbott avatar Feb 11 '23 17:02 mcabbott