MLUtils.jl
`batchsize=Inf` or something?
It would be nice if you could ask `DataLoader` for one maximal-size batch, without knowing the size of the inputs.
This would mean that a function which loads some data, pre-processes it, and then returns a `DataLoader` could easily be used to return the full dataset, in the identical format, as long as it passes the keyword `batchsize` along.
Could be `batchsize=0`, since `-1` already does something special. Although unfortunately `0` is not an error right now.
What is the use-case for this over doing something like `DataLoader(data; batchsize=numobs(data))`? Is it that you don't want to get a `DataLoader` returned but rather a `BatchView(mapobs(f, data); batchsize=numobs(data))`?
The use is functions like this, which load data & make two `DataLoader`s with the specified batch size:
https://github.com/FluxML/model-zoo/blob/52420da6fcadf30ae2e190fc77669fe1d255ff10/vision/conv_mnist/conv_mnist.jl#L71-L84
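For illustration, here is a minimal sketch of that pattern using the `numobs` workaround mentioned above. The helper `make_loaders` and the convention that `batchsize=nothing` means "one maximal-size batch" are hypothetical, made up for this example; they are not part of MLUtils and not what the linked model-zoo script does.

```julia
using MLUtils

# Hypothetical helper (not an MLUtils function): builds a train and a test
# DataLoader. `batchsize=nothing` is an assumed convention meaning
# "put every observation into a single batch".
function make_loaders(xtrain, ytrain, xtest, ytest; batchsize=nothing)
    # Fall back to the number of observations when no batch size is given.
    bs(data) = batchsize === nothing ? numobs(data) : batchsize
    train = DataLoader((xtrain, ytrain); batchsize=bs((xtrain, ytrain)), shuffle=true)
    test = DataLoader((xtest, ytest); batchsize=bs((xtest, ytest)))
    return train, test
end

# Toy data: observations are stored along the last dimension.
xtrain, ytrain = rand(Float32, 4, 10), rand(1:3, 10)
xtest, ytest = rand(Float32, 4, 6), rand(1:3, 6)

# Full dataset as one batch each, in the same format as the batched case.
full_train, full_test = make_loaders(xtrain, ytrain, xtest, ytest)

# Regular mini-batches when a batch size is passed along.
mini_train, mini_test = make_loaders(xtrain, ytrain, xtest, ytest; batchsize=4)
```

The point of the request is that every such function currently has to carry this kind of fallback itself, whereas a sentinel value understood by `DataLoader` would let callers simply forward `batchsize`.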
Ah I see! That makes sense when creating multiple `DataLoader`s 👍
You could almost use `typemax(Int)` for this purpose, apart from this warning:
julia> DataLoader([1 2 3; 4 5 6]; batchsize=99, partial=false) |> collect
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
┌ Warning: Number of observations less than batch-size, decreasing the batch-size to 3
└ @ MLUtils ~/.julia/packages/MLUtils/KcBtS/src/batchview.jl:95
1-element Vector{Matrix{Int64}}:
[1 2 3; 4 5 6]