FluxTraining.jl
FluxTraining.jl copied to clipboard
Quickstart tutorial broken
The example Training an image classifier currently uses the following code:
xs, ys = (
# convert each image into h*w*1 array of floats
[Float32.(reshape(img, 28, 28, 1)) for img in Flux.Data.MNIST.images()],
# one-hot encode the labels
[Float32.(Flux.onehot(y, 0:9)) for y in Flux.Data.MNIST.labels()],
)
However,
(Project) pkg> st Flux
Status `C:\Users\Dennis Bal\ProjectFolder\Project.toml`
[587475ba] Flux v0.13.0
julia> using Flux
julia> Flux.Data.MNIST
ERROR: UndefVarError: MNIST not defined
Stacktrace:
[1] getproperty(x::Module, f::Symbol)
@ Base .\Base.jl:35
[2] top-level scope
@ REPL[16]:1
So the example is broken. As a side note, I think the example would do great by using MLUtils instead of DataLoaders.jl
and MLDataPattern
. Also, Flux imports DataLoader so no need to explicitly import it.
But I take a look at the docs and try to get started. So I make the following code, that works with Flux's base capacities:
julia> using Flux
julia> using Flux: onehotbatch, onecold
julia> using FluxTraining
julia> using MLUtils: flatten, unsqueeze
julia> using MLDatasets
julia> labels = 0:9
0:9
julia> traindata = MNIST.traindata(Float32) |> x->(unsqueeze(x[1], 3), onehotbatch(x[2], labels));
julia> size.(traindata)
((28, 28, 1, 60000), (10, 60000))
julia> trainloader = DataLoader(traindata, batchsize=128);
julia> validdata = MNIST.testdata(Float32) |> x->(unsqueeze(x[1], 3), onehotbatch(x[2], labels));
julia> size.(validdata)
((28, 28, 1, 10000), (10, 10000))
julia> validloader = DataLoader(validdata, batchsize=128);
julia> predict = Chain(flatten, Dense(28^2, 10))
Chain(
MLUtils.flatten,
Dense(784 => 10), # 7_850 parameters
)
julia> lossfunc(x, y) = Flux.Losses.logitbinarycrossentropy(predict(x), y)
lossfunc (generic function with 1 method)
julia> optimizer=ADAM()
ADAM(0.001, (0.9, 0.999), 1.0e-8, IdDict{Any, Any}())
julia> callbacks = [Metrics(accuracy)]
1-element Vector{Metrics}:
Metrics(Loss(), Metric(Accuracy))
julia> learner = Learner(predict, lossfunc; optimizer, callbacks)
Learner()
At this point, I start checking loss and training with Flux's train!
:
julia> lossfunc(validdata...)
0.7624986f0
julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)
julia> lossfunc(validdata...)
0.11266354f0
julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)
julia> lossfunc(validdata...)
0.08880948f0
julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)
julia> lossfunc(validdata...)
0.0801171f0
Training no problem. However, when I try to train my learner, it seems like a single float is passed to predict
, and not an array:
julia> fit!(learner, 1, (traindata, validdata))
Epoch 1 TrainingPhase() ...
ERROR: MethodError: no method matching flatten(::Float32)
Closest candidates are:
flatten(::AbstractArray) at C:\Users\usrname\.julia\packages\MLUtils\QTRw7\src\utils.jl:424
Stacktrace:
[1] macro expansion
@ C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [inlined]
[2] _pullback(ctx::Zygote.Context, f::typeof(flatten), args::Float32)
@ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:9
[3] macro expansion
@ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
[4] _pullback
@ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
[5] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}, ::Float32)
@ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[6] _pullback
@ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:51 [inlined]
[7] _pullback(ctx::Zygote.Context, f::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, args::Float32)
@ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[8] _pullback
@ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:54 [inlined]
[9] _pullback(ctx::Zygote.Context, f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, args::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})
@ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[10] _pullback
@ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:70 [inlined]
[11] _pullback(::Zygote.Context, ::FluxTraining.var"#73#74"{FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}})
@ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[12] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:352
[13] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:75
[14] _gradient(f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, #unused#::ADAM, m::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:70
[15] (::FluxTraining.var"#69#71"{Learner})(handle::FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, state::FluxTraining.PropDict{Any})
@ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:53
[16] runstep(stepfn::FluxTraining.var"#69#71"{Learner}, learner::Learner, phase::TrainingPhase, initialstate::NamedTuple{(:xs, :ys), Tuple{Float32, Float32}})
@ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:133
[17] step!
@ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:51 [inlined]
[18] (::FluxTraining.var"#67#68"{Learner, TrainingPhase, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})(#unused#::Function)
@ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:24
[19] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}}, learner::Learner, phase::TrainingPhase)
@ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:105
[20] epoch!
@ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:22 [inlined]
[21] fit!(learner::Learner, nepochs::Int64, ::Tuple{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})
@ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:168
[22] top-level scope
@ REPL[51]:1
I am completely stuck as to what goes wrong. Pointers in that regard would be appreciated, but the main issue is making the example functional, and updating the packages used to load data and the utility functions that I take from MLUtils
.
To improve the reliability of this package, could doc testing be used to ensure that in the future, the documentation examples actually run?
Thanks for catching this and taking the time to write up a solution! That example is outdated and hasn't been touched in a while.
So the example is broken.
Flux.jl removed the included datasets a while ago, so this needs to be updated to use MLDatasets.jl.
I think the example would do great by using MLUtils instead of DataLoaders.jl and MLDataPattern. Also, Flux imports DataLoader so no need to explicitly import it.
I agree and am working on deprecating DataLoaders.jl since I've added all its functionality to MLUtils.jl. While FluxTraining.jl is agnostic of the kind of data iterator, I agree it would be good to standardize on MLUtils.jl.
However, when I try to train my learner, it seems like a single float is passed to predict, and not an array:
The setup you wrote is correct, but you then passed train_data
to fit!
instead of train_loader
. fit!
expects an iterator over batches so the latter would be correct. Passing the complete train_data
which is a single array, means that the training loop will iterate over the array, thus getting a single float at every iteration.
To improve the reliability of this package, could doc testing be used to ensure that in the future, the documentation examples actually run?
This is done for the majority of code in the docs, but not for this page. The reason is that the CI does not have a GPU available to train the model, which is why the source for this page is an executed Jupyter notebook. The problem with this is that it has to be manually rerun.
If you'd like to contribute your proposed changes as a PR, I'd be happy to review it!
It would boil down to:
- replacing MLDataPAttern.jl and DataLoaders.jl with MLUtils.jl
- using MLDatasets.jl to load the MNIST data
- passing the
DataLoader
s tofit!
, instead of the complete arrays - rerunning the notebook
I did not realize that you were the one behind MLUtils
xD Feeling a little dumb for telling you about it now.
However, your proposed fix still does not work for me. The line that defines lossfunc
turns red for me, indicating that it is the where things go wrong, even though I could not see it in the stacktrace:
Code and stacktrace
julia> using Flux
julia> using Flux: onehotbatch, onecold
julia> using FluxTraining
julia> using MLUtils: flatten, unsqueeze
julia> using MLDatasets
julia> labels = 0:9;
julia> function add_channeldim_and_onehotencode_targets(data_and_targets::Tuple)
data = data_and_targets[1]
targets = data_and_targets[2]
return unsqueeze(data, 3), onehotbatch(targets, labels)
end
add_channeldim_and_onehotencode_targets (generic function with 1 method)
julia> #traindata and testdata contain both inputs (pixel values) and targets (correct labels)
julia> traindata = MNIST.traindata(Float32) |> add_channeldim_and_onehotencode_targets;
julia> trainloader = DataLoader(traindata, batchsize=128);
julia> testdata = MNIST.testdata(Float32) |> add_channeldim_and_onehotencode_targets;
julia> testloader = DataLoader(testdata, batchsize=128);
julia> predict = Chain(flatten, Dense(28^2, 10));
julia> lossfunc(x, y) = Flux.Losses.logitbinarycrossentropy(predict(x), y)
lossfunc (generic function with 1 method)
julia> optimizer = ADAM();
julia> callbacks = [Metrics(accuracy)];
julia> learner = Learner(predict, lossfunc; optimizer, callbacks)
Learner()
julia> lossfunc(testdata...)
0.7482163f0
julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)
julia> lossfunc(testdata...)
0.111782126f0
julia> FluxTraining.fit!(learner, 1, (trainloader, testloader))
Epoch 1 TrainingPhase() ...
ERROR: DimensionMismatch("A has dimensions (10,784) but B has dimensions (10,128)")
Stacktrace:
[1] gemm_wrapper!(C::Matrix{Float32}, tA::Char, tB::Char, A::Matrix{Float32}, B::Matrix{Float32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
@ LinearAlgebra C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:643
[2] mul!
@ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:169 [inlined]
[3] mul!
@ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:275 [inlined]
[4] *
@ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:160 [inlined]
[5] rrule
@ C:\Users\Dennis Bal\.julia\packages\ChainRules\5p7j5\src\rulesets\Base\arraymath.jl:60 [inlined]
[6] rrule
@ C:\Users\Dennis Bal\.julia\packages\ChainRulesCore\RbX5a\src\rules.jl:134 [inlined]
[7] chain_rrule
@ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\chainrules.jl:217 [inlined]
[8] macro expansion
@ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [inlined]
[9] _pullback
@ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:9 [inlined]
[10] _pullback
@ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:160 [inlined]
[11] macro expansion
@ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
[12] _pullback
@ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
[13] _pullback
@ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:51 [inlined]
[14] _pullback(ctx::Zygote.Context, f::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, args::Matrix{Float32})
@ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[15] _pullback
@ .\REPL[123]:1 [inlined]
[16] _pullback(::Zygote.Context, ::typeof(lossfunc), ::Matrix{Float32}, ::Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}})
@ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[17] _pullback
@ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:56 [inlined]
[18] _pullback(ctx::Zygote.Context, f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, args::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})
@ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[19] _pullback
@ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:70 [inlined]
[20] _pullback(::Zygote.Context, ::FluxTraining.var"#73#74"{FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}})
@ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
[21] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:352
[22] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:75
[23] _gradient(f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, #unused#::ADAM, m::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:70
[24] (::FluxTraining.var"#69#71"{Learner})(handle::FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, state::FluxTraining.PropDict{Any})
@ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:53
[25] runstep(stepfn::FluxTraining.var"#69#71"{Learner}, learner::Learner, phase::TrainingPhase, initialstate::NamedTuple{(:xs, :ys), Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})
@ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:133
[26] step!(learner::Learner, phase::TrainingPhase, batch::Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}})
@ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:51
[27] (::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}})(#unused#::Function)
@ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:24
[28] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}}, learner::Learner, phase::TrainingPhase)
@ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:105
[29] epoch!
@ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:22 [inlined]
[30] fit!(learner::Learner, nepochs::Int64, ::Tuple{DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}})
@ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:168
[31] top-level scope
@ REPL[130]:1
Unlike Flux.jl, FluxTraining.jl separates the loss function and the model, so that model outputs can be stored. That means that the loss function going into the Learner
should be just the Flux loss function, i.e. Learner(predict, Flux.Losses.logitcrossentropy)
.
Then it works fine:
I made some minor adjustments to the data loading code as well, the full code is here:
using Flux
using Flux: onehotbatch, onecold
using FluxTraining
using MLUtils: flatten, unsqueeze
using MLDatasets
const LABELS = 0:9
function preprocess((data, targets))
return unsqueeze(data, 3), onehotbatch(targets, LABELS)
end
traindata = MNIST(Float32, :train)[:] |> preprocess
testdata = MNIST(Float32, :test)[:] |> preprocess
trainloader = DataLoader(traindata, batchsize=128);
testloader = DataLoader(testdata, batchsize=128);
model = Chain(flatten, Dense(28^2, 10));
lossfn = Flux.Losses.logitcrossentropy
optimizer = ADAM()
callbacks = [Metrics(accuracy)]
learner = Learner(predict, lossfn; optimizer, callbacks)
FluxTraining.fit!(learner, 1, (trainloader, testloader))
That makes sense. I always found it weird that I had to define the loss function in terms of the model. Also, big fan of the changes you made. It looks great.
I do have my hands full, but as some point I will find time to update the docs. But I really think that at the very least a functional MWE should be put in the docs before I will find the time to go down your checklist. If nothing else, a temporary warning and a link to this issue would do the trick.
Getting the following error from you code:
julia> using MLDatasets
julia> traindata = MNIST(Float32, :train)[:] |> preprocess
ERROR: MethodError: objects of type Module are not callable
Stacktrace:
[1] top-level scope
@ REPL[135]:1
I will just stick with traindata = MNIST.traindata(Float32) |> preprocess
I think your version of MLDatasets.jl is not up-to-date, so you may wanna update to 0.7.1, i.e. ]add [email protected]
You are right! However, with the latest version the output is of type MNIST<:MLDatasets.SupervisedDataset
, not a tuple. The following works:
function preprocess(data::MNIST)
return unsqueeze(data.features, 3), onehotbatch(data.targets, LABELS)
end
# traindata and testdata contain both inputs (pixel values) and targets (correct labels)
traindata = MNIST(Float32, :train) |> preprocess;
testdata = MNIST(Float32, :test) |> preprocess;