FluxTraining.jl

Quickstart tutorial broken

Open KronosTheLate opened this issue 2 years ago • 8 comments

The example Training an image classifier currently uses the following code:

xs, ys = (
    # convert each image into h*w*1 array of floats 
    [Float32.(reshape(img, 28, 28, 1)) for img in Flux.Data.MNIST.images()],
    # one-hot encode the labels
    [Float32.(Flux.onehot(y, 0:9)) for y in Flux.Data.MNIST.labels()],
)

(Project) pkg> st Flux
      Status `C:\Users\Dennis Bal\ProjectFolder\Project.toml`
  [587475ba] Flux v0.13.0

julia> using Flux

julia> Flux.Data.MNIST
ERROR: UndefVarError: MNIST not defined
 [1] getproperty(x::Module, f::Symbol)
   @ Base .\Base.jl:35
 [2] top-level scope
   @ REPL[16]:1

So the example is broken. As a side note, I think the example would do great by using MLUtils instead of DataLoaders.jl and MLDataPattern. Also, Flux imports DataLoader so no need to explicitly import it.

So I took a look at the docs and tried to get started. I wrote the following code, which works with Flux's basic functionality:

julia> using Flux

julia> using Flux: onehotbatch, onecold

julia> using FluxTraining

julia> using MLUtils: flatten, unsqueeze

julia> using MLDatasets

julia> labels = 0:9

julia> traindata = MNIST.traindata(Float32) |> x->(unsqueeze(x[1], 3), onehotbatch(x[2], labels));

julia> size.(traindata)
((28, 28, 1, 60000), (10, 60000))

julia> trainloader = DataLoader(traindata, batchsize=128);

julia> validdata = MNIST.testdata(Float32) |> x->(unsqueeze(x[1], 3), onehotbatch(x[2], labels)); 

julia> size.(validdata)
((28, 28, 1, 10000), (10, 10000))

julia> validloader = DataLoader(validdata, batchsize=128);

julia> predict = Chain(flatten, Dense(28^2, 10))
  Dense(784 => 10),                     # 7_850 parameters

julia> lossfunc(x, y) = Flux.Losses.logitbinarycrossentropy(predict(x), y)
lossfunc (generic function with 1 method)

julia> optimizer=ADAM()
ADAM(0.001, (0.9, 0.999), 1.0e-8, IdDict{Any, Any}())

julia> callbacks = [Metrics(accuracy)]
1-element Vector{Metrics}:
 Metrics(Loss(), Metric(Accuracy))

julia> learner = Learner(predict, lossfunc; optimizer, callbacks)

At this point, I start checking loss and training with Flux's train!:

julia> lossfunc(validdata...)

julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)

julia> lossfunc(validdata...)

julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)

julia> lossfunc(validdata...)

julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)

julia> lossfunc(validdata...)

Training works without problems. However, when I try to train my learner, it seems like a single float is passed to predict, and not an array:

julia> fit!(learner, 1, (traindata, validdata))
Epoch 1 TrainingPhase() ...
ERROR: MethodError: no method matching flatten(::Float32)
Closest candidates are:
  flatten(::AbstractArray) at C:\Users\usrname\.julia\packages\MLUtils\QTRw7\src\utils.jl:424  
  [1] macro expansion
    @ C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [inlined]     
  [2] _pullback(ctx::Zygote.Context, f::typeof(flatten), args::Float32)
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:9        
  [3] macro expansion
    @ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
  [4] _pullback
    @ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
  [5] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}, ::Float32)
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
  [6] _pullback
    @ C:\Users\usrname\.julia\packages\Flux\18YZE\src\layers\basic.jl:51 [inlined]
  [7] _pullback(ctx::Zygote.Context, f::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, args::Float32)
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
  [8] _pullback
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:54 [inlined]
  [9] _pullback(ctx::Zygote.Context, f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, args::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
 [10] _pullback
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:70 [inlined]
 [11] _pullback(::Zygote.Context, ::FluxTraining.var"#73#74"{FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
 [12] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:352       
 [13] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote C:\Users\usrname\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:75        
 [14] _gradient(f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, #unused#::ADAM, m::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:70      
 [15] (::FluxTraining.var"#69#71"{Learner})(handle::FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, state::FluxTraining.PropDict{Any})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:53      
 [16] runstep(stepfn::FluxTraining.var"#69#71"{Learner}, learner::Learner, phase::TrainingPhase, initialstate::NamedTuple{(:xs, :ys), Tuple{Float32, Float32}})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:133     
 [17] step!
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:51 [inlined]
 [18] (::FluxTraining.var"#67#68"{Learner, TrainingPhase, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})(#unused#::Function)
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:24      
 [19] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}}, learner::Learner, phase::TrainingPhase)     
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:105     
 [20] epoch!
    @ C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:22 [inlined]
 [21] fit!(learner::Learner, nepochs::Int64, ::Tuple{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})
    @ FluxTraining C:\Users\usrname\.julia\packages\FluxTraining\iBFSd\src\training.jl:168     
 [22] top-level scope
    @ REPL[51]:1

I am completely stuck as to what goes wrong. Pointers in that regard would be appreciated, but the main issue is making the example functional, and updating the packages used to load data and the utility functions that I take from MLUtils.

To improve the reliability of this package, could doc testing be used to ensure that in the future, the documentation examples actually run?

KronosTheLate • Jun 16 '22 16:06

Thanks for catching this and taking the time to write up a solution! That example is outdated and hasn't been touched in a while.

> So the example is broken.

Flux.jl removed the included datasets a while ago, so this needs to be updated to use MLDatasets.jl.

> I think the example would do great by using MLUtils instead of DataLoaders.jl and MLDataPattern. Also, Flux imports DataLoader so no need to explicitly import it.

I agree and am working on deprecating DataLoaders.jl since I've added all its functionality to MLUtils.jl. While FluxTraining.jl is agnostic of the kind of data iterator, I agree it would be good to standardize on MLUtils.jl.

> However, when I try to train my learner, it seems like a single float is passed to predict, and not an array:

The setup you wrote is correct, but you then passed traindata to fit! instead of trainloader. fit! expects an iterator over batches, so the latter would be correct. Passing the complete traindata, which is a tuple of full arrays rather than an iterator over batches, means that the training loop iterates over the raw data and ends up passing a single float to the model at every iteration.
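A minimal sketch of that failure mode, using only Base Julia and small random stand-in arrays. It assumes the training loop does something like `for batch in data; xs, ys = batch; ...`, which is consistent with the `(:xs, :ys)` state holding two `Float32`s in the stack trace but is not copied from the FluxTraining.jl source:

```julia
# Random stand-ins for the preprocessed MNIST arrays (6 samples instead of 60000).
xs = rand(Float32, 28, 28, 1, 6)
ys = rand(Float32, 10, 6)
data = (xs, ys)        # the full data tuple, not an iterator over batches

# Iterating the tuple yields each whole array in turn: first xs, then ys.
batch = first(data)    # what a `for batch in data` loop would see first
println(size(batch))   # (28, 28, 1, 6)

# Destructuring an array takes its first two elements, which are scalars.
x, y = batch
println(typeof(x), " ", typeof(y))   # Float32 Float32
```

An iterator over batches avoids this because each item it yields is already an `(x, y)` pair of arrays.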

> To improve the reliability of this package, could doc testing be used to ensure that in the future, the documentation examples actually run?

This is done for the majority of code in the docs, but not for this page. The reason is that the CI does not have a GPU available to train the model, which is why the source for this page is an executed Jupyter notebook. The problem with this is that it has to be manually rerun.

lorenzoh • Jun 17 '22 11:06

If you'd like to contribute your proposed changes as a PR, I'd be happy to review it!

It would boil down to:

  • replacing MLDataPattern.jl and DataLoaders.jl with MLUtils.jl
  • using MLDatasets.jl to load the MNIST data
  • passing the DataLoaders to fit!, instead of the complete arrays
  • rerunning the notebook
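As a rough sketch of the third point, assuming MLUtils.jl is installed and using random arrays in place of MNIST so that nothing has to be downloaded: a `DataLoader` wraps the `(features, targets)` tuple and yields one `(x, y)` batch per iteration, which is the kind of iterator `fit!` is described as expecting.

```julia
using MLUtils: DataLoader

# Random stand-ins with the same layout as the preprocessed MNIST data
# (height, width, channel, sample) and (class, sample); 300 samples only.
xs = rand(Float32, 28, 28, 1, 300)
ys = rand(Float32, 10, 300)

# Batches are taken along the last dimension of each array.
loader = DataLoader((xs, ys), batchsize=128)

for (x, y) in loader
    println(size(x), " ", size(y))
end
```

Each `x` keeps the first three dimensions and differs only in the number of samples it holds.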

lorenzoh • Jun 17 '22 11:06

I did not realize that you were the one behind MLUtils xD Feeling a little dumb for telling you about it now.

However, your proposed fix still does not work for me. The line that defines lossfunc turns red for me, indicating that this is where things go wrong, even though I could not see it in the stacktrace:

Code and stacktrace
julia> using Flux

julia> using Flux: onehotbatch, onecold

julia> using FluxTraining

julia> using MLUtils: flatten, unsqueeze

julia> using MLDatasets

julia> labels = 0:9;

julia> function add_channeldim_and_onehotencode_targets(data_and_targets::Tuple)
           data = data_and_targets[1]
           targets = data_and_targets[2]
           return unsqueeze(data, 3), onehotbatch(targets, labels)
       end
add_channeldim_and_onehotencode_targets (generic function with 1 method)

julia> #traindata and testdata contain both inputs (pixel values) and targets (correct labels)    

julia> traindata = MNIST.traindata(Float32) |> add_channeldim_and_onehotencode_targets;

julia> trainloader = DataLoader(traindata, batchsize=128);

julia> testdata = MNIST.testdata(Float32) |> add_channeldim_and_onehotencode_targets;

julia> testloader = DataLoader(testdata, batchsize=128);

julia> predict = Chain(flatten, Dense(28^2, 10));

julia> lossfunc(x, y) = Flux.Losses.logitbinarycrossentropy(predict(x), y)
lossfunc (generic function with 1 method)

julia> optimizer = ADAM();

julia> callbacks = [Metrics(accuracy)];

julia> learner = Learner(predict, lossfunc; optimizer, callbacks)

julia> lossfunc(testdata...)

julia> Flux.train!(lossfunc, Flux.params(predict), trainloader, optimizer)

julia> lossfunc(testdata...)

julia> FluxTraining.fit!(learner, 1, (trainloader, testloader))
Epoch 1 TrainingPhase() ...
ERROR: DimensionMismatch("A has dimensions (10,784) but B has dimensions (10,128)")
  [1] gemm_wrapper!(C::Matrix{Float32}, tA::Char, tB::Char, A::Matrix{Float32}, B::Matrix{Float32}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
    @ LinearAlgebra C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:643
  [2] mul!
    @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:169 [inlined]
  [3] mul!
    @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:275 [inlined]
  [4] *
    @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.2+0~x64\share\julia\stdlib\v1.7\LinearAlgebra\src\matmul.jl:160 [inlined]
  [5] rrule
    @ C:\Users\Dennis Bal\.julia\packages\ChainRules\5p7j5\src\rulesets\Base\arraymath.jl:60 [inlined]
  [6] rrule
    @ C:\Users\Dennis Bal\.julia\packages\ChainRulesCore\RbX5a\src\rules.jl:134 [inlined]
  [7] chain_rrule
    @ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\chainrules.jl:217 [inlined]
  [8] macro expansion
    @ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0 [inlined]     
  [9] _pullback
    @ C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:9 [inlined]     
 [10] _pullback
    @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:160 [inlined]
 [11] macro expansion
    @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
 [12] _pullback
    @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:53 [inlined]
 [13] _pullback
    @ C:\Users\Dennis Bal\.julia\packages\Flux\18YZE\src\layers\basic.jl:51 [inlined]
 [14] _pullback(ctx::Zygote.Context, f::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, args::Matrix{Float32})
    @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
 [15] _pullback
    @ .\REPL[123]:1 [inlined]
 [16] _pullback(::Zygote.Context, ::typeof(lossfunc), ::Matrix{Float32}, ::Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}})
    @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
 [17] _pullback
    @ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:56 [inlined]
 [18] _pullback(ctx::Zygote.Context, f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, args::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}})
    @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0        
 [19] _pullback
    @ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:70 [inlined]
 [20] _pullback(::Zygote.Context, ::FluxTraining.var"#73#74"{FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}})
    @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface2.jl:0
 [21] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:352       
 [22] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ Zygote C:\Users\Dennis Bal\.julia\packages\Zygote\Y6SC4\src\compiler\interface.jl:75        
 [23] _gradient(f::FluxTraining.var"#70#72"{FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, FluxTraining.PropDict{Any}, Learner}, #unused#::ADAM, m::Chain{Tuple{typeof(flatten), Dense{typeof(identity), Matrix{Float32}, Vector{Float32}}}}, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
    @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:70      
 [24] (::FluxTraining.var"#69#71"{Learner})(handle::FluxTraining.var"#handlefn#78"{Learner, TrainingPhase}, state::FluxTraining.PropDict{Any})
    @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:53      
 [25] runstep(stepfn::FluxTraining.var"#69#71"{Learner}, learner::Learner, phase::TrainingPhase, initialstate::NamedTuple{(:xs, :ys), Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}})
    @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:133     
 [26] step!(learner::Learner, phase::TrainingPhase, batch::Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}})
    @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:51      
 [27] (::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}})(#unused#::Function)        
    @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:24      
 [28] runepoch(epochfn::FluxTraining.var"#67#68"{Learner, TrainingPhase, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}}, learner::Learner, phase::TrainingPhase)
    @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:105     
 [29] epoch!
    @ C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:22 [inlined]
 [30] fit!(learner::Learner, nepochs::Int64, ::Tuple{DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}, DataLoader{Tuple{Array{Float32, 4}, Flux.OneHotArray{UInt32, 10, 1, 2, Vector{UInt32}}}, Random._GLOBAL_RNG}})
    @ FluxTraining C:\Users\Dennis Bal\.julia\packages\FluxTraining\iBFSd\src\training.jl:168     
 [31] top-level scope
    @ REPL[130]:1

KronosTheLate • Jun 17 '22 12:06

Unlike Flux.jl, FluxTraining.jl separates the loss function and the model, so that model outputs can be stored. That means that the loss function going into the Learner should be just the Flux loss function, i.e. Learner(predict, Flux.Losses.logitcrossentropy).

Then it works fine: [image]
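The DimensionMismatch reported above is consistent with the model being applied twice: once by the training loop and once more inside a loss that closes over the model. A sketch in plain Julia, assuming the loop computes `ŷ = model(x)` and then calls the loss as `lossfn(ŷ, y)`; the bias-free linear `model` and the squared-error losses are illustrative stand-ins, not the Flux layers or losses used in this thread:

```julia
# Stand-in model: flatten each sample, then apply a 784 => 10 linear map.
W = randn(Float32, 10, 784)
model(x) = W * reshape(x, :, size(x)[end])

# Flux-style loss: closes over the model and expects raw inputs.
closure_loss(x, y) = sum(abs2, model(x) .- y)

# Learner-style loss: expects predictions and targets only.
plain_loss(ŷ, y) = sum(abs2, ŷ .- y)

x = rand(Float32, 28, 28, 1, 128)
y = rand(Float32, 10, 128)

ŷ = model(x)              # assumed to be done by the training loop
println(plain_loss(ŷ, y) >= 0)

# Handing the predictions to the closure-style loss applies the model again,
# multiplying the 10x784 weights with a 10x128 matrix.
err = try
    closure_loss(ŷ, y)
    nothing
catch e
    e
end
println(typeof(err))
```

With the plain loss the predictions are computed once, so they can also be stored or passed to metrics.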

I made some minor adjustments to the data loading code as well, the full code is here:

using Flux
using Flux: onehotbatch, onecold
using FluxTraining
using MLUtils: flatten, unsqueeze
using MLDatasets

const LABELS = 0:9

function preprocess((data, targets))
    return unsqueeze(data, 3), onehotbatch(targets, LABELS)
end

traindata = MNIST(Float32, :train)[:] |> preprocess
testdata = MNIST(Float32, :test)[:] |> preprocess

trainloader = DataLoader(traindata, batchsize=128);
testloader = DataLoader(testdata, batchsize=128);

model = Chain(flatten, Dense(28^2, 10));

lossfn = Flux.Losses.logitcrossentropy

optimizer = ADAM()
callbacks = [Metrics(accuracy)]

learner = Learner(model, lossfn; optimizer, callbacks)

FluxTraining.fit!(learner, 1, (trainloader, testloader))

lorenzoh • Jun 17 '22 13:06

That makes sense. I always found it weird that I had to define the loss function in terms of the model. Also, big fan of the changes you made. It looks great.

I do have my hands full, but at some point I will find time to update the docs. But I really think that at the very least a functional MWE should be put in the docs before I find the time to go down your checklist. If nothing else, a temporary warning and a link to this issue would do the trick.

KronosTheLate • Jun 17 '22 14:06

Getting the following error from your code:

julia> using MLDatasets

julia> traindata = MNIST(Float32, :train)[:] |> preprocess
ERROR: MethodError: objects of type Module are not callable
 [1] top-level scope
   @ REPL[135]:1

I will just stick with traindata = MNIST.traindata(Float32) |> preprocess

KronosTheLate • Jun 17 '22 14:06

I think your version of MLDatasets.jl is not up-to-date, so you may wanna update to 0.7.1, i.e. ]add MLDatasets@0.7.1

lorenzoh • Jun 17 '22 14:06

You are right! However, with the latest version the output is of type MNIST<:MLDatasets.SupervisedDataset, not a tuple. The following works:

function preprocess(data::MNIST)
    return unsqueeze(data.features, 3), onehotbatch(data.targets, LABELS)
end

# traindata and testdata contain both inputs (pixel values) and targets (correct labels)
traindata = MNIST(Float32, :train) |> preprocess;
testdata = MNIST(Float32, :test) |> preprocess;
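The difference between the two preprocess variants can be illustrated with Base Julia only. `ToyDataset` below is a hypothetical stand-in for a dataset object with `features` and `targets` fields, not the MLDatasets.jl type: a method that destructures its argument works for tuples and named tuples, while a plain struct has to be accessed by field.

```julia
# Hypothetical stand-in for a dataset object with two fields.
struct ToyDataset
    features::Array{Float32,3}
    targets::Vector{Int}
end

# Variant that destructures its argument, as in the tuple-based preprocess.
sizes_from_tuple((data, targets)) = (size(data), length(targets))

# Variant that reads the fields, as in the struct-based preprocess.
sizes_from_fields(d::ToyDataset) = (size(d.features), length(d.targets))

ds = ToyDataset(rand(Float32, 28, 28, 5), [0, 1, 2, 3, 4])

println(sizes_from_fields(ds))
println(sizes_from_tuple((ds.features, ds.targets)))
println(sizes_from_tuple((features=ds.features, targets=ds.targets)))

# Destructuring the struct itself fails because it does not define iteration.
err = try
    sizes_from_tuple(ds)
    nothing
catch e
    e
end
println(typeof(err))
```

Which variant applies depends on whether the loading call returns the dataset object or an indexed slice of it.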

KronosTheLate • Jun 17 '22 14:06