Hyperopt.jl

[FR] propagate more than loss and state

Open KronosTheLate opened this issue 2 years ago • 4 comments

Currently, when doing Hyperband, the objective is allowed to return a loss as the first element and a "state" as the second element. The "state" is what you can use to continue training from. Furthermore, the "state" must have length equal to the number of hyperparameters (EDIT: when the inner sampler is BOHB).

In my case, I am working with neural networks, and I want to do BOHB. I need to propagate all the trainable parameters, which are not part of the hyperparameter space.

So to train neural networks within the framework of this package, I feel like I need to be able to pass on a third element. The first is still the loss, and the second is still the config from the hyperparameter space, as required for BOHB; this is what is called "state" in this package. The third thing I want to propagate is the neural network, which is what I will in fact resume training on. The "state" alone is not sufficient.

If I am missing some existing good way to propagate a neural network through the @hyperopt framework, let me know. But as far as I can tell, I need 3 pieces of information.

KronosTheLate avatar Apr 20 '22 11:04 KronosTheLate

Furthermore, the "state" must have length equal to the number of hyperparameters.

This is not true; the example below adds a third component of an arbitrary type (Random.Xoshiro) to the state without problems.

using Hyperopt, Optim, Random
using Plots  # needed for plot(hohb) below
f(a; c=10) = sum(@. 100 + (a-3)^2 + (c-100)^2)
hohb = @hyperopt for resources=50, sampler=Hyperband(R=50, η=3, inner=RandomSampler()), a = LinRange(1,5,1800), c = exp10.(LinRange(-1,3,1800))
    if !(state === nothing)
        # Resume from the previous minimizer; the extra component d is carried along untouched
        (a,c),d = state
    end
    @show state
    res = Optim.optimize(x->f(x[1],c=x[2]), [a,c], SimulatedAnnealing(), Optim.Options(f_calls_limit=round(Int, resources)))
    # Return the loss and a state holding the minimizer plus an arbitrary extra object (an RNG here)
    Optim.minimum(res), (Optim.minimizer(res), Random.Xoshiro())
end
plot(hohb)

You could put the whole neural network and the optimizer in the state in the same way.
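
For example, here is a minimal sketch assuming a recent Flux.jl (with its explicit-style training API) and a hypothetical build_model helper; the data is a stand-in:

using Hyperopt, Flux

# Hypothetical helper: builds a small network for a given hidden width
build_model(nunits) = Chain(Dense(4 => nunits, relu), Dense(nunits => 1))

ho = @hyperopt for resources = 27, sampler = Hyperband(R=27, η=3, inner=RandomSampler()),
        nunits = [8, 16, 32], lr = exp10.(LinRange(-4, -1, 100))
    if state !== nothing
        # Resume: reuse the previously trained model and its hyperparameters
        (nunits, lr), model = state
    else
        model = build_model(nunits)
    end
    opt = Flux.setup(Adam(lr), model)
    x, y = randn(Float32, 4, 64), randn(Float32, 1, 64)   # stand-in data
    for _ in 1:round(Int, resources)
        Flux.train!((m, xb, yb) -> Flux.mse(m(xb), yb), model, [(x, y)], opt)
    end
    loss = Flux.mse(model(x), y)
    # First element: loss. Second element: state = (hyperparameters, model)
    loss, ((nunits, lr), model)
end

Here the second state component is the live model, playing the same role as the Xoshiro above: Hyperband hands it back unchanged when the configuration is promoted to a larger resource budget.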

baggepinnen avatar Apr 20 '22 12:04 baggepinnen

Sorry, I didn't mean to close the issue, but if the comment above solves your problem, feel free to close it.

baggepinnen avatar Apr 20 '22 12:04 baggepinnen

Is it not so that (Optim.minimizer(res), Random.Xoshiro()) is then interpreted as an observation::Union{Vector, Tuple} in the ObservationsRecord when doing Bayesian optimization? Here is my attempt at making your example work with Bayesian optimization:

using Hyperopt, Optim, Random
f(a;c=10) = sum(@. 100 + (a-3)^2 + (c-100)^2)
hohb = @hyperopt for resources=50, 
    sampler=Hyperband(R=50, η=3, inner=BOHB(dims=[Hyperopt.Continuous(), Hyperopt.Continuous()])), 
    a = LinRange(1,5,1800), 
    c = exp10.(LinRange(-1,3,1800))

    if !(state === nothing)
        (a,c),d = state
    end
    @show state
    res = Optim.optimize(x->f(x[1],c=x[2]), [a,c], SimulatedAnnealing(), Optim.Options(f_calls_limit=round(Int, resources)))
    Optim.minimum(res), (Optim.minimizer(res), Random.Xoshiro())
end

which for me produces the following error:

ERROR: MethodError: no method matching /(::Xoshiro, ::Int64)
Closest candidates are:
  /(::Any, ::ChainRulesCore.AbstractThunk) at C:\Users\Dennis Bal\.julia\packages\ChainRulesCore\RbX5a\src\tangent_types\thunks.jl:33
  /(::Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}, ::Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}) at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\int.jl:93
  /(::StridedArray{P}, ::Real) where P<:Period at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Dates\src\deprecated.jl:44      
  ...
Stacktrace:
  [1] _mean(f::typeof(identity), A::Vector{Any}, dims::Colon)
    @ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:176
  [2] #mean#2
    @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:164 [inlined]
  [3] mean
    @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:164 [inlined]
  [4] _var(A::Vector{Any}, corrected::Bool, mean::Nothing, #unused#::Colon)
    @ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:379
  [5] #var#15
    @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:368 [inlined]
  [6] _std(A::Vector{Any}, corrected::Bool, mean::Nothing, #unused#::Colon)
    @ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:457
  [7] #std#18
    @ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:452 [inlined]
  [8] std(A::Vector{Any})
    @ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:452
  [9] _broadcast_getindex_evalf
    @ .\broadcast.jl:670 [inlined]
 [10] _broadcast_getindex
    @ .\broadcast.jl:643 [inlined]
 [11] getindex
    @ .\broadcast.jl:597 [inlined]
 [12] copyto_nonleaf!(dest::Vector{Vector{Float64}}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(std), Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, iter::Base.OneTo{Int64}, state::Int64, count::Int64)
    @ Base.Broadcast .\broadcast.jl:1055
 [13] copy
    @ .\broadcast.jl:907 [inlined]
 [14] materialize
    @ .\broadcast.jl:860 [inlined]
 [15] default_bandwidth(observations::Vector{Any})   
    @ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:184
 [16] MultiKDE.KDEMulti(dims::Vector{MultiKDE.DimensionType}, bws::Nothing, mat_observations::Matrix{Any}, candidates::Dict{Int64, Vector})
    @ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:133
 [17] MultiKDE.KDEMulti(dims::Vector{MultiKDE.DimensionType}, bws::Nothing, observations::Vector{Vector}, candidates::Dict{Int64, Vector})
    @ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:109
 [18] MultiKDE.KDEMulti(dims::Vector{MultiKDE.DimensionType}, observations::Vector{Vector}, candidates::Tuple{LinRange{Float64, Int64}, Vector{Float64}})      
    @ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:101
 [19] MultiKDE.KDEMulti(dims::Vector{LatinHypercubeSampling.LHCDimension}, observations::Vector{Vector}, candidates::Tuple{LinRange{Float64, Int64}, Vector{Float64}})
    @ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:346
 [20] MultiKDE.KDEMulti(dim_types::Vector{LatinHypercubeSampling.LHCDimension}, records::Vector{Hyperopt.ObservationsRecord}, min_bandwidth::Float64, candidates::Tuple{LinRange{Float64, Int64}, Vector{Float64}}) 
    @ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:336
 [21] update_KDEs(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"})
    @ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:293
 [22] update_observations(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"}, rᵢ::Float64, observations::Vector{Tuple{Vector{Float64}, Xoshiro}}, losses::Vector{Float64})
    @ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:280
 [23] successive_halving(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"}, n::Int64, r::Float64, s::Int64; threads::Bool)
    @ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:154
 [24] hyperband(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"}; threads::Bool)      
    @ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:125
 [25] hyperband
    @ C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:110 [inlined]
 [26] optimize(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"})
    @ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:98
 [27] top-level scope
    @ C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\Hyperopt.jl:193
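
As far as I can tell from the stack trace (a minimal reproduction sketch, an assumption about the failure mode rather than a statement about the package internals), the whole state tuple ends up among the observations fed to MultiKDE, whose bandwidth computation takes a std over each component, and the Xoshiro component supports no arithmetic:

using Statistics, Random
# std calls mean, which divides the sum by the length; Xoshiro has no / method,
# so the bandwidth computation fails as in frame [1] of the trace above.
std(Any[Random.Xoshiro()])
# ERROR: MethodError: no method matching /(::Xoshiro, ::Int64)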

KronosTheLate avatar Apr 20 '22 12:04 KronosTheLate

One solution would be to interpret the first N elements of the state as the observation::Union{Vector, Tuple}, where N is the number of hyperparameters being sampled. Then the user could return a tuple or vector with the state (which I have an easier time interpreting as a config; in any case, a set of sampled parameters), and simply append whatever element they wish to propagate through the loops and functions. This would allow making use of state[end].

In my case, I would append an instance of my custom type NeuralNetwork, which contains all the information needed to resume training, and only use this final element to initialize the state.
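
A purely hypothetical usage sketch of that convention (this is not how the package currently interprets the state; NeuralNetwork and train_network! are made-up placeholders):

hohb = @hyperopt for resources=50,
    sampler=Hyperband(R=50, η=3, inner=BOHB(dims=[Hyperopt.Continuous(), Hyperopt.Continuous()])),
    a = LinRange(1,5,1800),
    c = exp10.(LinRange(-1,3,1800))

    if !(state === nothing)
        a, c = state[1], state[2]     # first N elements: the sampled hyperparameters (the observation for BOHB)
        network = state[end]          # trailing element: whatever the user wants to carry, ignored by BOHB
    else
        network = NeuralNetwork(a, c) # hypothetical user-defined type
    end
    loss = train_network!(network, resources)  # hypothetical training routine returning a loss
    loss, (a, c, network)             # loss first, then (hyperparameters..., extra payload)
end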

KronosTheLate avatar Apr 20 '22 13:04 KronosTheLate