Hyperopt.jl
[FR] propagate more than loss and state
Currently, when doing Hyperband, you are allowed to return a loss as the first element and a "state" as the second element. The "state" is what you use to continue training from. Furthermore, the "state" must have length equal to the number of hyperparameters (EDIT: when the inner sampler is BOHB).
In my case, I am working with neural networks and want to do BOHB. I need to propagate all the trainable parameters, which are not part of the hyperparameter space.
So to train neural networks within the framework of this package, I feel like I need to be able to pass on a third element. The first is still the loss, and the second is still the config from the hyperparameter space, as required to do BOHB; this is what this package calls "state". The third thing I want to propagate is the neural network, which is what I will actually resume training on. The "state" alone is not sufficient.
If I am missing some existing good way to propagate a neural network through the @hyperopt framework, let me know. But as far as I can tell, I need three pieces of information.
"Furthermore, the "state" must have length equal to the number of hyperparameters."
This is not true: the example below adds a third component of an arbitrary type, Random.Xoshiro, to the state without problems.
using Hyperopt, Optim, Random
using Plots  # needed for plot(hohb) below
f(a; c = 10) = sum(@. 100 + (a - 3)^2 + (c - 100)^2)
hohb = @hyperopt for resources = 50, sampler = Hyperband(R = 50, η = 3, inner = RandomSampler()), a = LinRange(1, 5, 1800), c = exp10.(LinRange(-1, 3, 1800))
    if !(state === nothing)
        (a, c), d = state
    end
    @show state
    res = Optim.optimize(x -> f(x[1], c = x[2]), [a, c], SimulatedAnnealing(), Optim.Options(f_calls_limit = round(Int, resources)))
    Optim.minimum(res), (Optim.minimizer(res), Random.Xoshiro())
end
plot(hohb)
You could put the whole neural network and the optimizer in the state in the same way.
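As a hedged sketch of that suggestion (not part of the thread's original examples): assuming Flux.jl for the network, the `RandomSampler` inner sampler from the example above, and treating `resources` as an epoch budget, carrying a model through the state could look like the following. The toy data, the `make_model` helper, and the choice to rebuild the optimizer on each resume are all illustrative assumptions.

```julia
# Sketch only: carry a Flux model in Hyperband's state, mirroring the
# (loss, (config, payload)) pattern from the Optim example above.
using Hyperopt, Flux

X, Y = rand(Float32, 4, 100), rand(Float32, 1, 100)   # toy data (assumption)
make_model(width) = Chain(Dense(4 => width, relu), Dense(width => 1))

ho = @hyperopt for resources = 27,
        sampler = Hyperband(R = 27, η = 3, inner = RandomSampler()),
        width = [8, 16, 32],
        lr = exp10.(LinRange(-4, -1, 10))
    if state === nothing
        model = make_model(width)           # fresh model for a new config
    else
        (width, lr), model = state          # resume from the carried model
    end
    opt = Flux.setup(Adam(lr), model)       # optimizer rebuilt each round (sketch)
    for _ in 1:round(Int, resources)        # interpret the budget as epochs
        Flux.train!((m, x, y) -> Flux.mse(m(x), y), model, [(X, Y)], opt)
    end
    loss = Flux.mse(model(X), Y)
    loss, ([width, lr], model)              # loss, then (config, payload) as state
end
```

The optimizer state could be carried the same way, e.g. by returning `([width, lr], (model, opt))`, so that momentum terms survive across Hyperband rounds.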
Sorry, I didn't mean to close the issue, but if the comment above solves your problem, feel free to close it.
Is it not the case that (Optim.minimizer(res), Random.Xoshiro()) is then interpreted as an observation::Union{Vector, Tuple} in the ObservationsRecord when doing Bayesian optimization? Here is my attempt at making your example work with Bayesian optimization:
using Hyperopt, Optim, Random
f(a; c = 10) = sum(@. 100 + (a - 3)^2 + (c - 100)^2)
hohb = @hyperopt for resources = 50,
        sampler = Hyperband(R = 50, η = 3, inner = BOHB(dims = [Hyperopt.Continuous(), Hyperopt.Continuous()])),
        a = LinRange(1, 5, 1800),
        c = exp10.(LinRange(-1, 3, 1800))
    if !(state === nothing)
        (a, c), d = state
    end
    @show state
    res = Optim.optimize(x -> f(x[1], c = x[2]), [a, c], SimulatedAnnealing(), Optim.Options(f_calls_limit = round(Int, resources)))
    Optim.minimum(res), (Optim.minimizer(res), Random.Xoshiro())
end
which, for me, produces the following error:
ERROR: MethodError: no method matching /(::Xoshiro, ::Int64)
Closest candidates are:
/(::Any, ::ChainRulesCore.AbstractThunk) at C:\Users\Dennis Bal\.julia\packages\ChainRulesCore\RbX5a\src\tangent_types\thunks.jl:33
/(::Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}, ::Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}) at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\base\int.jl:93
/(::StridedArray{P}, ::Real) where P<:Period at C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Dates\src\deprecated.jl:44
...
Stacktrace:
[1] _mean(f::typeof(identity), A::Vector{Any}, dims::Colon)
@ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:176
[2] #mean#2
@ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:164 [inlined]
[3] mean
@ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:164 [inlined]
[4] _var(A::Vector{Any}, corrected::Bool, mean::Nothing, #unused#::Colon)
@ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:379
[5] #var#15
@ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:368 [inlined]
[6] _std(A::Vector{Any}, corrected::Bool, mean::Nothing, #unused#::Colon)
@ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:457
[7] #std#18
@ C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:452 [inlined]
[8] std(A::Vector{Any})
@ Statistics C:\Users\Dennis Bal\.julia\juliaup\julia-1.7.1+0~x64\share\julia\stdlib\v1.7\Statistics\src\Statistics.jl:452
[9] _broadcast_getindex_evalf
@ .\broadcast.jl:670 [inlined]
[10] _broadcast_getindex
@ .\broadcast.jl:643 [inlined]
[11] getindex
@ .\broadcast.jl:597 [inlined]
[12] copyto_nonleaf!(dest::Vector{Vector{Float64}}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(std), Tuple{Base.Broadcast.Extruded{Vector{Any}, Tuple{Bool}, Tuple{Int64}}}}, iter::Base.OneTo{Int64}, state::Int64, count::Int64)
@ Base.Broadcast .\broadcast.jl:1055
[13] copy
@ .\broadcast.jl:907 [inlined]
[14] materialize
@ .\broadcast.jl:860 [inlined]
[15] default_bandwidth(observations::Vector{Any})
@ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:184
[16] MultiKDE.KDEMulti(dims::Vector{MultiKDE.DimensionType}, bws::Nothing, mat_observations::Matrix{Any},
candidates::Dict{Int64, Vector})
@ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:133
[17] MultiKDE.KDEMulti(dims::Vector{MultiKDE.DimensionType}, bws::Nothing, observations::Vector{Vector}, candidates::Dict{Int64, Vector})
@ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:109
[18] MultiKDE.KDEMulti(dims::Vector{MultiKDE.DimensionType}, observations::Vector{Vector}, candidates::Tuple{LinRange{Float64, Int64}, Vector{Float64}})
@ MultiKDE C:\Users\Dennis Bal\.julia\packages\MultiKDE\kncPj\src\kde.jl:101
[19] MultiKDE.KDEMulti(dims::Vector{LatinHypercubeSampling.LHCDimension}, observations::Vector{Vector}, candidates::Tuple{LinRange{Float64, Int64}, Vector{Float64}})
@ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:346
[20] MultiKDE.KDEMulti(dim_types::Vector{LatinHypercubeSampling.LHCDimension}, records::Vector{Hyperopt.ObservationsRecord}, min_bandwidth::Float64, candidates::Tuple{LinRange{Float64, Int64}, Vector{Float64}})
@ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:336
[21] update_KDEs(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"})
@ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:293
[22] update_observations(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"}, rᵢ::Float64, observations::Vector{Tuple{Vector{Float64}, Xoshiro}}, losses::Vector{Float64})
@ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:280
[23] successive_halving(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"}, n::Int64, r::Float64, s::Int64; threads::Bool)
@ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:154
[24] hyperband(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"}; threads::Bool)
@ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:125
[25] hyperband
@ C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:110 [inlined]
[26] optimize(ho::Hyperoptimizer{Hyperband, var"##1668###hyperopt_objective#9690"})
@ Hyperopt C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\samplers.jl:98
[27] top-level scope
@ C:\Users\Dennis Bal\.julia\dev\Hyperopt\src\Hyperopt.jl:193
One solution would be to interpret the first N elements of the state as the observation::Union{Vector, Tuple}, where N is the number of hyperparameters being sampled. Then the user could return a tuple or vector with the state (which I have an easier time interpreting as a config, but in any case, a set of sampled parameters) and simply append whatever elements they wish to propagate through the loops and functions. This would allow making use of state[end].
In my case, I would append an instance of my custom type NeuralNetwork, which contains all the information needed to resume training, and only use this final element to initialize the state.
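To make the proposal concrete, here is a minimal sketch of how the returned state could be partitioned. The splitting convention is hypothetical (it is the proposed behavior, not what Hyperopt.jl currently does), and the named-tuple `nn` is a stand-in for a custom NeuralNetwork type.

```julia
# Hypothetical convention (proposed, not current Hyperopt.jl behavior):
# the first N elements of the returned state are the BOHB observation,
# and any trailing element is an opaque payload the user resumes from.
N = 2                              # number of sampled hyperparameters (a, c)
nn = (weights = rand(3),)          # stand-in for a custom NeuralNetwork
state = (2.9, 99.8, nn)            # (a, c, payload) as returned by the objective

observation = state[1:N]           # what the sampler records in ObservationsRecord
payload = state[end]               # what the user resumes training from
```

Under this convention, the KDE in BOHB would only ever see `observation`, so arbitrary payload types could never reach MultiKDE and trigger the MethodError above.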