Retrieving information at each iteration
Hi Oscar,
We are trying to retrieve the following information for a minimisation problem at each iteration:
1. the lower bound
2. the statistical upper bound
3. the individual sampled scenario costs
4. the number of scenarios sampled
I tried setting the parameter log_frequency to 1, but it does not print the information at each iteration. My understanding is that if this parameter works, items 1 and 2 can be retrieved from the log. Is there any simple way to retrieve 3 and 4 as well?
Thanks!
Hongyu
You're looking for SDDP.train(model; log_every_iteration = true).
You don't need to print the log though.
Just look at model.most_recent_training_results. This can tell you:
https://github.com/odow/SDDP.jl/blob/014e054bd9faefdc42630f6525b50bd26cd5ed0d/src/user_interface.jl#L680-L689
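For example, something like this (a minimal sketch; it assumes the log field of the training results and the bound and simulation_value entries described at that link):

SDDP.train(model; log_every_iteration = true)
results = model.most_recent_training_results
for (i, entry) in enumerate(results.log)
    # entry.bound is the lower bound after iteration i (for a minimisation
    # problem), and entry.simulation_value is the cost of the single scenario
    # sampled on that iteration's forward pass.
    println(i, ": bound = ", entry.bound, ", simulation = ", entry.simulation_value)
end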
For (3): we don't store the scenario or the individual stage costs in the log.
For (4): it is always 1. We never sample more than one scenario per iteration.
What, precisely, do you need, when and why? There is the undocumented forward_pass_callback argument to SDDP.train that might work.
julia> using SDDP, HiGHS
julia> model = SDDP.LinearPolicyGraph(;
           stages = 3,
           lower_bound = 0.0,
           optimizer = HiGHS.Optimizer,
       ) do sp, stage
           @variable(
               sp,
               0 <= stored_production <= 100,
               Int,
               SDDP.State,
               initial_value = 0
           )
           @variable(sp, 0 <= production <= 200, Int)
           @variable(sp, overtime >= 0, Int)
           @variable(sp, demand)
           DEMAND = [[100.0], [100.0, 300.0], [100.0, 300.0]]
           SDDP.parameterize(ω -> JuMP.fix(demand, ω), sp, DEMAND[stage])
           @constraint(
               sp,
               stored_production.out ==
                   stored_production.in + production + overtime - demand
           )
           @stageobjective(
               sp,
               100 * production + 300 * overtime + 50 * stored_production.out
           )
       end
A policy graph with 3 nodes.
Node indices: 1, 2, 3
julia> iterations = Any[]
Any[]
julia> function my_callback(result)
           push!(iterations, result)
           return
       end
my_callback (generic function with 1 method)
julia> SDDP.train(model; forward_pass_callback = my_callback)
-------------------------------------------------------------------
SDDP.jl (c) Oscar Dowson and contributors, 2017-25
-------------------------------------------------------------------
problem
nodes : 3
state variables : 1
scenarios : 4.00000e+00
existing cuts : false
options
solver : serial mode
risk measure : SDDP.Expectation()
sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
VariableRef : [6, 6]
AffExpr in MOI.EqualTo{Float64} : [1, 1]
VariableRef in MOI.EqualTo{Float64} : [1, 1]
VariableRef in MOI.GreaterThan{Float64} : [4, 4]
VariableRef in MOI.Integer : [3, 3]
VariableRef in MOI.LessThan{Float64} : [2, 3]
numerical stability report
matrix range [1e+00, 1e+00]
objective range [1e+00, 3e+02]
bounds range [1e+02, 2e+02]
rhs range [0e+00, 0e+00]
-------------------------------------------------------------------
iteration simulation bound time (s) solves pid
-------------------------------------------------------------------
1 3.000000e+04 6.250000e+04 1.711211e-01 8 1
20 6.000000e+04 6.250000e+04 1.145327e+00 172 1
-------------------------------------------------------------------
status : simulation_stopping
total time (s) : 1.145327e+00
total solves : 172
best bound : 6.250000e+04
simulation ci : 6.450000e+04 ± 9.641401e+03
numeric issues : 0
-------------------------------------------------------------------
julia> iterations
20-element Vector{Any}:
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 0.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 30000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 40000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 40000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 40000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
(scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
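Note that each result pushed by the callback includes a cumulative_value field, which is the total cost of the single scenario sampled on that forward pass, so for (3) you can extract the per-iteration scenario costs with something like:

scenario_costs = [result.cumulative_value for result in iterations]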
Hi Oscar, thanks for your reply and suggestions. The argument and callback function you suggested are very helpful. Below are some follow-ups:
For (3): we don't store the scenario or the individual stage costs in the log.
I think this information is in the callback output? I am taking a closer look at it to figure out which elements relate to this. Just to clarify, since I may not have been clear in my previous question: we need the cost of each scenario, not the costs at each stage.
For (4): it is always 1. We never sample more than one scenario per iteration.
Thanks, I see. Is there a way to change a parameter so that multiple scenarios are sampled, or is it hard-coded that only one scenario is sampled?
What, precisely, do you need, when and why? There is the undocumented forward_pass_callback argument to SDDP.train that might work.
What we need is: at the end of each SDDP iteration, the lower bound, the statistical upper bound, and the number of scenarios sampled in that iteration. This of course assumes the case where we sample multiple scenarios each iteration.
We need this information to improve the stopping criteria of SDDP.
Thanks!
Best,
Hongyu
what we need is: at the end of each SDDP iteration, the lower bound,
This is log.bound.
statistical upper bound,
This is log.simulation_value. But note that we do not compute or store the statistical upper bound as you might expect. We store only the total cost of the single scenario sampled on the forward pass.
and the number of scenarios sampled in the current iteration
This is only ever one.
Is there a way to change some parameter to make it sample multiple scenarios? Or it is hard coded that only one scenario is sampled.
It's hard-coded. It is only ever one scenario. We do not implement the "classical" algorithm with N forward paths.
You can write your own stopping rule which performs a simulation of the policy to estimate an upper bound. See:
https://github.com/odow/SDDP.jl/blob/c37a4f51ee6fd2f5f8f3bfef75a52c89b19ea91a/src/plugins/stopping_rules.jl#L42-L178
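As a rough sketch of what such a rule could look like (this assumes the AbstractStoppingRule interface from the file linked above, i.e. defining SDDP.stopping_rule_status and SDDP.convergence_test, and it estimates a statistical upper bound with SDDP.simulate; adapt the replication count and z-score to your problem):

using SDDP, Statistics

struct StatisticalUpperBound <: SDDP.AbstractStoppingRule
    num_replications::Int
    z_score::Float64
end

SDDP.stopping_rule_status(::StatisticalUpperBound) = :statistical_upper_bound

function SDDP.convergence_test(
    model::SDDP.PolicyGraph,
    log,
    rule::StatisticalUpperBound,
)
    # Simulate the current policy to build a statistical estimate of the
    # upper bound (for a minimisation problem).
    simulations = SDDP.simulate(model, rule.num_replications)
    objectives = [
        sum(data[:stage_objective] for data in sim) for sim in simulations
    ]
    μ = mean(objectives)
    Δ = rule.z_score * std(objectives) / sqrt(length(objectives))
    # Stop once the deterministic lower bound from the most recent iteration
    # is inside the confidence interval [μ - Δ, μ + Δ].
    return log[end].bound >= μ - Δ
end

SDDP.train(model; stopping_rules = [StatisticalUpperBound(100, 1.96)])

Note that convergence_test runs after every iteration, so simulating many replications each time can be expensive; you may want to run the test only every N iterations.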
Part of the design of SDDP.jl is that the stopping rules are separate from the training iterations. This tutorial is pretty close to the true algorithm that we implement: https://sddp.dev/stable/explanation/theory_intro/. It isn't useful to think in terms of the Pereira or Shapiro type papers.
See perhaps Section 4 of https://github.com/odow/SDDP.jl/blob/master/papers/policy_graph/preprint.pdf and Section 3 of https://github.com/odow/SDDP.jl/blob/master/papers/partially_observable/preprint.pdf.