
Retrieving information each iteration.

Richardzhang0416 opened this issue 6 months ago • 4 comments

Hi Oscar,

We are trying to retrieve the following information for a minimisation problem at each iteration:

  1. lower bound
  2. statistical upper bound
  3. Individual sampled scenario costs
  4. Number of scenarios sampled

I tried to set the parameter log_frequency to 1, but it does not print the information at each iteration. My understanding is that if this parameter works, items 1 and 2 can be retrieved from the log. Is there any simple way to retrieve 3 and 4 as well?

Thanks!

Hongyu

Richardzhang0416 avatar Jun 30 '25 09:06 Richardzhang0416

You're looking for SDDP.train(model; log_every_iteration = true).

You don't need to print the log though.

Just look at model.most_recent_training_results. This can tell you:

https://github.com/odow/SDDP.jl/blob/014e054bd9faefdc42630f6525b50bd26cd5ed0d/src/user_interface.jl#L680-L689
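
For example, here is a minimal sketch of reading those values after training. It assumes the log field of most_recent_training_results is a vector of per-iteration entries with iteration, bound, and simulation_value fields, as in the source linked above:

using SDDP, HiGHS

# ... build `model` as usual ...

SDDP.train(model; log_every_iteration = true)

# One entry per iteration of the most recent call to SDDP.train.
for log in model.most_recent_training_results.log
    println(log.iteration, "  ", log.bound, "  ", log.simulation_value)
end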

For (3): we don't store the scenario or the individual stage costs in the log.

For (4): it is always 1. We never sample more than one scenario per iteration.

What, precisely, do you need, when and why? There is the undocumented forward_pass_callback argument to SDDP.train that might work.

odow avatar Jul 01 '25 01:07 odow

julia> using SDDP, HiGHS

julia> model = SDDP.LinearPolicyGraph(;
           stages = 3,
           lower_bound = 0.0,
           optimizer = HiGHS.Optimizer,
       ) do sp, stage
           @variable(
               sp,
               0 <= stored_production <= 100,
               Int,
               SDDP.State,
               initial_value = 0
           )
           @variable(sp, 0 <= production <= 200, Int)
           @variable(sp, overtime >= 0, Int)
           @variable(sp, demand)
           DEMAND = [[100.0], [100.0, 300.0], [100.0, 300.0]]
           SDDP.parameterize(ω -> JuMP.fix(demand, ω), sp, DEMAND[stage])
           @constraint(
               sp,
               stored_production.out ==
               stored_production.in + production + overtime - demand
           )
           @stageobjective(
               sp,
               100 * production + 300 * overtime + 50 * stored_production.out
           )
       end
A policy graph with 3 nodes.
 Node indices: 1, 2, 3


julia> iterations = Any[]
Any[]

julia> function my_callback(result)
           push!(iterations, result)
           return
       end
my_callback (generic function with 1 method)

julia> SDDP.train(model; forward_pass_callback = my_callback)
-------------------------------------------------------------------
         SDDP.jl (c) Oscar Dowson and contributors, 2017-25
-------------------------------------------------------------------
problem
  nodes           : 3
  state variables : 1
  scenarios       : 4.00000e+00
  existing cuts   : false
options
  solver          : serial mode
  risk measure    : SDDP.Expectation()
  sampling scheme : SDDP.InSampleMonteCarlo
subproblem structure
  VariableRef                             : [6, 6]
  AffExpr in MOI.EqualTo{Float64}         : [1, 1]
  VariableRef in MOI.EqualTo{Float64}     : [1, 1]
  VariableRef in MOI.GreaterThan{Float64} : [4, 4]
  VariableRef in MOI.Integer              : [3, 3]
  VariableRef in MOI.LessThan{Float64}    : [2, 3]
numerical stability report
  matrix range     [1e+00, 1e+00]
  objective range  [1e+00, 3e+02]
  bounds range     [1e+02, 2e+02]
  rhs range        [0e+00, 0e+00]
-------------------------------------------------------------------
 iteration    simulation      bound        time (s)     solves  pid
-------------------------------------------------------------------
         1   3.000000e+04  6.250000e+04  1.711211e-01         8   1
        20   6.000000e+04  6.250000e+04  1.145327e+00       172   1
-------------------------------------------------------------------
status         : simulation_stopping
total time (s) : 1.145327e+00
total solves   : 172
best bound     :  6.250000e+04
simulation ci  :  6.450000e+04 ± 9.641401e+03
numeric issues : 0
-------------------------------------------------------------------

julia> iterations
20-element Vector{Any}:
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 0.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 30000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 40000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 95000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 40000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 40000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 300.0), (3, 100.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 0.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 55000.0)
 (scenario_path = Tuple{Int64, Any}[(1, 100.0), (2, 100.0), (3, 300.0)], sampled_states = [Dict(:stored_production => 100.0), Dict(:stored_production => 100.0), Dict(:stored_production => 0.0)], objective_states = Tuple{}[], belief_states = Tuple{Int64, Dict{Int64, Float64}}[], cumulative_value = 60000.0)
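
As a sketch of how to answer (3) from this output, using only the fields shown above: cumulative_value is the total cost of the scenario sampled on that iteration's forward pass, and scenario_path records the sampled noise terms.

# Total cost of each sampled scenario, one per iteration.
scenario_costs = [r.cumulative_value for r in iterations]

# Noise realisations visited on, for example, the first forward pass.
first_scenario = [ω for (node, ω) in iterations[1].scenario_path]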

odow avatar Jul 01 '25 03:07 odow

Hi Oscar, thanks for your reply and suggestions. The argument and function you suggested are very helpful. Below are some follow-ups:

For (3): we don't store the scenario or the individual stage costs in the log.

I think this information is in the callback function output? I am taking a closer look at the callback output to figure out which elements relate to this. Just to clarify, because I might not have been very clear in my previous question: we need the cost of each scenario, and do not need the costs at each stage.

For (4): it is always 1. We never sample more than one scenario per iteration.

Thanks, I see. Is there a way to change some parameter to make it sample multiple scenarios? Or is it hard-coded that only one scenario is sampled?

What, precisely, do you need, when and why? There is the undocumented forward_pass_callback argument to SDDP.train that might work.

OK. What we need is, at the end of each SDDP iteration: the lower bound, the statistical upper bound, and the number of scenarios sampled in that iteration. This is of course based on the case where we sample multiple scenarios each iteration.

We need this information to improve the stopping criteria of SDDP.

Thanks!

Best,

Hongyu

Richardzhang0416 avatar Jul 10 '25 10:07 Richardzhang0416

what we need is: at the end of each SDDP iteration, the lower bound,

This is log.bound

statistical upper bound,

This is log.simulation_value. But note that we do not compute or store the statistical upper bound as you might expect. We store only the total cost of the single scenario that was visited on the forward pass.

and the number of scenarios sampled in the current iteration

This is only ever one.

Is there a way to change some parameter to make it sample multiple scenarios? Or it is hard coded that only one scenario is sampled.

It's hard-coded. It is only ever one scenario. We do not implement the "classical" algorithm with N forward paths.

You can write your own stopping rule which performs a simulation of the policy to estimate an upper bound. See:

https://github.com/odow/SDDP.jl/blob/c37a4f51ee6fd2f5f8f3bfef75a52c89b19ea91a/src/plugins/stopping_rules.jl#L42-L178
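
A rough sketch of such a rule is below. It is not part of SDDP.jl; it assumes the AbstractStoppingRule interface in the file linked above (stopping_rule_status and convergence_test) and uses SDDP.simulate to build the statistical estimate. Names like StatisticalUpperBoundRule are made up for illustration, and simulating at every convergence test can be expensive.

using SDDP, Statistics

struct StatisticalUpperBoundRule <: SDDP.AbstractStoppingRule
    num_replications::Int   # scenarios to simulate at each convergence test
    gap_tolerance::Float64  # stop when the relative gap falls below this
end

SDDP.stopping_rule_status(::StatisticalUpperBoundRule) = :statistical_upper_bound

function SDDP.convergence_test(
    model::SDDP.PolicyGraph,
    log::Vector{SDDP.Log},
    rule::StatisticalUpperBoundRule,
)
    # Deterministic lower bound after the most recent iteration.
    lower = log[end].bound
    # Simulate the current policy to estimate an upper bound (for a
    # minimisation problem). This is a statistical estimate, not a valid
    # deterministic bound.
    simulations = SDDP.simulate(model, rule.num_replications)
    costs = [sum(stage[:stage_objective] for stage in sim) for sim in simulations]
    upper = Statistics.mean(costs)
    return (upper - lower) / abs(upper) < rule.gap_tolerance
end

# Usage:
# SDDP.train(model; stopping_rules = [StatisticalUpperBoundRule(100, 0.01)])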

Part of the design of SDDP.jl is that the stopping rules are separate from the training iterations. This tutorial is pretty close to the true algorithm that we implement: https://sddp.dev/stable/explanation/theory_intro/. It isn't useful to think in terms of the Pereira or Shapiro type papers.

See, perhaps, Section 4 of https://github.com/odow/SDDP.jl/blob/master/papers/policy_graph/preprint.pdf and Section 3 of https://github.com/odow/SDDP.jl/blob/master/papers/partially_observable/preprint.pdf

odow avatar Jul 10 '25 22:07 odow