
Design of run loop and hooks

johannes-fischer opened this issue 8 months ago · 8 comments

I am running into limitations of the current design of the run loop. Suppose I am using a custom policy that internally stores the history of past observations and actions. The question is: what is the intended way for the policy to retrieve this information? In the current implementation it seems that the push!(hook, stage, policy, env) interface is meant for custom code, whereas push!(policy, stage, env[, action]) is meant to be used internally by RLCore, is this correct? However, hooks do not receive the action.
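
For concreteness, here is a minimal sketch of the kind of policy I have in mind. HistoryPolicy and its field names are made up for illustration, and it assumes the stage methods are extended as Base.push!, matching the interface above:

    using ReinforcementLearning

    # Wraps another policy and records the history of observations and actions.
    struct HistoryPolicy{P<:AbstractPolicy} <: AbstractPolicy
        inner::P
        observations::Vector{Any}
        actions::Vector{Any}
    end

    HistoryPolicy(inner) = HistoryPolicy(inner, Any[], Any[])

    # Delegate action selection to the wrapped policy.
    RLBase.plan!(p::HistoryPolicy, env) = RLBase.plan!(p.inner, env)

    # This is the call the policy needs after each step: record the action that
    # was just executed together with the resulting state. When the policy is
    # wrapped in an Agent this method is never reached, and hooks do not get
    # the action at all.
    function Base.push!(p::HistoryPolicy, ::PostActStage, env, action)
        push!(p.observations, state(env))
        push!(p.actions, action)
    end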

Here are some ideas to address this issue:

  1. I think all calls to push!(agent::Agent, stage::AbstractStage, env[, action]) should not only store information in the Trajectory (which is all they currently do) but also forward to the policy by calling push!(agent.policy, stage, env[, action]). This would allow custom policies to add their own logic (see the first sketch below the loop example).

  2. Similarly to push!(policy, ::PostActStage, env, action) having an action argument, push!(hook, ::PostActStage, policy, env) could also receive the action. This would allow a custom hook to use the chosen action (see the second sketch below the loop example).

  3. Another hook should be added between plan! and act! to evaluate functions that need both the current env state and the action that is about to be executed. In the PreActStage the action is not known yet, and in the PostActStage the env has already moved on to the next state, so this is currently not possible. The inner loop could look something like this:

    push!(policy, PreActStage(), env)
    optimise!(policy, PreActStage())
    push!(hook, PreActStage(), policy, env)
    
    action = RLBase.plan!(policy, env)
    
    push!(policy, PostPlanStage(), env, action)          # new
    optimise!(policy, PostPlanStage())                   # new
    push!(hook, PostPlanStage(), policy, env, action)    # new
    
    act!(env, action)
    
    push!(policy, PostActStage(), env, action)
    optimise!(policy, PostActStage())
    push!(hook, PostActStage(), policy, env, action)     # action arg new
    

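To make idea 1 concrete, here is a rough sketch of the forwarding in the Agent methods. The existing Trajectory update is only indicated by a comment; this is not the actual RLCore implementation:

    # Sketch: forward every stage call from the Agent to the wrapped policy.
    function Base.push!(agent::Agent, stage::AbstractStage, env)
        # ... existing behaviour: push the relevant data into agent.trajectory ...
        push!(agent.policy, stage, env)          # new: forward to the wrapped policy
    end

    function Base.push!(agent::Agent, stage::PostActStage, env, action)
        # ... existing behaviour: push the relevant data into agent.trajectory ...
        push!(agent.policy, stage, env, action)  # new: forward together with the action
    end

Assuming the default push!(policy, stage, env[, action]) methods stay no-ops for policies that do not implement them, forwarding after the Trajectory update should leave existing policies unaffected.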
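For idea 2, a custom hook could then consume the action directly. ActionLoggingHook is a hypothetical example that assumes the hook signature gains the extra action argument:

    # Sketch: a hook that records every executed action.
    struct ActionLoggingHook <: AbstractHook
        actions::Vector{Any}
    end

    ActionLoggingHook() = ActionLoggingHook(Any[])

    # Only possible if the run loop passes the action to the hook.
    function Base.push!(h::ActionLoggingHook, ::PostActStage, policy, env, action)
        push!(h.actions, action)
    end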
I can open a pull request if that's an approach you want to follow.

johannes-fischer · Feb 03 '25 15:02