POMDPs.jl
`action` interface of exploration policies
The exploration policies (https://github.com/JuliaPOMDP/POMDPs.jl/blob/master/lib/POMDPTools/src/Policies/exploration_policies.jl) do not implement the `action(::Policy, x)` interface described in the documentation, so they cannot be used with the simulators directly. Instead, they have the interface `action(p::EpsGreedyPolicy, on_policy::Policy, k, s)`.
Is there a reason for this?
I don't remember the details, but they are designed so that the amount of exploration changes (i.e., decays) as the total number of calls (`k`) increases. I think they are used in things like tabular TD learning.
(Since they are `Policy`s, they should probably also have the `action(p, s)` method, though it's not immediately obvious how to do that for them.)
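To make the role of `k` concrete, here is a minimal sketch of a decaying exploration rate. The names `linear_eps` and `should_explore` and all default values are hypothetical and chosen only for illustration; this is not the code in `exploration_policies.jl`.

```julia
using Random

# Hypothetical linear decay: the exploration probability moves from `start`
# to `stop` over the first `steps` calls and stays at `stop` afterwards.
function linear_eps(k; start=1.0, stop=0.01, steps=1000)
    k >= steps && return stop
    return start + (stop - start) * k / steps
end

# Hypothetical helper: decide whether call number `k` should explore.
should_explore(rng::AbstractRNG, k) = rand(rng) < linear_eps(k)
```

Because the rate depends on `k`, the caller has to supply the call count on every step, which is why it appears in the four-argument `action` signature.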
I'm definitely open to changing the design.
I think they would need to store `k` and the policy. They could have an `update!` function for `k` and the policy. The policy field could have type `P`, where `P<:Union{Nothing,Policy}` is a type parameter (with `nothing` meaning that the current `action` interface is used).
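A rough sketch of what that could look like, as a starting point for discussion. Everything here is hypothetical: `StatefulEpsGreedy`, `ConstantPolicy`, and `update!` are made-up names, and a local `Policy` type and `action` function stand in for `POMDPs.Policy` and `POMDPs.action` so that the snippet runs without any packages.

```julia
using Random

# Stand-in for POMDPs.Policy (assumption: the real type would be used instead).
abstract type Policy end

# Hypothetical on-policy that always returns the same action.
struct ConstantPolicy{A} <: Policy
    a::A
end
action(p::ConstantPolicy, s) = p.a

# Hypothetical exploration policy that stores the call count k and the on-policy.
# P == Nothing means no policy is stored and the four-argument interface is used.
mutable struct StatefulEpsGreedy{P<:Union{Nothing,Policy},F,A,R<:AbstractRNG} <: Policy
    eps::F               # maps k to an exploration probability
    actions::Vector{A}   # actions to sample from when exploring
    rng::R
    k::Int
    on_policy::P
end

# Replace the stored call count and on-policy (the policy type must stay the same).
function update!(p::StatefulEpsGreedy{P}, k::Integer, on_policy::P) where {P<:Policy}
    p.k = k
    p.on_policy = on_policy
    return p
end

# Two-argument interface, usable wherever action(policy, s) is expected.
function action(p::StatefulEpsGreedy{<:Policy}, s)
    rand(p.rng) < p.eps(p.k) && return rand(p.rng, p.actions)
    return action(p.on_policy, s)
end

# Current four-argument interface, kept for the case where nothing is stored.
function action(p::StatefulEpsGreedy{Nothing}, on_policy::Policy, k, s)
    rand(p.rng) < p.eps(k) && return rand(p.rng, p.actions)
    return action(on_policy, s)
end

# Usage: explore with probability 1 for the first 10 calls, then act greedily.
greedy = ConstantPolicy(:left)
p = StatefulEpsGreedy(k -> k < 10 ? 1.0 : 0.0, [:right], MersenneTwister(1), 0, greedy)
a_early = action(p, :s)   # k = 0, so this explores
update!(p, 10, greedy)
a_late = action(p, :s)    # k = 10, so this follows the stored policy
```

One open point with this design is who calls `update!`: a solver can do it after each step, but a simulator that only calls `action(p, s)` would leave `k` unchanged.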