POMDPs.jl

`action` interface of exploration policies

Open: johannes-fischer opened this issue 2 years ago • 2 comments

The exploration policies (https://github.com/JuliaPOMDP/POMDPs.jl/blob/master/lib/POMDPTools/src/Policies/exploration_policies.jl) do not implement the `action(::Policy, x)` interface described in the documentation, so they cannot be used with the simulators directly. Instead, they have the interface `action(p::EpsGreedyPolicy, on_policy::Policy, k, s)`.
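For illustration, here is a minimal self-contained sketch of the mismatch. `Policy`, `ExplorationPolicy`, `TablePolicy` and `ToyEpsGreedy` are hypothetical stand-ins defined in the snippet, not the POMDPs.jl/POMDPTools types; only the shape of the two `action` signatures follows the issue.

```julia
using Random

# Hypothetical stand-ins for illustration; NOT the POMDPs.jl definitions.
abstract type Policy end
abstract type ExplorationPolicy <: Policy end

# A toy on-policy: looks up a fixed action for each state.
struct TablePolicy <: Policy
    best::Dict{Int,Symbol}
end

# Documented-style interface: action(policy, state).
action(p::TablePolicy, s) = p.best[s]

# A toy epsilon-greedy explorer written in the exploration-policy style.
struct ToyEpsGreedy{F,R<:AbstractRNG} <: ExplorationPolicy
    eps::F                    # maps the step count k to an exploration probability
    actions::Vector{Symbol}
    rng::R
end

# Exploration-style interface: needs the on-policy and the step count k.
function action(p::ToyEpsGreedy, on_policy::Policy, k, s)
    if rand(p.rng) < p.eps(k)
        return rand(p.rng, p.actions)   # explore: uniformly random action
    else
        return action(on_policy, s)     # exploit: defer to the on-policy
    end
end

greedy = TablePolicy(Dict(1 => :left, 2 => :right))
explorer = ToyEpsGreedy(k -> 0.0, [:left, :right], MersenneTwister(1))

action(explorer, greedy, 10, 2)   # :right (eps is 0.0, so it always exploits)
# action(explorer, 2) throws a MethodError: there is no two-argument method,
# so a simulator that calls action(policy, s) cannot use this explorer.
```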

I was wondering whether there is a reason for this.

johannes-fischer, May 31 '23 01:05

I don't remember the details, but they are designed to change as the total number of calls (`k`) increases, i.e., to decay. I think they are used in things like tabular TD learning.
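To make "decay" concrete, here is a small sketch of an exploration rate that shrinks linearly with the call count `k`. `linear_decay` and `epsilon_at` are hypothetical names chosen for illustration, and this is not the POMDPTools code; the point is only that epsilon is a function of `k`, which is why the current signature has to receive it.

```julia
# Hypothetical helper: returns a schedule k -> epsilon that decays linearly
# from `start` to `stop` over the first `steps` calls, then stays at `stop`.
function linear_decay(start::Real, stop::Real, steps::Integer)
    return k -> k >= steps ? float(stop) : start + (stop - start) * k / steps
end

epsilon_at = linear_decay(1.0, 0.1, 100)

epsilon_at(0)     # 1.0: explore on (almost) every call at the start
epsilon_at(50)    # about 0.55
epsilon_at(100)   # 0.1
epsilon_at(1000)  # 0.1: the floor is kept once the schedule has finished
```

An epsilon-greedy explorer would evaluate the schedule on every call, so the exploration rate can only change if `k` is either passed in or stored somewhere.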

(Since they are `Policy`s, they should probably also have the `action(p, s)` function, though it's not immediately obvious how to do that for them.)

I'm definitely open to changing the design.

zsunberg, Jun 01 '23 05:06

I think they would need to store `k` and the policy. They could have an `update!` function for `k` and the policy. The policy field could have type `P`, where `P<:Union{Nothing,Policy}` is a type parameter (`nothing` to keep the current `action` interface).
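A rough sketch of what that proposal might look like, assuming the explorer becomes a mutable struct. `StatefulEpsGreedy`, `ConstantPolicy` and the keyword form of `update!` are hypothetical names for illustration, and the stand-in `Policy` type is defined locally; this is not the POMDPTools implementation.

```julia
using Random

# Hypothetical stand-ins; NOT the POMDPs.jl/POMDPTools definitions.
abstract type Policy end

struct ConstantPolicy <: Policy
    a::Symbol
end
action(p::ConstantPolicy, s) = p.a

# Sketch of the proposal: the explorer stores the step count k and,
# optionally, the on-policy. P == Nothing keeps only the old 4-argument form.
mutable struct StatefulEpsGreedy{P<:Union{Nothing,Policy},F,R<:AbstractRNG} <: Policy
    eps::F                   # k -> exploration probability
    actions::Vector{Symbol}
    rng::R
    k::Int                   # step count, advanced with update!
    policy::P                # stored on-policy, or nothing
end

# Convenience constructor: start at k = 0 with no on-policy unless given.
StatefulEpsGreedy(eps, actions, rng; policy = nothing) =
    StatefulEpsGreedy(eps, actions, rng, 0, policy)

# Hypothetical update!: set k and, optionally, the on-policy.
# The new policy must be convertible to the type parameter P.
function update!(p::StatefulEpsGreedy, k::Integer; policy = p.policy)
    p.k = k
    p.policy = policy
    return p
end

# Existing-style interface, available for any P.
function action(p::StatefulEpsGreedy, on_policy::Policy, k, s)
    return rand(p.rng) < p.eps(k) ? rand(p.rng, p.actions) : action(on_policy, s)
end

# Documented action(policy, s) form, defined only when an on-policy is stored.
action(p::StatefulEpsGreedy{<:Policy}, s) = action(p, p.policy, p.k, s)

# Usage (eps is 0.0 everywhere, so the results are deterministic).
explorer = StatefulEpsGreedy(k -> 0.0, [:left, :right], MersenneTwister(0);
                             policy = ConstantPolicy(:left))
update!(explorer, 5)
action(explorer, 1)                               # :left

old_style = StatefulEpsGreedy(k -> 0.0, [:left, :right], MersenneTwister(0))
action(old_style, ConstantPolicy(:right), 3, 1)   # :right
```

One consequence of making `P` a type parameter: `update!` can only swap in an on-policy of the same concrete type, so switching to a policy of a different type would mean constructing a new explorer.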

johannes-fischer, Jun 01 '23 17:06