QuickPOMDPs.jl

QuickPOMDP Interfaces in python do not work: Solver application and isterminal issues

Open afansi opened this issue 5 years ago • 4 comments

Hello Guys,

I have been trying to use the QuickPOMDPs interfaces from Python. While I managed to use the DiscreteExplicitPOMDP interface, I am struggling to use the QuickPOMDP interface.

Here are the difficulties I am facing:

  1. While I am able to create a QuickPOMDP object, I can't apply any solver to it, for example SARSOP. I was able to run this solver with the DiscreteExplicitPOMDP object as in tiger.py (so it is not an installation issue). I consistently get the following error (it is the same if I use another solver such as QMDPSolver):

========================================

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-5d5ce468204e> in <module>
      1 solver = SARSOPSolver()
----> 2 policy = solve(solver, pomdp)

TypeError: '>' not supported between instances of 'method' and 'int'

=============================================
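For reference, Python itself raises this message whenever a bound method object (rather than the value it would return) is compared with an integer. The traceback does not show where inside `solve` that comparison happens, so the snippet below is only a minimal pure-Python sketch of how the same message can arise. The `Model` class and the `is_positive` helper are hypothetical and are not part of QuickPOMDPs or pyjulia.

```python
class Model:
    """Hypothetical stand-in for an object that exposes a bound method."""

    def discount(self):
        return 0.95


def is_positive(value):
    """Return value > 0, or the error text if the comparison fails."""
    try:
        return value > 0
    except TypeError as err:
        return str(err)


model = Model()
# Passing the bound method itself reproduces the message from the traceback.
message = is_positive(model.discount)
# Calling the method first yields a float, which compares without error.
result = is_positive(model.discount())
print(message)
print(result)
```

If something similar happens here, one of the Python callables handed to QuickPOMDP is probably being treated as a plain value somewhere on the way to the solver, but that is an assumption rather than something the traceback confirms.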

Here is my code:


import julia
from julia.QuickPOMDPs import DiscreteExplicitPOMDP, QuickPOMDP
from julia.POMDPs import solve, pdf
from julia.QMDP import QMDPSolver
from julia.SARSOP import SARSOPSolver
from julia.POMDPPolicies import alphavectors, RandomPolicy
from julia.POMDPModelTools import Deterministic, SparseCat
from julia.POMDPSimulators import stepthrough, HistoryRecorder, eachstep, simulate
from julia.BeliefUpdaters import DiscreteUpdater
from julia import Base
from julia import Random

import pickle
import itertools
import copy
import time
from collections import namedtuple
import typing
import random

class POMDPGenerator:
    def __init__(self, seed=1234):
        self.states = ['left', 'right']
        self.actions = ['left', 'right', 'listen']
        self.observations = ['left', 'right']
        
        self.rng = random.Random(seed)
        self.good_obs = .85
        self.init_state = .5
        self.random_obs = .5
  
    
    def stateindex(self, s):
        idx_s = self.states.index(s)
        return (idx_s) #+ 1
    
    def actionindex(self, a):
        idx_a = self.actions.index(a)
        return (idx_a) #+ 1
    
    def obsindex(self, o):
        idx_o = self.observations.index(o)
        return (idx_o) #+ 1
    
    def initialstate_distribution(self):
        #self.init_state = self.rng.random()    
        return SparseCat(self.states, [self.init_state, 1-self.init_state])
    
    def initialstate(self, rng):
        # return Random.rand(rng, self.initialstate_distribution())
        return Random.rand(self.initialstate_distribution())
    
    def transition(self, s, a):
        if a == 'listen':
            sp = s
            return SparseCat([sp], [1.0])
        else: # a door is opened
            return self.initialstate_distribution()
        
    def transition2(self, s, a, sp):
        if a == 'listen':
            if sp == s:
                return 1.0
            else:
                return 0.0
        else: # a door is opened
            #d= self.initialstate_distribution()
            if sp=='left':
                return self.init_state
            else:
                return 1.0-self.init_state
        
    def observation(self, s, a, sp):        
        return self.observation2(a, sp)
        
    def observation2(self, a, sp):
        if a == 'listen':
            if 'left' == sp:
                return SparseCat(['left', 'right'], [self.good_obs, 1.0-self.good_obs])
            else:
                return SparseCat(['left', 'right'], [1.0-self.good_obs, self.good_obs])
        else:
            return SparseCat(['left', 'right'], [self.random_obs, 1.0-self.random_obs])
        
    def observation3(self, a, sp, o):
        if a == 'listen':
            if o == sp:
                return self.good_obs
            else:
                return 1.0-self.good_obs
        else:
            if o == 'left':
                return self.random_obs
            else:
                return 1.0-self.random_obs

            
    def reward(self, s, a):
        if a == 'listen':
            return -1.0
        elif s == a: # the tiger was found
            return -100.0
        else: # the tiger was avoided
            return 10.0
        
    def generate_pomdp(self, discount=0.95):
        
        return QuickPOMDP(
            initialstate_distribution = self.initialstate_distribution,
            transition = self.transition,
            observation = self.observation,
            reward=self.reward,
            states=self.states,
            actions=self.actions,
            observations=self.observations,
            initialstate=self.initialstate(Random.AbstractRNG),
            discount=discount,
            stateindex=self.stateindex,
            actionindex=self.actionindex,
            obsindex=self.obsindex,
        )
    
    def generate_pomdp2(self, discount=0.95):
        
        return DiscreteExplicitPOMDP(
            
            self.states,
            self.actions,
            self.observations,
            self.transition2,
            self.observation3,
            self.reward,
            discount,
            self.initialstate_distribution(),
        )
    
    def isterminal(self, s):
        return s =='terminal'
    
    def generate_pomdp_with_terminal(self, discount=0.95):
        
        return QuickPOMDP(
            initialstate_distribution = self.initialstate_distribution,
            transition = self.transition,
            observation = self.observation,
            reward=self.reward,
            states=self.states + ['terminal'],
            actions=self.actions,
            observations=self.observations,
            initialstate=self.initialstate(Random.AbstractRNG),
            discount=discount,
            stateindex=self.stateindex,
            actionindex=self.actionindex,
            obsindex=self.obsindex,
            isterminal=self.isterminal
        )
    
    def generate_pomdp_without_terminal(self, discount=0.95):
        
        return QuickPOMDP(
            initialstate_distribution = self.initialstate_distribution,
            transition = self.transition,
            observation = self.observation,
            reward=self.reward,
            states=self.states + ['terminal'],
            actions=self.actions,
            observations=self.observations,
            initialstate=self.initialstate(Random.AbstractRNG),
            discount=discount,
            stateindex=self.stateindex,
            actionindex=self.actionindex,
            obsindex=self.obsindex,
        )




Gen = POMDPGenerator()
pomdp = Gen.generate_pomdp()

solver = SARSOPSolver()
policy = solve(solver, pomdp)

print('alpha vectors:')
for v in alphavectors(policy):
    print(v)

print()

for step in stepthrough(pomdp, policy, "s,a,o", max_steps=10):
    print(step.s)
    print(step.a)
    print(step.o)
    print()

  2. Another problem I am facing is that the "isterminal" parameter of QuickPOMDP is not handled properly. When I implement a function isterminal(s) that returns a boolean and pass it as the terminal function to the QuickPOMDP interface, I get an error:

==========================================

TypeError: Julia exception: TypeError: non-boolean (PyObject) used in boolean context
Stacktrace:
 [1] iterate(::POMDPSimulators.POMDPSimIterator{(:s, :a, :o),QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},POMDPPolicies.RandomPolicy{Random.MersenneTwister,QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},BeliefUpdaters.NothingUpdater},BeliefUpdaters.NothingUpdater,Random.MersenneTwister,Nothing,String}, ::Tuple{Int64,String,Nothing}) at /home/fansitca/.julia/packages/POMDPSimulators/nMXAP/src/stepthrough.jl:86 (repeats 2 times)
 [2] jlwrap_iterator(::POMDPSimulators.POMDPSimIterator{(:s, :a, :o),QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},POMDPPolicies.RandomPolicy{Random.MersenneTwister,QuickPOMDPs.QuickPOMDP{UUID("0fbc5ae6-f61e-43a8-ac7e-9abfa223a2ec"),String,String,String,NamedTuple{(:stateindex, :isterminal, :obsindex, :states, :observations, :discount, :actions, :initialstate_distribution, :observation, :actionindex, :transition, :reward, :initialstate),Tuple{PyObject,PyObject,PyObject,Array{String,1},Array{String,1},Float64,Array{String,1},PyObject,PyObject,PyObject,PyObject,PyObject,String}}},BeliefUpdaters.NothingUpdater},BeliefUpdaters.NothingUpdater,Random.MersenneTwister,Nothing,String}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyiterator.jl:150
 [3] pyjlwrap_getiter(::Ptr{PyCall.PyObject_struct}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyiterator.jl:131
 [4] macro expansion at /home/fansitca/.julia/packages/PyCall/ttONZ/src/exception.jl:81 [inlined]
 [5] __pycall!(::PyObject, ::Ptr{PyCall.PyObject_struct}, ::PyObject, ::Ptr{Nothing}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:44
 [6] _pycall!(::PyObject, ::PyObject, ::Tuple{Array{String,1}}, ::Int64, ::Ptr{Nothing}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:29
 [7] #call#111 at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:11 [inlined]
 [8] (::PyObject)(::Array{String,1}) at /home/fansitca/.julia/packages/PyCall/ttONZ/src/pyfncall.jl:89
 [9] top-level scope at none:0
 [10] eval(::Module, ::Any) at ./boot.jl:319
 [11] exec_options(::Base.JLOptions) at ./client.jl:243
 [12] _start() at ./client.jl:425

=========================================

The code for reproducing that error is the following:

Gen = POMDPGenerator()
pomdp = Gen.generate_pomdp_with_terminal()
policy = RandomPolicy(pomdp)
for step in stepthrough(pomdp, policy, "s,a,o", max_steps=10):
    print(step.s)
    print(step.a)
    print(step.o)
    print()
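
The stack trace shows `isterminal` stored as a `PyObject` inside the model's NamedTuple. As a sanity check on the Python side only, the sketch below copies the body of the isterminal method into a standalone function (a stand-in, so that no Julia session is needed) and confirms that it returns a real Python bool for every state. This check cannot show what the Julia side does with the callable; it only rules out the function itself as the source of a non-boolean value.

```python
def isterminal(s):
    # Same logic as POMDPGenerator.isterminal in the code above.
    return s == 'terminal'


states = ['left', 'right', 'terminal']

# Evaluate the predicate on every state and check the result types.
results = {s: isterminal(s) for s in states}
all_bool = all(isinstance(v, bool) for v in results.values())

print(results)
print(all_bool)
```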

Please note that the following code works properly when the "isterminal" parameter is removed:

Gen = POMDPGenerator()
pomdp = Gen.generate_pomdp_without_terminal()
policy = RandomPolicy(pomdp)
for step in stepthrough(pomdp, policy, "s,a,o", max_steps=10):
    print(step.s)
    print(step.a)
    print(step.o)
    print()
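
To rule out the model definition itself as a cause, the probability functions can also be checked without Julia. The sketch below restates the logic of transition2 and observation3 as standalone functions, using the same constants as the class above, and verifies that each conditional distribution sums to one. The names `transition_prob`, `observation_prob` and `rows_normalized` are introduced here for illustration only.

```python
import math

STATES = ['left', 'right']
ACTIONS = ['left', 'right', 'listen']
OBSERVATIONS = ['left', 'right']
GOOD_OBS, INIT_STATE, RANDOM_OBS = 0.85, 0.5, 0.5


def transition_prob(s, a, sp):
    # Listening keeps the state; opening a door resets the problem.
    if a == 'listen':
        return 1.0 if sp == s else 0.0
    return INIT_STATE if sp == 'left' else 1.0 - INIT_STATE


def observation_prob(a, sp, o):
    # Listening is informative; opening a door gives a random observation.
    if a == 'listen':
        return GOOD_OBS if o == sp else 1.0 - GOOD_OBS
    return RANDOM_OBS if o == 'left' else 1.0 - RANDOM_OBS


def rows_normalized():
    """Check that T(.|s,a) and O(.|a,sp) each sum to one."""
    t_ok = all(
        math.isclose(sum(transition_prob(s, a, sp) for sp in STATES), 1.0)
        for s in STATES for a in ACTIONS
    )
    o_ok = all(
        math.isclose(sum(observation_prob(a, sp, o) for o in OBSERVATIONS), 1.0)
        for a in ACTIONS for sp in STATES
    )
    return t_ok, o_ok


print(rows_normalized())
```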

I am using Julia 1.0.5 (2019-09-09) and here is the list of installed packages:

Dict{String,Union{Nothing, VersionNumber}} with 18 entries:
  "BeliefUpdaters"  => v"0.1.2"
  "POMDPModelTools" => v"0.2.0"
  "Distributions"   => v"0.21.8"
  "QuickPOMDPs"     => v"0.2.0"
  "BasicPOMCP"      => v"0.2.1"
  "PyCall"          => v"1.91.2"
  "QMDP"            => v"0.1.2"
  "Compose"         => v"0.7.4"
  "IJulia"          => v"1.20.2"
  "Colors"          => v"0.9.6"
  "POMDPSimulators" => v"0.3.2"
  "POMDPPolicies"   => v"0.2.1"
  "SARSOP"          => v"0.4.0"
  "StaticArrays"    => v"0.12.1"
  "POMDPToolbox"    => v"0.3.0"
  "POMDPGifs"       => v"0.1.0"
  "POMDPs"          => v"0.8.1"
  "Parameters"      => v"0.12.0"

afansi, Nov 28 '19 17:11

Hi @afansi, thanks for reporting this! I think it should be fairly straightforward to fix. I'm guessing the problem is that QuickPOMDPs is not recognizing that the PyObject is a function. I'll address it as soon as possible after the Thanksgiving holiday.

zsunberg, Nov 28 '19 19:11

Working on this... almost done.

zsunberg, Nov 29 '19 23:11

@afansi this has been fixed in the quick_pycall branch in #9. See examples/issue_8.py for some changes that had to be made to the Python code. I am still deciding whether to merge it because it introduces a dependency on PyCall.

zsunberg, Nov 30 '19 01:11

Hi @zsunberg, thank you. I will give it a try right away.

afansi, Dec 01 '19 01:12