pomdp-py

MOMDP representation?

Open Hororohoruru opened this issue 10 months ago • 5 comments

Hello,

I would like to know whether it is currently possible to define a problem with fully observable state variables, export it to a .pomdpx file, and solve it with SARSOP.

Hororohoruru avatar Apr 04 '24 12:04 Hororohoruru

Yes, it is possible. Check out this documentation page.

Regarding fully observable state variables, you can achieve that by having an observation model that simply returns the state.

zkytony avatar Apr 04 '24 13:04 zkytony

Sorry, I did not formulate my question well. What I meant to ask is how I should write my code so that, when converted to .pomdpx, it represents some state variables as fully observable.

Right now I am working with a model that has access to the time elapsed since the beginning of the operation and has a maximum number of time steps in which to act (finite horizon). My approach is to create states that, in addition to their ID (either an int or 'term' for the terminal state), also have a property t:

class TDState(pomdp_py.State):
    def __init__(self, state_id, time_step):
        self.id = state_id
        self.t = time_step
        self.name = f"s_{state_id}-t_{time_step}"

The methods for __hash__, __eq__, __str__ and __repr__ are defined similarly to the Tiger example in the documentation. However, observations only have an id property. When the observation depends on t, I retrieve it directly from the TDState object in my ObservationModel:

import pomdp_py

class TDObservationModel(pomdp_py.ObservationModel):
    def __init__(self, conf_matrix):
        # conf_matrix is indexed as [time_step][state_id][obs_id]
        self.observation_matrix = conf_matrix
        self.n_steps, self.n_states, self.n_obs = self.observation_matrix.shape

    def probability(self, observation, next_state, action):
        obs_idx = observation.id
        state_idx = next_state.id
        state_step = next_state.t

        return self.observation_matrix[state_step][state_idx][obs_idx]
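
As a plain-Python illustration of the lookup above (independent of pomdp_py), the sketch below indexes a nested-list confusion matrix as [time step][state][observation] and checks that every [t][s] row is a valid distribution. The matrix values and the helper names `observation_probability` and `rows_are_normalized` are made up for this example.

```python
# Hypothetical confusion matrix: 2 time steps, 2 states, 2 observations.
# Convention assumed here: conf_matrix[t][state_id][obs_id] = P(obs | state, t).
conf_matrix = [
    [[0.6, 0.4], [0.4, 0.6]],  # t = 0: noisy observations
    [[0.9, 0.1], [0.1, 0.9]],  # t = 1: more reliable observations
]


def observation_probability(conf_matrix, obs_id, state_id, time_step):
    """Return P(obs_id | state_id, time_step) from the nested-list matrix."""
    return conf_matrix[time_step][state_id][obs_id]


def rows_are_normalized(conf_matrix, tol=1e-9):
    """Check that each [t][s] row sums to 1 within a tolerance."""
    return all(
        abs(sum(row) - 1.0) <= tol
        for step in conf_matrix
        for row in step
    )
```

A row that does not sum to 1 would mean the observation probabilities for that (time step, state) pair are not a proper distribution, which is worth checking before exporting the model.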

The transition model includes the parameter t_max in order to generate a list of all possible states up to the maximum t. In Tiger terms, Action('listen') uses the ObservationModel to provide an observation, and the state transitions deterministically such that T(s, a_listen, s') = 1 if s.id == s'.id and s'.t == s.t + 1. If s.t == t_max, the state transitions to the terminal state no matter which action is selected (and provides the corresponding terminal observation deterministically). If any action other than listen is selected, the model also transitions to the terminal state. As in the Tiger problem, the state stays the same (here, s.id), but the time advances until a horizon t_max.

I would like the time to be fully observable in the produced .pomdpx file, but since you commented:

Regarding fully observable state variables, you can achieve that by having an observation model that simply returns the state.

I think the way I am handling it would not accomplish the MOMDP representation. How should I do it instead?

Hororohoruru avatar Apr 05 '24 12:04 Hororohoruru

Follow-up: I tried to convert to .pomdpx with my current problem definition and the file reflects only one state variable, which has a number of states equal to the number of possible state IDs times the possible values of t. In the case of 5 targets and 8 time-steps, I get 41 states of a single state variable (the extra state is the terminal state).

I would like to know how to define my model to have a state variable with 5 values (ID), which is not fully observable, and another state variable with 8 values (time), which would be fully observable.

Hororohoruru avatar Apr 09 '24 14:04 Hororohoruru
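
The state counts in the follow-up above can be reproduced with a small plain-Python sketch that contrasts the flat enumeration (one state variable) with the desired factored view (two state variables). The variable names are illustrative, and how the terminal state would be encoded in a factored model is left open here.

```python
from itertools import product

n_targets, n_steps = 5, 8
target_ids = range(n_targets)
time_steps = range(n_steps)

# Flat view: a single state variable whose values are all (id, t) pairs,
# plus one extra terminal state.
flat_states = list(product(target_ids, time_steps)) + [("term", None)]

# Factored view: two separate state variables, as a MOMDP would use.
# "target" is meant to be hidden, "time" fully observable.
factored_variables = {
    "target": list(target_ids),
    "time": list(time_steps),
}

print(len(flat_states))                                      # flat state count
print({k: len(v) for k, v in factored_variables.items()})    # values per variable
```

The flat view has 5 * 8 + 1 = 41 states, matching the exported file, whereas the factored view only needs 5 values for the target and 8 for the time.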

I will provide a sketch for the idea.

class State(pomdp_py.SimpleState):
    def __init__(self, target, time_step):
        super().__init__(data=(target, time_step))

class ObservationModel(pomdp_py.ObservationModel):
    def sample(self, next_state, action):
        # Expose only the time component of the state.
        time_step = next_state.data[1]
        return pomdp_py.SimpleObservation(data=time_step)

This makes time_step observable, but not the target.

zkytony avatar Apr 09 '24 18:04 zkytony

That makes it clear, thank you! I imagine the ObservationModel needs to know t_max in order to create a list of all the possible observations.

What I mean is that the target needs to be part of the observation as well.

Hororohoruru avatar Apr 10 '24 08:04 Hororohoruru
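
One way to read the last two points together is an observation made of two parts: a noisy reading of the target and the exact time step. The plain-Python sketch below (no pomdp_py) enumerates all such observations from t_max and factorizes the probability accordingly. The tuple encoding, the function names, and the example matrix are assumptions made for illustration, not the library's API or the approach settled on in this thread.

```python
from itertools import product

# Hypothetical confusion matrix: conf_matrix[t][state_id][obs_id].
conf_matrix = [
    [[0.6, 0.4], [0.4, 0.6]],  # t = 0
    [[0.9, 0.1], [0.1, 0.9]],  # t = 1
]


def all_observations(n_obs, t_max):
    """Enumerate every (obs_id, time_step) pair for time steps 0..t_max."""
    return list(product(range(n_obs), range(t_max + 1)))


def joint_observation_probability(observation, next_state, conf_matrix):
    """P((obs_id, obs_t) | (state_id, state_t)) with exactly observed time."""
    obs_id, obs_t = observation
    state_id, state_t = next_state
    if obs_t != state_t:
        return 0.0  # the time component is observed without noise
    # The target component remains noisy, governed by the confusion matrix.
    return conf_matrix[state_t][state_id][obs_id]
```

Summing `joint_observation_probability` over `all_observations(2, 1)` for a fixed next state should give 1, since the time part is deterministic and each confusion-matrix row is a distribution.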