HighwayEnv
Confusion about the state shape
Why does KeyError: 0 appear in action = agent.select_action(state[0]["observation"]) in parking-v0?
print(state):
(OrderedDict([('observation', array([0. , 0. , 0. , 0. , 0.19928808, 0.97994095])), ('achieved_goal', array([0. , 0. , 0. , 0. , 0.19928808, 0.97994095])), ('desired_goal', array([ 1.400000e-01, -1.400000e-01, 0.000000e+00, 0.000000e+00, 6.123234e-17, -1.000000e+00]))]), {'speed': 0, 'crashed': False, 'action': array([0.4141715 , 0.30013174], dtype=float32), 'is_success': False})

OrderedDict([('observation', array([3.01843447e-06, 3.52705533e-04, 2.08092835e-02, 1.03749232e-01, 1.96656207e-01, 9.80472507e-01])), ('achieved_goal', array([3.01843447e-06, 3.52705533e-04, 2.08092835e-02, 1.03749232e-01, 1.96656207e-01, 9.80472507e-01])), ('desired_goal', array([ 1.400000e-01, -1.400000e-01, 0.000000e+00, 0.000000e+00, 6.123234e-17, -1.000000e+00]))])

print(state[0]):
OrderedDict([('observation', array([0. , 0. , 0. , 0. , 0.98256134, 0.18593873])), ('achieved_goal', array([0. , 0. , 0. , 0. , 0.98256134, 0.18593873])), ('desired_goal', array([1.400000e-01, 1.400000e-01, 0.000000e+00, 0.000000e+00, 6.123234e-17, 1.000000e+00]))])
KeyError: 0

print(state[1]):
{'speed': 0, 'crashed': False, 'action': array([0.46403322, 0.34900686], dtype=float32), 'is_success': False}
Not sure exactly how you defined state here, but it looks like it is a tuple containing (obs, info), for instance as returned by env.reset(). So state[0] should get you the observation and state[1] the info (or just unpack: obs, info = state, or obs, info = env.reset() directly).

And once you have your observation, it contains the agent 'observation' but also the 'desired_goal' and 'achieved_goal'. You should probably feed the first two (obs and desired goal) to the policy.
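A minimal sketch of the idea, using a hand-built state tuple with the sample values from the printout above (so it runs without installing highway-env); the name policy_input is illustrative:

```python
from collections import OrderedDict
import numpy as np

# Simulated return value of env.reset() in parking-v0 (gymnasium API):
# a (obs, info) tuple, where obs is a goal-conditioned dict observation.
state = (
    OrderedDict([
        ("observation",   np.array([0.0, 0.0, 0.0, 0.0, 0.19928808, 0.97994095])),
        ("achieved_goal", np.array([0.0, 0.0, 0.0, 0.0, 0.19928808, 0.97994095])),
        ("desired_goal",  np.array([0.14, -0.14, 0.0, 0.0, 6.123234e-17, -1.0])),
    ]),
    {"speed": 0, "crashed": False, "is_success": False},
)

# Unpack the (obs, info) tuple. Note that once state is just the dict
# (e.g. the obs returned by env.step), state[0] raises KeyError: 0,
# because an OrderedDict has no integer key 0.
obs, info = state

# Feed the agent observation together with the desired goal to the policy,
# e.g. by concatenating them into a single flat vector.
policy_input = np.concatenate([obs["observation"], obs["desired_goal"]])
print(policy_input.shape)  # (12,)
```

The same unpacking works directly on the reset call: obs, info = env.reset().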