HighwayEnv

State shape confusion in parking-v0

deadseason opened this issue 2 months ago · 1 comment

Why does KeyError: 0 appear in action = agent.select_action(state[0]["observation"]) in parking-v0?

print(state):
(OrderedDict([('observation', array([0., 0., 0., 0., 0.19928808, 0.97994095])),
              ('achieved_goal', array([0., 0., 0., 0., 0.19928808, 0.97994095])),
              ('desired_goal', array([ 1.400000e-01, -1.400000e-01, 0.000000e+00, 0.000000e+00, 6.123234e-17, -1.000000e+00]))]),
 {'speed': 0, 'crashed': False, 'action': array([0.4141715 , 0.30013174], dtype=float32), 'is_success': False})
wwwwww
OrderedDict([('observation', array([3.01843447e-06, 3.52705533e-04, 2.08092835e-02, 1.03749232e-01, 1.96656207e-01, 9.80472507e-01])),
             ('achieved_goal', array([3.01843447e-06, 3.52705533e-04, 2.08092835e-02, 1.03749232e-01, 1.96656207e-01, 9.80472507e-01])),
             ('desired_goal', array([ 1.400000e-01, -1.400000e-01, 0.000000e+00, 0.000000e+00, 6.123234e-17, -1.000000e+00]))])

print(state[0]):
OrderedDict([('observation', array([0., 0., 0., 0., 0.98256134, 0.18593873])),
             ('achieved_goal', array([0., 0., 0., 0., 0.98256134, 0.18593873])),
             ('desired_goal', array([1.400000e-01, 1.400000e-01, 0.000000e+00, 0.000000e+00, 6.123234e-17, 1.000000e+00]))])
KeyError: 0

print(state[1]):
{'speed': 0, 'crashed': False, 'action': array([0.46403322, 0.34900686], dtype=float32), 'is_success': False}

deadseason · Apr 14 '24 13:04

Not sure exactly how you defined state here, but it looks like it is a tuple containing (obs, info) (for instance, as returned by env.reset()).

So state[0] should get you the observation and state[1] the info (or just unpack it: obs, info = state, or obs, info = env.reset() directly), as in the sketch below.
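A minimal sketch of that unpacking, assuming gymnasium and highway-env are installed (the prints simply mirror the output shown in the question):

```python
import gymnasium as gym
import highway_env  # noqa: F401  (importing registers parking-v0; older versions may need highway_env.register_highway_envs())

env = gym.make("parking-v0")

# reset() returns an (obs, info) tuple, so unpack it rather than keeping the
# whole tuple around as `state`.
obs, info = env.reset()

# obs is a dict-like object with 'observation', 'achieved_goal' and
# 'desired_goal' entries; info is the plain dict printed above.
print(obs["observation"])
print(info)
```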

And once you have your observation, it contains the agent's own 'observation' but also the 'desired_goal' and 'achieved_goal'. You should probably feed the first two (the observation and the desired goal) to the policy.
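A hedged sketch of what that could look like; agent and select_action are placeholders for the asker's own policy, assumed here to accept a flat numpy array:

```python
import numpy as np

obs, info = env.reset()
done = False
while not done:
    # Concatenate the ego state with the goal it should reach, and let the
    # policy pick a continuous action for parking-v0.
    policy_input = np.concatenate([obs["observation"], obs["desired_goal"]])
    action = agent.select_action(policy_input)  # asker's own agent, assumed

    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```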

eleurent · Apr 14 '24 14:04