Are there any requirements for the env to fit in this repo?

Open mrbeann opened this issue 5 years ago • 5 comments

I tried this repo with a simple env

import random

import gym
from gym import spaces


class SimpleEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(SimpleEnv, self).__init__()
        self.observation_space = spaces.Box(low=0, high=2, shape=(4, 4))
        self.action_space = spaces.Discrete(3)
        self.reset()

    def step(self, action):
        ob = self.observation_space.sample()
        reward = 1
        episode_over = False if random.random()>0.5 else True
        return ob, reward, episode_over, {}

    def reset(self):
        ob = self.observation_space.sample()
        return ob

    def render(self, mode='human'):
        pass

and used it with the policy gradient agent as follows:

    env = SimpleEnv
    env.seed(0)
    ac_kwargs = dict(hidden_sizes=(16,))
    agent = vpg(env, ac_kwargs=ac_kwargs)
    episode_count = 100
    reward = 0
    done = False

    for i in range(episode_count):
        ob = env.reset()
        while True:
            print(done)
            action = agent.act(ob, reward, done)
            ob, reward, done, _ = env.step(action)
            if done:
                break

But when I run this I get: RuntimeError: size mismatch, m1: [1 x 16], m2: [4 x 16] at /opt/conda/conda-bld/pytorch-cpu_1549626403278/work/aten/src/TH/generic/THTensorMath.cpp:940

This seems to be caused by a mismatch between the observation space and the Actor-Critic network. But it works well with the envs provided by gym. Did I miss something here?

mrbeann avatar Nov 03 '19 14:11 mrbeann

@mrbeann let me try to reproduce it locally and I will get back to you.

kashif avatar Nov 04 '19 11:11 kashif

I was just running into the same problem. It looks like it's caused by

https://github.com/kashif/firedup/blob/7011b0c8eeac5a96f995f6b44428b0334fe567a5/fireup/algos/ppo/ppo.py#L230

where it's discarding the shape (or rather just taking the first dimension). I guess this works with gym envs that have a single dimension in the observation space, but it will break if your observation has more than one dimension.
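
A minimal sketch of that mismatch, using only the numbers from the report above (a `(4, 4)` observation space and `hidden_sizes=(16,)`). It assumes the first layer is sized from `shape[0]` while the observation reaches the network flattened, which would be consistent with the `m1: [1 x 16], m2: [4 x 16]` shapes in the error. The helper `matmul_compatible` is hypothetical and only checks shapes; it is not part of firedup or PyTorch.

```python
import math

# Values taken from the issue: Box(shape=(4, 4)) and hidden_sizes=(16,).
obs_shape = (4, 4)
hidden_size = 16

# Taking only the first dimension of the shape (what the linked line appears to do).
first_dim = obs_shape[0]

# Number of values in one observation once it is flattened into a vector.
flat_dim = math.prod(obs_shape)


def matmul_compatible(left, right):
    """Hypothetical helper: True if a left (rows, cols) matrix can multiply a right one."""
    return left[1] == right[0]


m1 = (1, flat_dim)             # one flattened observation
m2 = (first_dim, hidden_size)  # first-layer weights sized from shape[0]

# Inner dimensions differ (16 vs 4), so this product is not defined.
print(m1, m2, matmul_compatible(m1, m2))

# Sizing the layer from the flattened dimension makes the shapes line up.
print(matmul_compatible(m1, (flat_dim, hidden_size)))
```

For a one-dimensional observation space such as `(4,)`, `shape[0]` and the flattened size are the same number, which would explain why the standard gym envs do not hit the error.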

duncanwerner avatar Dec 12 '19 22:12 duncanwerner

I see. @duncanwerner, any idea how I can make it general so it works with both the gym envs and yours?
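
One possible direction, sketched here in plain Python and not taken from the repo: size the network input from the product of all observation dimensions and flatten each observation before it is fed in. The names `flat_obs_dim` and `flatten_obs` are hypothetical helpers for illustration. With NumPy arrays the same idea would be `int(np.prod(shape))` and `obs.reshape(-1)`.

```python
import math


def flat_obs_dim(shape):
    """Number of scalar values in one observation of the given shape.

    math.prod(()) is 1, so a scalar (shape ()) observation also works.
    """
    return math.prod(shape)


def flatten_obs(obs):
    """Recursively flatten a nested list/tuple observation into a flat list."""
    if isinstance(obs, (list, tuple)):
        return [value for item in obs for value in flatten_obs(item)]
    return [obs]


# A (4, 4) observation like the one in the issue, written as nested lists.
obs = [[0.0, 1.0, 2.0, 0.5] for _ in range(4)]
flat = flatten_obs(obs)

# The flattened length matches the input size computed from the shape.
print(len(flat), flat_obs_dim((4, 4)))

# A one-dimensional observation is left unchanged, so existing envs keep working.
print(flatten_obs([0.1, 0.2, 0.3, 0.4]), flat_obs_dim((4,)))
```

Flattening discards spatial structure, so this suits fully connected networks; image-like observations fed to convolutional layers would need a different treatment.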

kashif avatar Dec 12 '19 22:12 kashif

I'm still working through it, but I will send a PR if I get it resolved properly.

duncanwerner avatar Dec 12 '19 22:12 duncanwerner

ah nice!! thank you 🙇

kashif avatar Dec 12 '19 22:12 kashif