firedup
Are there any requirements for an env to work with this repo?
I tried this repo with a simple env:
```python
import random

import gym
from gym import spaces


class SimpleEnv(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        super(SimpleEnv, self).__init__()
        self.observation_space = spaces.Box(low=0, high=2, shape=(4, 4))
        self.action_space = spaces.Discrete(3)
        self.reset()

    def step(self, action):
        ob = self.observation_space.sample()
        reward = 1
        episode_over = False if random.random() > 0.5 else True
        return ob, reward, episode_over, {}

    def reset(self):
        ob = self.observation_space.sample()
        return ob

    def render(self, mode='human'):
        pass
```
and used it with the policy gradient agent like this:
```python
from fireup import vpg

env = SimpleEnv()
env.seed(0)
ac_kwargs = dict(hidden_sizes=(16,))
agent = vpg(env, ac_kwargs=ac_kwargs)

episode_count = 100
reward = 0
done = False

for i in range(episode_count):
    ob = env.reset()
    while True:
        print(done)
        action = agent.act(ob, reward, done)
        ob, reward, done, _ = env.step(action)
        if done:
            break
```
But when I run this, I get: `RuntimeError: size mismatch, m1: [1 x 16], m2: [4 x 16] at /opt/conda/conda-bld/pytorch-cpu_1549626403278/work/aten/src/TH/generic/THTensorMath.cpp:940`
This seems to be caused by a mismatch between the observation space and the actor-critic network, yet it works fine with the envs provided by gym. Did I miss something here?
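For reference, the error itself can be reproduced outside the repo. Judging from the message, `m2: [4 x 16]` looks like the transposed weight of a linear layer built for 4-dimensional inputs, while `m1: [1 x 16]` looks like one flattened 4×4 observation (this is a minimal sketch of my guess, not taken from the repo):

```python
import torch
import torch.nn as nn

# a layer built for 4-dimensional inputs with 16 hidden units;
# its transposed weight is the m2: [4 x 16] from the traceback
layer = nn.Linear(4, 16)

# a batch of one flattened (4, 4) observation: the m1: [1 x 16]
obs = torch.randn(1, 16)

# raises RuntimeError: size mismatch, m1: [1 x 16], m2: [4 x 16]
# (newer PyTorch versions word this error differently)
layer(obs)
```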
@mrbeann let me try to reproduce it locally and I will get back to you.
I just ran into the same problem. It looks like it's caused by
https://github.com/kashif/firedup/blob/7011b0c8eeac5a96f995f6b44428b0334fe567a5/fireup/algos/ppo/ppo.py#L230
where it discards the shape (or rather, just takes the first dimension). I guess this works with gym envs whose observation space has a single dimension, but it breaks as soon as the observation has more than one. A possible workaround is sketched below.
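Until that's fixed in the repo, one workaround is to flatten observations before they ever reach the agent. Here is a minimal sketch (my own wrapper, assuming a gym-style `Box` observation space; newer gym releases ship a similar `gym.wrappers.FlattenObservation`):

```python
import gym
import numpy as np
from gym import spaces


class FlattenObservation(gym.ObservationWrapper):
    """Expose a multi-dimensional Box observation as a 1-D vector, so code
    that sizes its network from observation_space.shape[0] sees the full
    flattened dimension (16 for a (4, 4) Box instead of 4)."""

    def __init__(self, env):
        super().__init__(env)
        self.observation_space = spaces.Box(
            low=env.observation_space.low.reshape(-1),
            high=env.observation_space.high.reshape(-1),
            dtype=env.observation_space.dtype,
        )

    def observation(self, observation):
        return np.asarray(observation).reshape(-1)
```

On the repo side, I would guess the equivalent generalization is to compute `obs_dim` as `int(np.prod(env.observation_space.shape))` and flatten each observation before it is fed to the network, instead of taking `shape[0]`.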
I see. @duncanwerner any idea how I can make it general so it works both with gym envs and yours?
I'm still working through it, but I will send a PR if I get it resolved properly.
ah nice!! thank you 🙇