hierarchical_morphology_transfer Discriminator hyperparameters

Open chongyi-zheng opened this issue 3 years ago • 3 comments

The paper lists only some of the discriminator hyperparameters.

However, the README.md mentions some others, for example:

--discrim-decay true
--discrim-online false
--discrim-time-limit 32

So what should I do if I want to reproduce the results in your paper for imitation of AntMaze_Low from PointMaze_Low? I guess the second row in the table is feasible, right?

Oct 06 '20 04:10 chongyi-zheng

All transfers in navigation environments using the discriminator were done using the point mass. Thus, the point mass row contains the correct hyperparameters. As mentioned in the text, we use a decay on the learning rate of the discriminator (hence --discrim-decay true) and do not collect online data for the discriminator (--discrim-online false). discrim-time-limit refers to the episode length of the imitated agent, i.e. how long the point mass episodes are during data collection. For example, the point mass is much faster than the ant, so it doesn't make sense to collect data from the point mass while it is just sitting at the goal.
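
To make the role of discrim-time-limit concrete, here is a minimal sketch (not the repository's code) of capping each recorded point mass episode at a fixed number of steps before it is used as discriminator data. The helper name `truncate_episodes` and the list-of-states episode format are assumptions made purely for illustration.

```python
from typing import List, Sequence


def truncate_episodes(
    episodes: Sequence[Sequence[float]], time_limit: int
) -> List[List[float]]:
    """Keep only the first `time_limit` steps of each episode.

    Hypothetical helper: the episode format (a sequence of per-step states)
    is an assumption, not the repository's actual data layout.
    """
    if time_limit <= 0:
        raise ValueError("time_limit must be positive")
    return [list(ep[:time_limit]) for ep in episodes]


# A fast agent reaches the goal early and then idles there; the idle tail
# (the repeated 9.0 entries) carries no useful motion data.
demo = [[0.0, 3.0, 6.0, 9.0, 9.0, 9.0, 9.0], [0.0, 4.0, 8.0]]
clipped = truncate_episodes(demo, time_limit=4)
print(clipped)
```

Episodes shorter than the limit are left untouched, so only the idle tail of long episodes is dropped.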

Here's the general procedure for reproducing the maze results with the discriminator:

1. Train the point mass low level on PointMass_Low (I believe it's named something similar).
2. Train the point mass high level, PointMaze_High, using PointMass_Low.
3. Train the Ant low level with the discriminator using data from PointMass_Low.
4. Compose PointMaze high with AntDiscrim_Low in a zero-shot manner. This can be done with the composition_test.py script. Note that depending on the type of maze evaluation you want to do, you may need to edit the compose_params function in utils/loader.py.
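
As a rough illustration of what composing a high level with a different low level means, the sketch below rolls out a two-level policy on a toy 1-D task. It is not the repository's implementation: `run_composed`, `high`, `low`, and the toy dynamics are all hypothetical. The point is only the control flow: the high level picks a subgoal every `skip` low-level steps, and the low level can be swapped without retraining the high level.

```python
from typing import Callable, Tuple


def run_composed(
    high_policy: Callable[[float], float],
    low_policy: Callable[[float, float], float],
    state: float,
    horizon: int,
    skip: int,
) -> Tuple[float, int]:
    """Roll out a two-level policy on a toy 1-D task.

    The high level picks a new subgoal every `skip` low-level steps; the low
    level is queried at every step with the current subgoal. All names here
    are hypothetical and only illustrate the control flow.
    """
    high_calls = 0
    subgoal = state
    for t in range(horizon):
        if t % skip == 0:
            subgoal = high_policy(state)
            high_calls += 1
        # Toy dynamics: the low-level action is a displacement of the state.
        state += low_policy(state, subgoal)
    return state, high_calls


def high(s: float) -> float:
    """Toy high level: always ask to move 2 units ahead."""
    return s + 2.0


def low(s: float, g: float) -> float:
    """Toy low level: step toward the subgoal, at most 1 unit per step."""
    return max(-1.0, min(1.0, g - s))


final_state, n_high = run_composed(high, low, state=0.0, horizon=70, skip=35)
print(final_state, n_high)
```

Any other function with the same `(state, subgoal) -> action` signature could be passed in place of `low`, which is the sense in which the composition is zero-shot in this toy setting.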

Oct 06 '20 04:10 jhejna

@jhejna Many thanks for the quick reply!

1. I found that the name of PointMass_Low is PointMassLargeMJ_Low, and I just want to confirm that with you.
2. I trained a PointMaze_High policy and an Ant_Low policy to do a zero-shot transfer, as mentioned in my other question, and I didn't edit the compose_params function as you described. Do you mean that I need to edit it with Ant_Discrim?
3. The Ant sometimes gets stuck in one location during the zero-shot transfer, as you can see in this image, even though it is not overturned. Do you have any idea why? The Ant_Low policy looks correct.
4. Do I always need to set high-level-skips manually? I think you try to store it here, but that doesn't work at the moment: https://github.com/jhejna/hierarchical_morphology_transfer/blob/14202b6092555d8f5e3939390dbda349b62d31fe/bot_transfer/utils/loader.py#L361
5. I found some minor bugs in the code: https://github.com/jhejna/hierarchical_morphology_transfer/blob/14202b6092555d8f5e3939390dbda349b62d31fe/bot_transfer/utils/cmd_util.py#L100 The type is int, right? https://github.com/jhejna/hierarchical_morphology_transfer/blob/14202b6092555d8f5e3939390dbda349b62d31fe/bot_transfer/utils/tester.py#L84

I got empty images with this code, the following should work https://github.com/jhejna/hierarchical_morphology_transfer/blob/14202b6092555d8f5e3939390dbda349b62d31fe/bot_transfer/utils/tester.py#L87
I updated test_composition function to do onscreen rendering https://github.com/jhejna/hierarchical_morphology_transfer/blob/14202b6092555d8f5e3939390dbda349b62d31fe/bot_transfer/utils/tester.py#L72

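# Drop-in replacement for test_composition in bot_transfer/utils/tester.py.
# Assumes np, os, RENDERS, compose_params and load are already available there.
def test_composition(low_name, high_name, env_name, g=0, k=None, num_ep=100):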
    params = compose_params(low_name, high_name, env_name, k=k)
    model, env = load(high_name, params, best=True)
    print("COMPOSED PARAMS", params)
    print("ENV", env)

    ep_rewards = list()
    rewards = list()
    obs = env.reset()
    if g == 0:
        while True:
            action, _states = model.predict(obs)
            obs, reward, done, info = env.step(action)
            rewards.append(reward)
            if done:
                ep_rewards.append(sum(rewards))
                print("REWARD", sum(rewards), len(rewards), "Ep to go:", num_ep, "cur avg", np.mean(ep_rewards))
                num_ep -= 1
                rewards = []
                if num_ep == 0:
                    break
                obs = env.reset()
            env.render()
    else:
        gif_frames = list()
        for _ in range(g):
            action, _states = model.predict(obs)
            obs, reward, done, info = env.step(action)
            frame = env.render(mode='rgb_array')
            gif_frames.append(frame)
            rewards.append(reward)
            if done:
                ep_rewards.append(sum(rewards))
                print("REWARD", sum(rewards), len(rewards), "Ep to go:", num_ep, "cur avg", np.mean(ep_rewards))
                num_ep -= 1
                rewards = []
                if num_ep == 0:
                    break
                obs = env.reset()

        import imageio
        render_path = os.path.join(RENDERS, 'composition_' + low_name + '.gif')
        os.makedirs(os.path.dirname(render_path), exist_ok=True)
        print("saving to ", render_path)
        imageio.mimsave(render_path, gif_frames[::4], subrectangles=True, duration=0.05)
        print("completed saving")
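
Both branches above repeat the same reward bookkeeping. As a sketch of how that could be factored out (the class name `EpisodeTracker` is hypothetical and not part of the repository), something like the following would work for either branch:

```python
from typing import List


class EpisodeTracker:
    """Accumulates per-step rewards and records a return when an episode ends."""

    def __init__(self) -> None:
        self.current: List[float] = []   # rewards of the episode in progress
        self.returns: List[float] = []   # summed reward of each finished episode

    def step(self, reward: float, done: bool) -> bool:
        """Record one step; return True if this step finished an episode."""
        self.current.append(reward)
        if done:
            self.returns.append(sum(self.current))
            self.current = []
        return done

    def mean_return(self) -> float:
        """Average return over all finished episodes."""
        return sum(self.returns) / len(self.returns)


# Two short fake episodes stand in for (reward, done) pairs from env.step.
tracker = EpisodeTracker()
for reward, done in [(1.0, False), (2.0, True), (0.5, False), (0.5, True)]:
    tracker.step(reward, done)
print(tracker.returns, tracker.mean_return())
```

In the function above, each `rewards.append(reward)` / `if done:` block would then reduce to a single `tracker.step(reward, done)` call followed by the reset logic.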

Oct 07 '20 14:10 chongyi-zheng

1. Yeah, that should be the correct one.
2. In compose_params there is a line that disables sampling goals for the maze. This is the difference between the Maze and Maze End evaluations. Depending on the type of evaluation you want to run, you will need to comment / uncomment this. https://github.com/jhejna/hierarchical_morphology_transfer/blob/master/bot_transfer/utils/loader.py#L403
3. Hmmmm. We did observe this once or twice but not to the extent that is seen here. I'm not sure exactly what would be causing this -- perhaps try training the Ant Low level for more than 2.5 million timesteps, then make sure that you are using the "best" policy saved during training. Additionally, confirm that contact information is enabled in the environment and that the mujoco_py version is <2.0. I can perhaps investigate this later when I have more time.
4. The code makes its best guess at what the skip level should be. When running evals, we set this by hand to 35.
5. Thanks for pointing that out in the parser! As far as rendering goes, this is only meant to be used when debug rendering is enabled for the high level wrapper: https://github.com/jhejna/hierarchical_morphology_transfer/blob/master/bot_transfer/envs/hierarchical.py#L196. It's commented out because it makes everything run really slowly when enabled.
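
For the mujoco_py version check mentioned above, a small helper along these lines could be used. This is only a sketch: `is_pre_2` is a hypothetical name, and it takes the version as a plain string (for example, whatever `pip show mujoco-py` reports) rather than assuming any particular attribute on the package.

```python
def is_pre_2(version: str) -> bool:
    """Return True if a dotted version string has a major component below 2.

    Hypothetical helper; the caller supplies the version string.
    """
    major = int(version.strip().split(".")[0])
    return major < 2


# Example version strings used only for illustration.
print(is_pre_2("1.50.1.68"), is_pre_2("2.0.2.13"))
```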

Oct 07 '20 23:10 jhejna
