
ppo_agents.py not working with pre-trained models (V1 and V2)

Open ericchen321 opened this issue 3 years ago • 2 comments

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: master (Commit ce397, May 7, 2021)

Habitat-Sim: master (Commit 5cb10, May 7, 2021)

Docs and Tutorials

Did you read the docs? Yes

Did you check out the tutorials? Yes

❓ Questions and Help

Context

Hello, we are trying to evaluate pre-trained depth agents (both the V1 and V2 models) with the following command: python habitat_baselines/agents/ppo_agents.py --input-type depth --model-path <path-to-depth-agent> --task-config configs/tasks/pointnav.yaml. We have also modified the SIMULATOR.AGENT_0.SENSORS field to [DEPTH_SENSOR].
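
For reference, the sensor change we made is equivalent to the following programmatic sketch, assuming the yacs-based config API habitat-lab uses at this commit (habitat.get_config plus defrost/freeze):

    import habitat

    # Load the PointNav task config referenced in the command above.
    config = habitat.get_config("configs/tasks/pointnav.yaml")
    config.defrost()
    # Depth-only evaluation: expose only the depth sensor on agent 0.
    config.SIMULATOR.AGENT_0.SENSORS = ["DEPTH_SENSOR"]
    config.freeze()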

For the pre-trained agents, we used

  1. the depth agent from the V1 models (with its weights remapped thanks to Erik's script from here; a generic sketch of this kind of remapping follows this list), and
  2. gibson-depth-best.pth from the V2 models.
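
For clarity, this is the general kind of key remapping we mean; a minimal, generic sketch (not Erik's actual script), where OLD_TO_NEW is a hypothetical prefix mapping that would be filled in for the architecture at hand:

    import torch

    # Hypothetical mapping from old key prefixes to new ones; fill in per
    # the actual architecture change (illustration only).
    OLD_TO_NEW = {
        # "old.key.prefix.": "new.key.prefix.",
    }

    def remap_checkpoint(in_path: str, out_path: str) -> None:
        """Rename state_dict keys in a checkpoint and save the result."""
        ckpt = torch.load(in_path, map_location="cpu")
        remapped = {}
        for key, value in ckpt["state_dict"].items():
            for old, new in OLD_TO_NEW.items():
                if key.startswith(old):
                    key = new + key[len(old):]
                    break
            remapped[key] = value
        ckpt["state_dict"] = remapped
        torch.save(ckpt, out_path)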

In the first case, we got the following error:

      File "habitat_baselines/agents/ppo_agents.py", line 171, in <module>
        main()
      File "habitat_baselines/agents/ppo_agents.py", line 162, in main
        agent = PPOAgent(agent_config)
      File "habitat_baselines/agents/ppo_agents.py", line 94, in __init__
        for k, v in ckpt["state_dict"].items()
      File "/home/eric/anaconda3/envs/hab_env_1/lib/python3.6/site-packages/torch-1.8.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1224, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
      Missing key(s) in state_dict: "net.prev_action_embedding.weight", "net.tgt_embeding.weight", "net.tgt_embeding.bias", "net.visual_encoder.backbone.conv1.0.weight", "net.visual_encoder.backbone.conv1.1.weight", "net.visual_encoder.backbone.conv1.1.bias", "net.visual_encoder.backbone.layer1.0.convs.0.weight", "net.visual_encoder.backbone.layer1.0.convs.1.weight", "net.visual_encoder.backbone.layer1.0.convs.1.bias", "net.visual_encoder.backbone.layer1.0.convs.3.weight", "net.visual_encoder.backbone.layer1.0.convs.4.weight", "net.visual_encoder.backbone.layer1.0.convs.4.bias", "net.visual_encoder.backbone.layer1.1.convs.0.weight", "net.visual_encoder.backbone.layer1.1.convs.1.weight", "net.visual_encoder.backbone.layer1.1.convs.1.bias", "net.visual_encoder.backbone.layer1.1.convs.3.weight", "net.visual_encoder.backbone.layer1.1.convs.4.weight", "net.visual_encoder.backbone.layer1.1.convs.4.bias", "net.visual_encoder.backbone.layer2.0.convs.0.weight", "net.visual_encoder.backbone.layer2.0.convs.1.weight", "net.visual_encoder.backbone.layer2.0.convs.1.bias", "net.visual_encoder.backbone.layer2.0.convs.3.weight", "net.visual_encoder.backbone.layer2.0.convs.4.weight", "net.visual_encoder.backbone.layer2.0.convs.4.bias", "net.visual_encoder.backbone.layer2.0.downsample.0.weight", "net.visual_encoder.backbone.layer2.0.downsample.1.weight", "net.visual_encoder.backbone.layer2.0.downsample.1.bias", "net.visual_encoder.backbone.layer2.1.convs.0.weight", "net.visual_encoder.backbone.layer2.1.convs.1.weight", "net.visual_encoder.backbone.layer2.1.convs.1.bias", "net.visual_encoder.backbone.layer2.1.convs.3.weight", "net.visual_encoder.backbone.layer2.1.convs.4.weight", "net.visual_encoder.backbone.layer2.1.convs.4.bias", "net.visual_encoder.backbone.layer3.0.convs.0.weight", "net.visual_encoder.backbone.layer3.0.convs.1.weight", "net.visual_encoder.backbone.layer3.0.convs.1.bias", "net.visual_encoder.backbone.layer3.0.convs.3.weight", "net.visual_encoder.backbone.layer3.0.convs.4.weight", "net.visual_encoder.backbone.layer3.0.convs.4.bias", "net.visual_encoder.backbone.layer3.0.downsample.0.weight", "net.visual_encoder.backbone.layer3.0.downsample.1.weight", "net.visual_encoder.backbone.layer3.0.downsample.1.bias", "net.visual_encoder.backbone.layer3.1.convs.0.weight", "net.visual_encoder.backbone.layer3.1.convs.1.weight", "net.visual_encoder.backbone.layer3.1.convs.1.bias", "net.visual_encoder.backbone.layer3.1.convs.3.weight", "net.visual_encoder.backbone.layer3.1.convs.4.weight", "net.visual_encoder.backbone.layer3.1.convs.4.bias", "net.visual_encoder.backbone.layer4.0.convs.0.weight", "net.visual_encoder.backbone.layer4.0.convs.1.weight", "net.visual_encoder.backbone.layer4.0.convs.1.bias", "net.visual_encoder.backbone.layer4.0.convs.3.weight", "net.visual_encoder.backbone.layer4.0.convs.4.weight", "net.visual_encoder.backbone.layer4.0.convs.4.bias", "net.visual_encoder.backbone.layer4.0.downsample.0.weight", "net.visual_encoder.backbone.layer4.0.downsample.1.weight", "net.visual_encoder.backbone.layer4.0.downsample.1.bias", "net.visual_encoder.backbone.layer4.1.convs.0.weight", "net.visual_encoder.backbone.layer4.1.convs.1.weight", "net.visual_encoder.backbone.layer4.1.convs.1.bias", "net.visual_encoder.backbone.layer4.1.convs.3.weight", "net.visual_encoder.backbone.layer4.1.convs.4.weight", "net.visual_encoder.backbone.layer4.1.convs.4.bias", "net.visual_encoder.compression.0.weight", "net.visual_encoder.compression.1.weight", "net.visual_encoder.compression.1.bias", "net.visual_fc.1.weight", "net.visual_fc.1.bias".
      Unexpected key(s) in state_dict: "net.visual_encoder.cnn.0.weight", "net.visual_encoder.cnn.0.bias", "net.visual_encoder.cnn.2.weight", "net.visual_encoder.cnn.2.bias", "net.visual_encoder.cnn.4.weight", "net.visual_encoder.cnn.4.bias", "net.visual_encoder.cnn.6.weight", "net.visual_encoder.cnn.6.bias".
      size mismatch for net.state_encoder.rnn.weight_ih_l0: copying a param with shape torch.Size([1536, 514]) from checkpoint, the shape in current model is torch.Size([1536, 576]).

In the second case, we got the following error:

    Traceback (most recent call last):
      File "habitat_baselines/agents/ppo_agents.py", line 171, in <module>
        main()
      File "habitat_baselines/agents/ppo_agents.py", line 162, in main
        agent = PPOAgent(agent_config)
      File "habitat_baselines/agents/ppo_agents.py", line 94, in __init__
        for k, v in ckpt["state_dict"].items()
      File "/home/eric/anaconda3/envs/hab_env_1/lib/python3.6/site-packages/torch-1.8.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1224, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
      Unexpected key(s) in state_dict: "net.visual_encoder.running_mean_and_var._mean", "net.visual_encoder.running_mean_and_var._var", "net.visual_encoder.running_mean_and_var._count".

In both cases the errors came from load_state_dict() and appear to be caused by the keys in the saved models not matching the keys expected by PointNavResNetPolicy, similar to a previous issue we ran into and resolved thanks to Erik's script.
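
For anyone debugging similar mismatches, here is a small diagnostic sketch we find useful. Assumptions: actor_critic stands in for the PointNavResNetPolicy instance that PPOAgent builds, and, as far as we can tell, ppo_agents.py strips a leading "actor_critic." prefix from the checkpoint keys before loading, so the sketch strips it too:

    import torch

    def diff_state_dict_keys(ckpt_path: str, actor_critic: torch.nn.Module) -> None:
        """Print which keys the checkpoint and the instantiated policy disagree on."""
        ckpt = torch.load(ckpt_path, map_location="cpu")
        prefix = "actor_critic."
        ckpt_keys = {
            k[len(prefix):] if k.startswith(prefix) else k
            for k in ckpt["state_dict"]
        }
        model_keys = set(actor_critic.state_dict().keys())
        print("Missing from checkpoint:", sorted(model_keys - ckpt_keys))
        print("Unexpected in checkpoint:", sorted(ckpt_keys - model_keys))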

Question

May I ask whether there are plans to fix PointNavResNetPolicy or the relevant classes so that ppo_agents.py works again with the pre-trained agents? Or do you have any suggestions on how we can fix this ourselves? Thanks in advance for the help.

ericchen321 · May 13 '21 21:05

The first issue is because the old checkpoints use an old architecture.

For the second issue, it looks like those checkpoints were trained with input normalization, which is normally only done for rgb and rgbd, so I will have to see what happened there. You can fix this for now by setting normalize_visual_inputs to true here: https://github.com/facebookresearch/habitat-lab/blob/master/habitat_baselines/agents/ppo_agents.py#L84
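
In case it is useful, a minimal sketch of that workaround, assuming PPOAgent builds its policy roughly as ppo_agents.py does at this commit and that the PointNavResNetPolicy constructor accepts a normalize_visual_inputs keyword (the exact arguments at the linked line may differ):

    from gym import spaces
    from habitat_baselines.rl.ddppo.policy import PointNavResNetPolicy

    def build_policy(observation_space: spaces.Dict, action_space: spaces.Discrete):
        # Force input normalization on (instead of enabling it only when an
        # RGB sensor is present) so the checkpoint's running_mean_and_var
        # buffers have somewhere to load into.
        return PointNavResNetPolicy(
            observation_space=observation_space,
            action_space=action_space,
            hidden_size=512,  # assumed default hidden size used by the agent
            normalize_visual_inputs=True,
        )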

erikwijmans · May 14 '21 00:05

Hi Erik, thanks for the suggestions!

I have set normalize_visual_inputs to True, and I got the following error:

    Traceback (most recent call last):
      File "habitat_baselines/agents/ppo_agents.py", line 171, in <module>
        main()
      File "habitat_baselines/agents/ppo_agents.py", line 162, in main
        agent = PPOAgent(agent_config)
      File "habitat_baselines/agents/ppo_agents.py", line 94, in __init__
        for k, v in ckpt["state_dict"].items()
      File "/home/eric/anaconda3/envs/hab_env_1/lib/python3.6/site-packages/torch-1.8.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1224, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
      size mismatch for net.visual_encoder.running_mean_and_var._mean: copying a param with shape torch.Size([1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1]).
      size mismatch for net.visual_encoder.running_mean_and_var._var: copying a param with shape torch.Size([1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1]).

I think this error happened because in habitat_baselines/rl/ddppo/policy/running_mean_and_var.py, the RunningMeanAndVar class initializes _mean and _var as four-dimensional tensors, whereas in the saved model each tensor has only two dimensions.
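
In case it helps others, one possible workaround is to patch the checkpoint so those two tensors already have the 4-D shape the current RunningMeanAndVar expects before handing it to ppo_agents.py. An untested sketch (the key names are taken from the error above; adjust them if your checkpoint differs):

    import torch

    def patch_running_stats(in_path: str, out_path: str) -> None:
        """Reshape 2-D running-stats tensors in the checkpoint to 4-D and save a copy."""
        ckpt = torch.load(in_path, map_location="cpu")
        state_dict = ckpt["state_dict"]
        for key, value in list(state_dict.items()):
            if key.endswith(("running_mean_and_var._mean", "running_mean_and_var._var")):
                if value.dim() == 2:
                    # (1, C) -> (1, C, 1, 1) to match the current module.
                    state_dict[key] = value.reshape(1, -1, 1, 1)
        torch.save(ckpt, out_path)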

Also, I have tried the V2 RGB and RGBD agents (Gibson and MP3D), and they all worked.

ericchen321 · May 17 '21 21:05