ppo_agents.py not working with pre-trained models (V1 and V2)
Habitat-Lab and Habitat-Sim versions
Habitat-Lab: master (Commit ce397, May 7, 2021)
Habitat-Sim: master (Commit 5cb10, May 7, 2021)
Docs and Tutorials
Did you read the docs? Yes
Did you check out the tutorials? Yes
❓ Questions and Help
Context
Hello, we are trying to evaluate the pre-trained depth agents (both the V1 and V2 models) with the following command:
python habitat_baselines/agents/ppo_agents.py --input-type depth --model-path <path-to-depth-agent> --task-config configs/tasks/pointnav.yaml
We have also modified the SIMULATOR.AGENT_0.SENSORS field to [DEPTH_SENSOR].
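(For reference, the same sensor restriction can also be applied programmatically instead of editing the YAML file; a minimal sketch, assuming the yacs-style config API that this version of habitat-lab exposes through habitat.get_config:)

```python
# Hedged sketch: the SIMULATOR.AGENT_0.SENSORS override applied in code rather
# than by editing configs/tasks/pointnav.yaml, assuming the yacs-style config
# API (get_config / defrost / freeze) used by habitat-lab around this commit.
import habitat

config = habitat.get_config("configs/tasks/pointnav.yaml")
config.defrost()
# Keep only the depth sensor, mirroring the YAML edit described above.
config.SIMULATOR.AGENT_0.SENSORS = ["DEPTH_SENSOR"]
config.freeze()
```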
For the pre-trained agents, we used:
- the depth agent from the V1 models (with the weights remapped thanks to Erik's script from here; see the sketch after this list), and
- gibson-depth-best.pth from the V2 models.
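For context, that remapping is essentially a rename of the checkpoint's state_dict keys. Below is a minimal sketch of that kind of remapping; it is not Erik's actual script, and the rename table is a hypothetical placeholder:

```python
# Sketch of remapping state_dict keys in an old checkpoint so they match newer
# module names. NOT the referenced script; RENAMES is a placeholder mapping.
import torch

RENAMES = {
    "old.module.prefix.": "new.module.prefix.",  # hypothetical example entry
}


def remap_keys(in_path: str, out_path: str) -> None:
    ckpt = torch.load(in_path, map_location="cpu")
    remapped = {}
    for key, value in ckpt["state_dict"].items():
        new_key = key
        for old_prefix, new_prefix in RENAMES.items():
            if new_key.startswith(old_prefix):
                new_key = new_prefix + new_key[len(old_prefix):]
        remapped[new_key] = value
    ckpt["state_dict"] = remapped
    torch.save(ckpt, out_path)
```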
In the first case, we got the following error:
File "habitat_baselines/agents/ppo_agents.py", line 171, in <module> main() File "habitat_baselines/agents/ppo_agents.py", line 162, in main agent = PPOAgent(agent_config) File "habitat_baselines/agents/ppo_agents.py", line 94, in __init__ for k, v in ckpt["state_dict"].items() File "/home/eric/anaconda3/envs/hab_env_1/lib/python3.6/site-packages/torch-1.8.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1224, in load_state_dict self.__class__.__name__, "\n\t".join(error_msgs))) RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy: Missing key(s) in state_dict: "net.prev_action_embedding.weight", "net.tgt_embeding.weight", "net.tgt_embeding.bias", "net.visual_encoder.backbone.conv1.0.weight", "net.visual_encoder.backbone.conv1.1.weight", "net.visual_encoder.backbone.conv1.1.bias", "net.visual_encoder.backbone.layer1.0.convs.0.weight", "net.visual_encoder.backbone.layer1.0.convs.1.weight", "net.visual_encoder.backbone.layer1.0.convs.1.bias", "net.visual_encoder.backbone.layer1.0.convs.3.weight", "net.visual_encoder.backbone.layer1.0.convs.4.weight", "net.visual_encoder.backbone.layer1.0.convs.4.bias", "net.visual_encoder.backbone.layer1.1.convs.0.weight", "net.visual_encoder.backbone.layer1.1.convs.1.weight", "net.visual_encoder.backbone.layer1.1.convs.1.bias", "net.visual_encoder.backbone.layer1.1.convs.3.weight", "net.visual_encoder.backbone.layer1.1.convs.4.weight", "net.visual_encoder.backbone.layer1.1.convs.4.bias", "net.visual_encoder.backbone.layer2.0.convs.0.weight", "net.visual_encoder.backbone.layer2.0.convs.1.weight", "net.visual_encoder.backbone.layer2.0.convs.1.bias", "net.visual_encoder.backbone.layer2.0.convs.3.weight", "net.visual_encoder.backbone.layer2.0.convs.4.weight", "net.visual_encoder.backbone.layer2.0.convs.4.bias", "net.visual_encoder.backbone.layer2.0.downsample.0.weight", "net.visual_encoder.backbone.layer2.0.downsample.1.weight", "net.visual_encoder.backbone.layer2.0.downsample.1.bias", "net.visual_encoder.backbone.layer2.1.convs.0.weight", "net.visual_encoder.backbone.layer2.1.convs.1.weight", "net.visual_encoder.backbone.layer2.1.convs.1.bias", "net.visual_encoder.backbone.layer2.1.convs.3.weight", "net.visual_encoder.backbone.layer2.1.convs.4.weight", "net.visual_encoder.backbone.layer2.1.convs.4.bias", "net.visual_encoder.backbone.layer3.0.convs.0.weight", "net.visual_encoder.backbone.layer3.0.convs.1.weight", "net.visual_encoder.backbone.layer3.0.convs.1.bias", "net.visual_encoder.backbone.layer3.0.convs.3.weight", "net.visual_encoder.backbone.layer3.0.convs.4.weight", "net.visual_encoder.backbone.layer3.0.convs.4.bias", "net.visual_encoder.backbone.layer3.0.downsample.0.weight", "net.visual_encoder.backbone.layer3.0.downsample.1.weight", "net.visual_encoder.backbone.layer3.0.downsample.1.bias", "net.visual_encoder.backbone.layer3.1.convs.0.weight", "net.visual_encoder.backbone.layer3.1.convs.1.weight", "net.visual_encoder.backbone.layer3.1.convs.1.bias", "net.visual_encoder.backbone.layer3.1.convs.3.weight", "net.visual_encoder.backbone.layer3.1.convs.4.weight", "net.visual_encoder.backbone.layer3.1.convs.4.bias", "net.visual_encoder.backbone.layer4.0.convs.0.weight", "net.visual_encoder.backbone.layer4.0.convs.1.weight", "net.visual_encoder.backbone.layer4.0.convs.1.bias", "net.visual_encoder.backbone.layer4.0.convs.3.weight", "net.visual_encoder.backbone.layer4.0.convs.4.weight", "net.visual_encoder.backbone.layer4.0.convs.4.bias", "net.visual_encoder.backbone.layer4.0.downsample.0.weight", 
"net.visual_encoder.backbone.layer4.0.downsample.1.weight", "net.visual_encoder.backbone.layer4.0.downsample.1.bias", "net.visual_encoder.backbone.layer4.1.convs.0.weight", "net.visual_encoder.backbone.layer4.1.convs.1.weight", "net.visual_encoder.backbone.layer4.1.convs.1.bias", "net.visual_encoder.backbone.layer4.1.convs.3.weight", "net.visual_encoder.backbone.layer4.1.convs.4.weight", "net.visual_encoder.backbone.layer4.1.convs.4.bias", "net.visual_encoder.compression.0.weight", "net.visual_encoder.compression.1.weight", "net.visual_encoder.compression.1.bias", "net.visual_fc.1.weight", "net.visual_fc.1.bias". Unexpected key(s) in state_dict: "net.visual_encoder.cnn.0.weight", "net.visual_encoder.cnn.0.bias", "net.visual_encoder.cnn.2.weight", "net.visual_encoder.cnn.2.bias", "net.visual_encoder.cnn.4.weight", "net.visual_encoder.cnn.4.bias", "net.visual_encoder.cnn.6.weight", "net.visual_encoder.cnn.6.bias". size mismatch for net.state_encoder.rnn.weight_ih_l0: copying a param with shape torch.Size([1536, 514]) from checkpoint, the shape in current model is torch.Size([1536, 576]).
In the second case, we got the following error:
Traceback (most recent call last):
  File "habitat_baselines/agents/ppo_agents.py", line 171, in <module>
    main()
  File "habitat_baselines/agents/ppo_agents.py", line 162, in main
    agent = PPOAgent(agent_config)
  File "habitat_baselines/agents/ppo_agents.py", line 94, in __init__
    for k, v in ckpt["state_dict"].items()
  File "/home/eric/anaconda3/envs/hab_env_1/lib/python3.6/site-packages/torch-1.8.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
    Unexpected key(s) in state_dict: "net.visual_encoder.running_mean_and_var._mean", "net.visual_encoder.running_mean_and_var._var", "net.visual_encoder.running_mean_and_var._count".
In both cases the errors came from load_state_dict() and seem to be due to the keys in the saved models not matching the keys expected by PointNavResNetPolicy, similar to a previous issue we had and resolved thanks to Erik's script.
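A quick way to see this kind of mismatch directly is to diff the checkpoint keys against the keys of a freshly constructed policy; a small sketch, assuming the checkpoint stores its weights under a "state_dict" entry as in the tracebacks above:

```python
# Sketch: compare the keys stored in a checkpoint with the keys a freshly
# constructed policy expects, to see what is missing/unexpected before
# load_state_dict() raises. `policy` is whatever module PPOAgent builds
# internally (PointNavResNetPolicy here); note that keys in the raw checkpoint
# may carry an extra prefix (e.g. from the trainer) that has to be stripped.
import torch


def diff_state_dict_keys(ckpt_path: str, policy: torch.nn.Module) -> None:
    ckpt = torch.load(ckpt_path, map_location="cpu")
    ckpt_keys = set(ckpt["state_dict"].keys())
    model_keys = set(policy.state_dict().keys())
    print("Missing from checkpoint:", sorted(model_keys - ckpt_keys))
    print("Unexpected in checkpoint:", sorted(ckpt_keys - model_keys))
```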
Question
So may I ask whether there are plans to fix PointNavResNetPolicy or the relevant classes so that ppo_agents.py works with the pre-trained agents again? Or do you have any suggestions on how we could fix this ourselves? Thanks in advance for the help.
The first issue is because the old (V1) checkpoints use an old architecture.
For the second issue, it looks like those checkpoints were trained with input normalization, which is normally only done for RGB and RGBD, so I will have to see what happened there. You can fix this for now by setting normalize_visual_inputs to True here: https://github.com/facebookresearch/habitat-lab/blob/master/habitat_baselines/agents/ppo_agents.py#L84
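Concretely, the workaround amounts to passing normalize_visual_inputs=True when the policy is constructed. A hedged sketch of that call follows; the sensor keys, shapes, and other arguments below are illustrative stand-ins, not the exact code at that line:

```python
# Hedged sketch: construct the policy with visual-input normalization forced
# on, so the model creates the running_mean_and_var buffers that the V2 depth
# checkpoint stores. The observation/action spaces are illustrative stand-ins
# for the ones ppo_agents.py derives from the environment config.
import numpy as np
from gym import spaces
from habitat_baselines.rl.ddppo.policy.resnet_policy import PointNavResNetPolicy

observation_space = spaces.Dict(
    {
        "depth": spaces.Box(low=0.0, high=1.0, shape=(256, 256, 1), dtype=np.float32),
        "pointgoal_with_gps_compass": spaces.Box(
            low=np.finfo(np.float32).min,
            high=np.finfo(np.float32).max,
            shape=(2,),
            dtype=np.float32,
        ),
    }
)
action_space = spaces.Discrete(4)

actor_critic = PointNavResNetPolicy(
    observation_space=observation_space,
    action_space=action_space,
    hidden_size=512,
    normalize_visual_inputs=True,  # normally only enabled for RGB/RGBD inputs
)
```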
Hi Erik, thanks for the suggestions!
I have set normalize_visual_inputs to True and got the following error:
Traceback (most recent call last):
  File "habitat_baselines/agents/ppo_agents.py", line 171, in <module>
    main()
  File "habitat_baselines/agents/ppo_agents.py", line 162, in main
    agent = PPOAgent(agent_config)
  File "habitat_baselines/agents/ppo_agents.py", line 94, in __init__
    for k, v in ckpt["state_dict"].items()
  File "/home/eric/anaconda3/envs/hab_env_1/lib/python3.6/site-packages/torch-1.8.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1224, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
    size mismatch for net.visual_encoder.running_mean_and_var._mean: copying a param with shape torch.Size([1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1]).
    size mismatch for net.visual_encoder.running_mean_and_var._var: copying a param with shape torch.Size([1, 1]) from checkpoint, the shape in current model is torch.Size([1, 1, 1, 1]).
I think this error happened because, in habitat_baselines/rl/ddppo/policy/running_mean_and_var.py, the RunningMeanAndVar class initializes _mean and _var as four-dimensional tensors, whereas in the saved model each of these tensors has only two dimensions.
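As a possible workaround on the checkpoint side (rather than changing the model code), those two buffers could be reshaped to the 4-D layout the current module expects before loading; a sketch, with the buffer names taken from the error message above:

```python
# Sketch of a checkpoint-side workaround: reshape the 2-D running-stat tensors
# stored in the V2 depth checkpoint to the (1, C, 1, 1) layout the current
# RunningMeanAndVar module expects. Buffer names come from the error message;
# keys in the raw checkpoint may carry an extra prefix, hence the endswith check.
import torch

ckpt = torch.load("gibson-depth-best.pth", map_location="cpu")
state_dict = ckpt["state_dict"]
for key, value in list(state_dict.items()):
    if key.endswith(("running_mean_and_var._mean", "running_mean_and_var._var")) and value.dim() == 2:
        state_dict[key] = value.view(1, -1, 1, 1)  # e.g. (1, 1) -> (1, 1, 1, 1)
torch.save(ckpt, "gibson-depth-best-reshaped.pth")
```

Whether this is the right long-term fix is of course up to the maintainers; it only addresses the shape mismatch reported above.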
Also, I have tried the V2 RGB and RGBD agents (Gibson and MP3D), and both worked.