
PPO agent produces non-deterministic results when evaluating the same episode

Open ericchen321 opened this issue 4 years ago • 5 comments

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: master (commit ce397) Habitat-Sim: master (commit 5cb10)

Docs and Tutorials

Did you read the docs? Yes Did you check out the tutorials? Yes

❓ Questions and Help

Problem

Hello, I am trying to evaluate the V2 RGBD agent on one of the Habitat test point-navigation episodes in a similar fashion to ppo_agents.py, but I noticed that the agent produces inconsistent actions when evaluating the same episode, depending on whether

  • the episode was evaluated alone, or
  • the episode was evaluated after earlier episodes from the same dataset.

I have fixed the random seed when initializing the agent, so based on what I understand from this post, the agent should be deterministic. May I ask why it would produce different actions depending on the order in which I evaluated the episode?

Context

I created my evaluation script based upon ppo_agents.py, and made some changes so it can either evaluate all episodes from a dataset, or select a particular episode to start the evaluation from.

In pseudocode my evaluation process is

# iterate until we find the first episode to evaluate
if not evaluate_all:
    while (env.current_episode.episode_id != ep_id
           or env.current_episode.scene_id != sc_id):
        env.reset()  # advance to the next episode

while count_episodes < num_episodes:
    agent.reset()
    observations = env.reset()
    while not env.episode_over:
        action = agent.act(observations)
        observations = env.step(action)
    count_episodes += 1

I have called agent.reset() so the agent's decisions would not be affected by previous episodes. I have also fixed RANDOM_SEED to 7, as in ppo_agents.py.

The episode of interest is from the test scenes data, with episode-id=49 and scene-id=data/datasets/van-gogh-room.glb. I have also confirmed that the environment initially produced deterministic readings, but the agent's actions became non-deterministic a couple of steps into the episode.
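For reference, this is roughly how I compared readings between runs (an illustrative sketch, assuming observations are dicts of numpy arrays keyed by sensor name, which is how habitat-lab returns them):

```python
import numpy as np

def observations_equal(obs_a: dict, obs_b: dict) -> bool:
    """Compare two observation dicts of numpy arrays, key by key."""
    if obs_a.keys() != obs_b.keys():
        return False
    return all(np.array_equal(obs_a[k], obs_b[k]) for k in obs_a)
```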

ericchen321 avatar Jun 22 '21 22:06 ericchen321

This is because the state of the PRNGs for action noise and sensor noise is different. Those get seeded at simulator/environment creation but not on reset, so their state will be a function of the previous episodes. If you want to fix the random seed based on the episode, doing something like env.seed(hash(env.current_episode.scene_id) + hash(env.current_episode.episode_id)) should work.
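A sketch of that per-episode seeding. One caveat worth noting: Python's built-in hash() on strings is salted per process unless PYTHONHASHSEED is fixed, so a stable hash such as zlib.crc32 is safer if you need the same seed across runs:

```python
import zlib

def episode_seed(scene_id: str, episode_id: str) -> int:
    """Derive a stable per-episode seed from scene and episode identifiers."""
    # zlib.crc32 is deterministic across processes, unlike built-in hash()
    # on strings, which is randomized unless PYTHONHASHSEED is set.
    return zlib.crc32(scene_id.encode()) ^ zlib.crc32(episode_id.encode())

# usage (sketch): before stepping each episode,
# env.seed(episode_seed(env.current_episode.scene_id,
#                       env.current_episode.episode_id))
```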

erikwijmans avatar Jun 23 '21 02:06 erikwijmans

Hi Erik, thanks for the quick response! That makes sense.

ericchen321 avatar Jun 23 '21 03:06 ericchen321

Sorry for asking for help on this again - I have fixed the environment's random seed to 0, but I'm still seeing the discrepancy.

I suspect this is the agent's problem rather than the environment's, because I have compared sensor readings and noticed that as long as the agent was producing the same actions, the readings I got from env.step() were always identical. The divergence happened at the 6th step, where, given the same readings, the RGBD agent produced different actions.
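To pinpoint where the two runs diverge, I compared the recorded action sequences with a small helper (an illustrative sketch, not from my actual script):

```python
def first_divergence(actions_a, actions_b):
    """Return the index of the first differing action, or None if the
    sequences agree (comparison stops at the shorter sequence)."""
    for i, (a, b) in enumerate(zip(actions_a, actions_b)):
        if a != b:
            return i
    return None
```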

ericchen321 avatar Jun 23 '21 03:06 ericchen321

Yeah, likely an agent issue. Worth reading through PyTorch's docs on determinism: https://pytorch.org/docs/stable/notes/randomness.html
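A minimal sketch of the knobs described on that page (assuming the agent is a PyTorch model; note that torch.use_deterministic_algorithms raises an error when an op has no deterministic implementation):

```python
import os
import random

import numpy as np
import torch

def seed_everything(seed: int = 7) -> None:
    """Seed all common PRNGs and request deterministic PyTorch kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines
    # Prefer deterministic kernels; raises if an op has none.
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning, which can pick different kernels per run.
    torch.backends.cudnn.benchmark = False
    # On CUDA >= 10.2, deterministic cuBLAS additionally requires e.g.
    # os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```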

erikwijmans avatar Jun 23 '21 16:06 erikwijmans

Hi Erik, again thanks for looking into this issue. I will read through the doc and see if I can find the source of non-determinism.

ericchen321 avatar Jun 23 '21 22:06 ericchen321