Reward for opening a door is given before the door is opened.
It looks like the reward for opening a 'green' door for the first time is 0.1 points, but it is given a few frames before the door actually opens in the observed scene. Maybe this is related to a delay in showing the opened door (I tried moving backwards, and the door still opened after a few frames).
Here is the code that shows it.
from obstacle_tower_env import ObstacleTowerEnv
import numpy as np
import cv2
%matplotlib inline
from matplotlib import pyplot as plt
env = ObstacleTowerEnv('./ObstacleTower/obstacletower', retro=False)
moves = [
[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 2, 0 ,0],[1, 2, 0 ,0],
[1, 2, 0 ,0],[1, 2, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 1, 0 ,0],[1, 0, 0 ,0],
[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],
[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[1, 0, 0 ,0],[0, 2, 0 ,0],[0, 2, 0 ,0],[2, 0, 0 ,0],
[2, 0, 0 ,0],[2, 0, 0 ,0],[2, 0, 0 ,0],[2, 0, 0 ,0],[2, 0, 0 ,0],
]
env.seed(0)
env.floor(1)
obs = env.reset()
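for i, move in enumerate(moves):
    obs, reward, done, info = env.step(move)
    # Print the step index next to its reward to see when the door reward arrives.
    print(i, reward)
    if i > 18:
        # With retro=False, obs[0] is the visual observation.
        plt.imshow(obs[0])
        plt.show()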
Hi @mortido
This is actually expected behavior. The reward for opening doors is provided when the agent enters within a certain range of a closed door. The reward is provided at the same time that the animation is triggered. While the agent gets the reward immediately, the animation does not finish for a few more frames.
Does this mean there should be some intermediate frames with a partially opened door? I saw that once (just 1 frame), but most of the time the door just appeared open after some delay.
@mortido This may be because every step call is more than a single frame. Currently for Obstacle Tower our step frequency is every 5 frames. Since the door opening is a small number of frames (it opens quickly), you are likely just missing some or all of the door opening frames.
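To illustrate the sampling effect described above, here is a minimal sketch in plain Python. It is not Obstacle Tower code: the `door_state` model, the `trigger_frame` value, and the `anim_frames` durations are all hypothetical, since the actual animation length is not stated in this thread. Only the 5-frames-per-step figure comes from the comment above.

```python
FRAMES_PER_STEP = 5  # from the thread: one step() call advances 5 frames


def door_state(frame, trigger_frame, anim_frames):
    """Hypothetical door model: the animation starts at trigger_frame
    and the door is fully open anim_frames later."""
    if frame <= trigger_frame:
        return "closed"
    if frame < trigger_frame + anim_frames:
        return "partial"
    return "open"


def observed_states(trigger_frame, anim_frames, n_steps):
    """Door state the agent sees at the end of each step() call,
    i.e. only on every FRAMES_PER_STEP-th frame."""
    return [
        door_state((step + 1) * FRAMES_PER_STEP, trigger_frame, anim_frames)
        for step in range(n_steps)
    ]


# Assumed 3-frame animation: no sampled frame falls inside it.
print(observed_states(trigger_frame=7, anim_frames=3, n_steps=4))
# ['closed', 'open', 'open', 'open']

# Assumed 4-frame animation: exactly one sampled frame shows a partial door.
print(observed_states(trigger_frame=7, anim_frames=4, n_steps=4))
# ['closed', 'partial', 'open', 'open']
```

Under these assumptions, an animation shorter than the step interval shows at most one intermediate frame, and often none, depending on where the trigger falls relative to the sampled frames.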
Thank you for your answers. But I'm still a little bit confused. Given
The reward is provided at the same time that the animation is triggered.
and
the door opening is a small number of frames (it opens quickly), you are likely just missing some or all of the door opening frames.
I still see 3-4 steps between the reward and the door opening, and in all of them (sometimes except one) the door is still fully closed...