evogym icon indicating copy to clipboard operation
evogym copied to clipboard

Possible error in BenchmarkBase step function

Open blmussati opened this issue 2 years ago • 1 comments

Hi, I was running some simple tests with BiWalk.

My code is

robot_structure, robot_connections = sample_robot((5, 5))
biwalk = BiWalk(body=robot_structure, connections=robot_connections)
biwalk.default_viewer.track_objects('robot')
for _ in range(500):   
    action = biwalk.action_space.sample()
    ob, rew, done, info = biwalk.step(action)
    biwalk.render(verbose=True)
    if done:
        biwalk.reset()
biwalk.close()

And I noticed that the robot would stand still. BiWalk is a subclass of GoalBase, which in turn is a subclass of BenchmarkBase

After some debugging, I realised that the step function of BenchmarkBase does the following:

def step(self, action):

    action_copy = {}

    for robot_name, a in action.items():
        action_copy[robot_name] = a + 1

    return super().step(action_copy)

Where super() is EvoGymBase.

action is a numpy array, where each entry is the action for each of the robot's actuators, and it's bound between 0.6 and 1.6. However, because of the line action_copy[robot_name] = a + 1 Each element becomes >= 1.6 and hence the robot doesn't move.

Could someone explain such line? It seems unnecessary and it would crash any baseline environment...

blmussati avatar May 04 '22 12:05 blmussati

Hi, I was running some simple tests with BiWalk.

My code is

robot_structure, robot_connections = sample_robot((5, 5))
biwalk = BiWalk(body=robot_structure, connections=robot_connections)
biwalk.default_viewer.track_objects('robot')
for _ in range(500):   
    action = biwalk.action_space.sample()
    ob, rew, done, info = biwalk.step(action)
    biwalk.render(verbose=True)
    if done:
        biwalk.reset()
biwalk.close()

And I noticed that the robot would stand still. BiWalk is a subclass of GoalBase, which in turn is a subclass of BenchmarkBase

After some debugging, I realised that the step function of BenchmarkBase does the following:

def step(self, action):

    action_copy = {}

    for robot_name, a in action.items():
        action_copy[robot_name] = a + 1

    return super().step(action_copy)

Where super() is EvoGymBase.

action is a numpy array, where each entry is the action for each of the robot's actuators, and it's bound between 0.6 and 1.6. However, because of the line action_copy[robot_name] = a + 1 Each element becomes >= 1.6 and hence the robot doesn't move.

Could someone explain such line? It seems unnecessary and it would crash any baseline environment...

The default activation function used in PPO policy is Tanh(), which means the range of action is limited to [-1, 1], with +1 operation, the output range will be transferred to [0,2]. Then a clip function will finally limit the range to [0.6,1.6].

Yuxing-Wang-THU avatar Jul 16 '22 07:07 Yuxing-Wang-THU