
Reward for each action

Open favu100 opened this issue 1 year ago • 1 comments

Hello,

Is it possible to compute the reward for each action?

As far as I can see, one can execute env.step(action) to obtain the reward after executing the action. However, I would like to compute the reward before executing the action. We know that model.predict(obs) returns the action with the highest expected reward, but it returns neither the value of that reward nor the rewards for the other actions.

I also saw the usage of _reward, but it only applies to the current state, and thus to the previous action.

favu100 avatar Nov 26 '23 19:11 favu100

Since the reward of a transition from (s, a) depends on the next state s', there is no way around it: you have to step the environment to simulate the transition and compute the resulting reward. But if you don't want to affect the environment state, you can just clone it first and apply the transition to the copy:

import copy

rewards = {}
for action in range(env.action_space.n):
    # Clone the environment so stepping does not affect the real one
    env_copy = copy.deepcopy(env)
    obs, reward, terminated, truncated, info = env_copy.step(action)
    rewards[action] = reward
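To see that this clone-and-step pattern really leaves the original environment untouched, here is a self-contained sketch using a toy stand-in environment (hypothetical, not HighwayEnv; the `ToyEnv` class and its dynamics are invented purely for illustration):

```python
import copy

class ToyEnv:
    """Minimal stand-in for a Gymnasium-style env (hypothetical).

    State is an integer position; each action moves the position, and the
    reward is the negative distance to a goal at position 3.
    """
    def __init__(self):
        self.pos = 0
        self.n_actions = 3  # 0: stay, 1: step left, 2: step right

    def step(self, action):
        self.pos += {0: 0, 1: -1, 2: +1}[action]
        reward = -abs(self.pos - 3)
        # Return the usual 5-tuple (obs, reward, terminated, truncated, info)
        return self.pos, reward, False, False, {}

env = ToyEnv()

# One-step lookahead: evaluate every action on a deep copy,
# leaving the real environment's state unchanged.
rewards = {}
for action in range(env.n_actions):
    env_copy = copy.deepcopy(env)
    _, reward, _, _, _ = env_copy.step(action)
    rewards[action] = reward

best_action = max(rewards, key=rewards.get)
# env.pos is still 0 — only the copies were stepped
```

Note that deep-copying a full simulator on every action can be expensive; for HighwayEnv-sized environments it is fine for occasional lookahead, but doing it inside a tight training loop will slow things down noticeably.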

eleurent avatar Nov 27 '23 23:11 eleurent