HighwayEnv
Reward for each action
Hello,
Is it possible to compute the reward for each action?
As far as I can see, one can call env.step(action),
which returns the reward after the action has been executed.
However, I would like to compute the reward before executing the action.
We know that model.predict(obs)
returns the action the model estimates to be best, but it returns neither that action's reward nor the rewards of the other actions.
I also saw the usage of _reward,
but that only applies to the current state, and thus to the previous action.
Since the reward of a transition from (s, a) depends on the next state s', there is no way around it: you have to step the environment to simulate the transition and compute the resulting reward. But if you don't want to affect the environment state, you can just clone it first and apply the transition to the copy:
import copy

# Evaluate each action on a deep copy so the real env is left untouched
rewards = {}
for action in range(env.action_space.n):
    env_copy = copy.deepcopy(env)
    obs, reward, done, truncated, info = env_copy.step(action)
    rewards[action] = reward
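To check that the copies really leave the original environment untouched, here is a minimal self-contained sketch of the same pattern. ToyEnv is a hypothetical stand-in (its reward simply equals the action), not part of HighwayEnv; the deepcopy loop is the part that carries over:

```python
import copy

class ToyEnv:
    """Hypothetical stand-in for a gym-style env: the state is an
    integer and the reward simply equals the action taken."""
    def __init__(self):
        self.state = 0
        self.n_actions = 3

    def step(self, action):
        self.state += action
        reward = float(action)
        return self.state, reward, False, False, {}

def one_step_rewards(env):
    # Simulate each action on a deep copy so the real env is not advanced.
    rewards = {}
    for action in range(env.n_actions):
        env_copy = copy.deepcopy(env)
        _, reward, _, _, _ = env_copy.step(action)
        rewards[action] = reward
    return rewards

env = ToyEnv()
rewards = one_step_rewards(env)
best_action = max(rewards, key=rewards.get)
# env.state is still 0: only the copies were stepped.
```

Note that deep-copying a full simulation environment can be expensive, so doing this for every action at every step may be slow for large environments.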