
[Bug Report] copy.deepcopy(env) causes unexpected behavior

Open · acforvs opened this issue on Apr 3, 2023 · 0 comments

Describe the bug

The behavior of copy.deepcopy(env) is currently undefined. I would expect either an error to be raised if the environment is not meant to be copied, or for a copied environment to behave identically to the original.
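For the error-raising option, here is a minimal sketch of what I have in mind (hypothetical, not existing D4RL or gym code):

import copy

import gym


class NonCopyableEnv(gym.Env):
    # Hypothetical: an environment that refuses deepcopy explicitly,
    # so the failure is loud instead of silent.
    def __deepcopy__(self, memo):
        raise NotImplementedError(
            "deepcopy is not supported; create a new instance with gym.make()"
        )


copy.deepcopy(NonCopyableEnv())  # raises NotImplementedError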

Code example

For example, when deepcopy is applied to "antmaze-umaze-v2", the reward_type changes from sparse to dense and the reward becomes negative, which I believe shouldn't be the case (see https://github.com/Farama-Foundation/D4RL/blob/master/d4rl/pointmaze/maze_model.py#L196). However, .step() still works, which makes the mistake harder to notice.

import copy

import gym
import d4rl  # registers the D4RL environments with gym

env = gym.make("antmaze-umaze-v2")
new_env = copy.deepcopy(env)  # expected: an identical, independent copy

env.seed(0)
new_env.seed(0)

# Step the original environment once.
state = env.reset()
action = env.action_space.sample()
_, reward, _, _ = env.step(action)
print(f"State: {state}")
print(f"Action: {action}")
print(f"Reward: {reward}")

# Step the copy once.
new_state = new_env.reset()
new_action = new_env.action_space.sample()
_, new_reward, _, _ = new_env.step(new_action)
print(f"New State: {new_state}")
print(f"New Action: {new_action}")
print(f"New Reward: {new_reward}")

# The copy's reward_type has silently changed from sparse to dense.
print(f"Reward Types: {env.reward_type} vs. {new_env.reward_type}")

Output:

State: [ 0.05968136  0.04524872  0.73911593  0.99811582  0.00298145 -0.0596976
  0.01386086  0.01285718 -0.00742792 -0.01581104 -0.08846538  0.06740718
  0.09790658 -0.06988595 -0.00338617 -0.05037741 -0.0625188   0.05739599
 -0.07765951 -0.07817309  0.05622622  0.12245795  0.00909633  0.08016977
 -0.00340852 -0.11708338  0.03623063 -0.06030991 -0.03213472]
Action: [-0.8769915   0.09569477  0.46591443 -0.03204463 -0.23699978 -0.6913936
  0.5964345  -0.18665828]
Reward: 0.0
New State: [-0.06903731  0.04845185  0.78921383  0.99227966  0.08385488 -0.07643463
  0.05007176 -0.07194152 -0.05970388  0.08519238  0.03174577 -0.09178095
  0.08723601 -0.03991108 -0.02424408  0.20672978  0.10545167  0.06639785
  0.05823002  0.05074512  0.07413391 -0.03038314  0.07790665 -0.12351816
  0.09879504 -0.00317077  0.23729763  0.06375448 -0.05357395]
New Action: [ 0.15439004 -0.4896254  -0.12454828 -0.4938082   0.6774787   0.87535506
 -0.12077015  0.03304163]
New Reward: -8.97261873966396
Reward Types: sparse vs. dense

System Info

I am using Google Colab.

  • Gym version: 0.25.2
  • Python version: 3.9.16

Additional context

Something similar was also discussed here: https://github.com/openai/gym/issues/1863
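As a workaround, I currently rebuild the environment from its registered id instead of deep-copying it. A minimal sketch (independent_copy is a hypothetical helper, assuming the id alone reproduces the configuration):

import gym
import d4rl  # registers the D4RL environments with gym


def independent_copy(env):
    # Hypothetical helper: build a fresh, correctly configured instance
    # from the registered spec instead of copying the object graph.
    return gym.make(env.spec.id)


env = gym.make("antmaze-umaze-v2")
new_env = independent_copy(env)
print(env.reward_type, new_env.reward_type)  # sparse sparse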

Checklist

  • [x] I have checked that there is no similar issue in the repo (required)
