[Bug Report] copy.deepcopy(env) causes unexpected behavior
Describe the bug
The behavior of copy.deepcopy(env) is currently undefined: I would expect either an error to be raised if the environment should not be copied, or for the copied environment to behave identically to the original.
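For illustration, one way an environment could make the first option explicit is Python's standard __deepcopy__ hook. This is a hypothetical sketch, not anything D4RL currently implements:

import copy

class NoCopyEnv:
    """Hypothetical environment that refuses to be deep-copied."""
    def __deepcopy__(self, memo):
        # Raising here is clearer than silently returning a
        # misconfigured copy.
        raise NotImplementedError("deepcopy is not supported for this environment")

copy.deepcopy(NoCopyEnv())  # raises NotImplementedError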
Code example
For example, when deepcopy is applied to "antmaze-umaze-v2", the reward_type changes from sparse to dense and the reward becomes negative, which I believe should not happen (see https://github.com/Farama-Foundation/D4RL/blob/master/d4rl/pointmaze/maze_model.py#L196). However, .step still runs without error, which makes the mistake harder to notice.
import copy
import gym
import d4rl  # registers the D4RL environments with gym

env = gym.make("antmaze-umaze-v2")
new_env = copy.deepcopy(env)

# Seed both environments identically so any divergence comes from the copy.
env.seed(0)
new_env.seed(0)

# One step in the original environment.
state = env.reset()
action = env.action_space.sample()
_, reward, _, _ = env.step(action)
print(f"State: {state}")
print(f"Action: {action}")
print(f"Reward: {reward}")

# One step in the deep copy.
new_state = new_env.reset()
new_action = new_env.action_space.sample()
_, new_reward, _, _ = new_env.step(new_action)
print(f"New State: {new_state}")
print(f"New Action: {new_action}")
print(f"New Reward: {new_reward}")

# The copy has silently switched from a sparse to a dense reward.
print(f"Reward Types: {env.reward_type} vs. {new_env.reward_type}")
Output:
State: [ 0.05968136 0.04524872 0.73911593 0.99811582 0.00298145 -0.0596976
0.01386086 0.01285718 -0.00742792 -0.01581104 -0.08846538 0.06740718
0.09790658 -0.06988595 -0.00338617 -0.05037741 -0.0625188 0.05739599
-0.07765951 -0.07817309 0.05622622 0.12245795 0.00909633 0.08016977
-0.00340852 -0.11708338 0.03623063 -0.06030991 -0.03213472]
Action: [-0.8769915 0.09569477 0.46591443 -0.03204463 -0.23699978 -0.6913936
0.5964345 -0.18665828]
Reward: 0.0
New State: [-0.06903731 0.04845185 0.78921383 0.99227966 0.08385488 -0.07643463
0.05007176 -0.07194152 -0.05970388 0.08519238 0.03174577 -0.09178095
0.08723601 -0.03991108 -0.02424408 0.20672978 0.10545167 0.06639785
0.05823002 0.05074512 0.07413391 -0.03038314 0.07790665 -0.12351816
0.09879504 -0.00317077 0.23729763 0.06375448 -0.05357395]
New Action: [ 0.15439004 -0.4896254 -0.12454828 -0.4938082 0.6774787 0.87535506
-0.12077015 0.03304163]
New Reward: -8.97261873966396
Reward Types: sparse vs. dense
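The negative value is consistent with the dense reward branch. A paraphrased sketch of the two reward computations, based on my reading of the linked code (the exact formulas in D4RL may differ):

import numpy as np

def reward(xy, target_goal, reward_type):
    # Paraphrased sketch, not the actual D4RL source.
    dist = np.linalg.norm(xy - target_goal)
    if reward_type == "sparse":
        return 1.0 if dist <= 0.5 else 0.0  # zero everywhere except near the goal
    elif reward_type == "dense":
        return -dist  # negative, shrinking in magnitude as the agent approaches
    raise ValueError(reward_type)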
System Info
I am using Google Colab.
- Gym version: 0.25.2
- Python version: 3.9.16
Additional context
Something similar was also discussed here: https://github.com/openai/gym/issues/1863
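Until this is resolved, a workaround that avoids deepcopy altogether is to rebuild the environment from its registered id, so registration-time kwargs such as reward_type are applied again. An untested sketch:

import gym
import d4rl

def copy_env(env_id, seed=None):
    # Workaround sketch: construct a fresh environment instead of
    # deep-copying an existing one.
    new_env = gym.make(env_id)
    if seed is not None:
        new_env.seed(seed)
    return new_env

new_env = copy_env("antmaze-umaze-v2", seed=0)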
Checklist
- [x] I have checked that there is no similar issue in the repo (required)