
State random vector uses global RNG instead of per-environment RNG

Open ottofabian opened this issue 2 years ago • 3 comments

I was trying to use your environments as standalone gym tasks without the meta-learning aspect. While I do not think this issue conflicts with the design choices made for the meta-learning setting, I think it is important to note it here.

I create a new standalone environment as described in the README with:

from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE

env = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=1)

First of all, when resetting this environment, it has no randomized starting configuration, which is not ideal for the standalone use case. Currently, I get around that by setting _freeze_rand_vec = False, although I do not like the fact that I have to set a private variable. I would propose to allow for randomization by adding a flag to https://github.com/rlworkgroup/metaworld/blob/cfd837e31d65c9d2b62b7240c68a26b04a9166d9/metaworld/envs/mujoco/env_dict.py#L614-L625, e.g. leveraging something such as:

def initialize(env, seed=None, randomize=False):
    if seed is not None:
        # Temporarily seed the global RNG and restore it afterwards
        st0 = np.random.get_state()
        np.random.seed(seed)
    super(type(env), env).__init__()
    env._partially_observable = False
    env._freeze_rand_vec = False
    env._set_task_called = True
    env.reset()
    # Keep randomizing the start configuration on reset if requested
    env._freeze_rand_vec = not randomize
    if seed is not None:
        np.random.set_state(st0)
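
With such a flag, standalone creation could then look like the following (hypothetical usage; it assumes the randomize keyword is actually forwarded by the class factory in env_dict.py):

from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE

# Hypothetical: randomize=True leaves _freeze_rand_vec False, so every reset
# draws a fresh starting configuration
env = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE["assembly-v2-goal-observable"](seed=1, randomize=True)
obs = env.reset()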

However, given there is a workaround for the above issue, my main focus is on this second part: even when the environment is seeded and randomization is enabled, the behavior is still not deterministic. Consider the following example:

import metaworld
import gym
import numpy as np

def make_metaworld(env_id: str, seed=0):
    env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed)
    # Setting this avoids generating the same initialization after each reset
    env._freeze_rand_vec = False
    # Set the time limit based on the maximum allowed path length of the environment
    env = gym.wrappers.TimeLimit(env, max_episode_steps=env.max_path_length)
    # This should not be necessary, but is done to avoid any issues
    env.seed(seed)
    env.action_space.seed(seed)
    env.observation_space.seed(seed)
    env.goal_space.seed(seed)
    return env

env1 = make_metaworld("assembly-v2", seed=0)
env2 = make_metaworld("assembly-v2", seed=0)

obs1, obs2 = env1.reset(), env2.reset()
assert np.all(obs1 == obs2), f"Observation delta not zero {obs1 - obs2}"

After some digging, the main culprit seems to be this https://github.com/rlworkgroup/metaworld/blob/a0009ed9a208ff9864a5c1368c04c273bb20dd06/metaworld/envs/mujoco/sawyer_xyz/sawyer_xyz_env.py#L468-L478

The value of rand_vec is not drawn from self.np_random but from the global np.random. A quick test on my side seemed to confirm this, and the above example no longer fails when the existing code is replaced with:

rand_vec = self.np_random.uniform(
    self._random_reset_space.low,
    self._random_reset_space.high,
    size=self._random_reset_space.low.size)

However, I have not tested or investigated this in depth, so the issue might still be present elsewhere. Given how the library is supposed to be used, I think this problem simply never occurred before, but it should still be of general interest. So far, I have not found a temporary workaround.
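
To make the global-versus-per-environment RNG distinction concrete, here is a minimal, self-contained sketch (plain NumPy, not Meta-World code) of the failure mode:

import numpy as np

# Two identically seeded per-object RNGs: seeding them has no effect on draws
# taken from the global np.random module.
class ToyEnv:
    def __init__(self, seed):
        self.np_random = np.random.RandomState(seed)

    def rand_vec_global(self):
        # Analogous to what sawyer_xyz_env.py currently does
        return np.random.uniform(0.0, 1.0, size=3)

    def rand_vec_local(self):
        # Analogous to the proposed fix
        return self.np_random.uniform(0.0, 1.0, size=3)

env1, env2 = ToyEnv(seed=0), ToyEnv(seed=0)
print(np.array_equal(env1.rand_vec_global(), env2.rand_vec_global()))  # False in general
print(np.array_equal(env1.rand_vec_local(), env2.rand_vec_local()))    # True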

Furthermore, in case my approach of using your environments in a standalone fashion is not the intended way, please let me know how to do it correctly.

ottofabian avatar Aug 18 '21 10:08 ottofabian

Hi Otto,

Thanks for documenting this in so much detail. Unfortunately, there is currently no officially supported way of using the environments standalone (except for ML1/MT1), which is partly why this code path is so complicated. Generally, MT1 is recommended, since it will set up a fixed number of observed goals. I think your proposed change (replacing np.random.uniform with self.np_random.uniform) is a good idea. I think this hasn't been an issue so far because we've mostly run Meta-World using garage, which seeds the global RNG instead of the per-environment RNG.
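
For reference, the MT1 path from the README looks roughly like this (a sketch; the exact metaworld.MT1 API may differ between versions):

import random
import metaworld

mt1 = metaworld.MT1('assembly-v2')        # construct the single-task benchmark
env = mt1.train_classes['assembly-v2']()  # instantiate the environment class
task = random.choice(mt1.train_tasks)     # pick one of the pre-sampled goals
env.set_task(task)                        # fixes the goal for subsequent resets
obs = env.reset()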

krzentner avatar Aug 19 '21 00:08 krzentner

Is there an update on this?

ottofabian avatar May 24 '22 07:05 ottofabian

Unfortunately MetaWorld only has one active maintainer (me), and I am neither an author of MetaWorld, nor am I currently using it for research, so updates are very much best-effort.

I realized some time ago that naively replacing np.random.uniform with self.np_random.uniform would actually break how Benchmarks are created. However, I have a change now (#370) that I believe should address this use case.

Thank you again for documenting it clearly.

krzentner avatar May 26 '22 05:05 krzentner

As Meta-World ownership has been transferred from the RLWorkGroup to the Farama Foundation, I am closing this issue. If there are any questions or feature requests, please join our Discord.

reginald-mclean avatar Feb 01 '23 15:02 reginald-mclean