rlpyt
Doesn't work with (non-atari) env
It would be super useful for me to see an example of how to use a custom gym environment. Is there an example of this somewhere?
The problem with the built-in Atari environment is that I'm not sure where rlpyt begins and the environment ends.
One thing I find a bit confusing is the info_dict. It’s not clear to me at which point I have to wrap it (or does the env wrapper wrap it automatically)?
Let's say we had a simple env like:
import gym
from gym import spaces

class DummyEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(10)

    def reset(self):
        return 0

    def step(self, action):
        obs, rew, done, info = 0, 1, True, {}
        return obs, rew, done, info
What are the steps I would need to take to wrap it?
Not sure if this is what you're looking for (I've just started exploring rlpyt), but in my case I defined a custom env as a class like you did and then passed it to the serial sampler, the same way it is done in example 1:
sampler = SerialSampler(
    EnvCls=DummyEnv,
    env_kwargs=dict(mode="train", some_other_params=None),
    eval_env_kwargs=dict(mode="test", some_other_params=None),
    ...
)
In my case, I returned None as the info_dict, as it is always empty.
About wrapping the entire env: the output env_info is automatically converted from a dictionary to a corresponding namedtuple, which the rlpyt sampler expects. For this to work, every key that might appear in the gym environment's env_info at any step must appear at the first step after a reset, as the env_info entries will have sampler memory pre-allocated for them (so they also cannot change dtype or shape). (See EnvInfoWrapper, build_info_tuples, and info_to_nt in that file for more help/details.)
https://rlpyt.readthedocs.io/en/latest/pages/env.html#rlpyt.envs.gym.make
Examples of this can be found in files that import this line:
from rlpyt.envs.gym import make
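To illustrate the env_info point, here is a minimal sketch (mine, not from rlpyt; the timeout key is made up) of a gym env whose info dict always carries the same keys and dtypes, so the wrapper can pre-allocate buffers for them:

import gym
from gym import spaces


class DummyEnvWithInfo(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(2)
        self.observation_space = spaces.Discrete(10)

    def reset(self):
        return 0

    def step(self, action):
        # Same keys, same dtypes on every step (fill with zeros when unused).
        info = dict(timeout=0)
        return 0, 1.0, True, info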
Awesome, thanks for the tip.
So, just to be clear, you don't use make? You just return None for info?
Ok, I'm getting this error now if I just pass the env class directly:
agent.initialize(envs[0].spaces, share_memory=False,
AttributeError: 'DummyEnv' object has no attribute 'spaces'
Here is my full source:
import gym

from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.agents.dqn.dqn_agent import DqnAgent
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.envs.gym import GymEnvWrapper
from rlpyt.runners.minibatch_rl import MinibatchRlEval


def main(run_ID, cuda_idx):
    agent = DqnAgent()
    algo = DQN()
    sampler = SerialSampler(
        EnvCls=DummyEnv,
        env_kwargs={},
        batch_T=10,  # Timesteps per sample batch
        batch_B=1,   # Num environments to run in parallel
        max_decorrelation_steps=0
    )
    runner = MinibatchRlEval(
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=1000
    )
    runner.train()


class DummyEnv(gym.Env):
    """
    Runs env for 100 steps returning 0 reward except the last step returns 1
    """
    def __init__(self):
        self.n = 100
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Discrete(10)

    def reset(self):
        self.n = 100
        return 0

    def step(self, action):
        obs = 1
        self.n -= 1
        done = self.n <= 0
        rew = 1 if done else 0
        # info = {}
        return obs, rew, done, None


if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--run_ID', help='run identifier (logging)', type=int, default=0)
    parser.add_argument('--cuda_idx', help='gpu to use', type=int, default=None)
    args = parser.parse_args()
    main(
        run_ID=args.run_ID,
        cuda_idx=args.cuda_idx
    )
https://rlpyt.readthedocs.io/en/latest/pages/env.html#rlpyt.envs.gym.make
Examples of this can be found in files that import this line:
from rlpyt.envs.gym import make
This seems to work only with environments that are registered with gym. :( https://github.com/astooke/rlpyt/blob/ca6483323c1ec372e9b4ec0ecde47bba620391d8/rlpyt/envs/gym.py#L163
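(For what it's worth, registering the custom env with gym is one way around that; a rough sketch using gym's standard register call, where the id and entry_point are placeholders:)

from gym.envs.registration import register

# Placeholder id and entry_point, for illustration only.
register(
    id='Dummy-v0',
    entry_point='my_module:DummyEnv',
)

# After registering, rlpyt's gym_make factory can build the env by id:
# SerialSampler(EnvCls=gym_make, env_kwargs=dict(id='Dummy-v0'), ...)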
@astooke this should be a pretty simple use case. Could you give me a hint?
Thanks!
Ok, I'm getting this error now if I just pass the env class directly:
agent.initialize(envs[0].spaces, share_memory=False,
AttributeError: 'DummyEnv' object has no attribute 'spaces'
I forgot about spaces. Try adding:

from rlpyt.envs.gym import EnvSpaces  # import needed for the namedtuple below

# From rlpyt/envs/gym
@property
def spaces(self):
    """Returns the rlpyt spaces for the wrapped env."""
    return EnvSpaces(
        observation=self.observation_space,
        action=self.action_space,
    )

to your env, so it can return the space dimensions as needed.
class DummyEnv(gym.Env):
    """
    Runs env for 100 steps returning 0 reward except the last step returns 1
    """
    def __init__(self):
        self.n = 100
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Discrete(10)

    # From rlpyt/envs/gym
    @property
    def spaces(self):
        """Returns the rlpyt spaces for the wrapped env."""
        return EnvSpaces(
            observation=self.observation_space,
            action=self.action_space,
        )

    def reset(self):
        self.n = 100
        return 0

    def step(self, action):
        obs = 1
        self.n -= 1
        done = self.n <= 0
        rew = 1 if done else 0
        # info = {}
        return obs, rew, done, None
Nope, that doesn't work either ;-(
(myproj) andriy@whitelinux:~/Projects/myproj$ python 1_main.py
2020-05-27 14:44:03.023122 | Runner master CPU affinity: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15].
2020-05-27 14:44:03.023173 | Runner master Torch threads: 8.
using seed 2852
Traceback (most recent call last):
File "1_main.py", line 69, in <module>
cuda_idx=args.cuda_idx
File "1_main.py", line 26, in main
runner.train()
File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/runners/minibatch_rl.py", line 301, in train
n_itr = self.startup()
File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/runners/minibatch_rl.py", line 81, in startup
world_size=world_size,
File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/samplers/serial/sampler.py", line 51, in initialize
global_B=global_B, env_ranks=env_ranks)
File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/agents/dqn/dqn_agent.py", line 37, in initialize
global_B=global_B, env_ranks=env_ranks)
File "/home/andriy/Projects/myproj/src/rlpyt/rlpyt/agents/base.py", line 84, in initialize
**self.model_kwargs)
TypeError: 'NoneType' object is not callable
(myproj) andriy@whitelinux:~/Projects/myproj$
Here's my full source again:
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.agents.dqn.dqn_agent import DqnAgent
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.envs.gym import GymEnvWrapper
from rlpyt.runners.minibatch_rl import MinibatchRlEval
from rlpyt.envs.gym import EnvSpaces


def main(run_ID, cuda_idx):
    agent = DqnAgent()
    algo = DQN()
    sampler = SerialSampler(
        EnvCls=DummyEnv,
        env_kwargs={},
        batch_T=10,  # Timesteps per sample batch
        batch_B=1,   # Num environments to run in parallel
        max_decorrelation_steps=0
    )
    runner = MinibatchRlEval(
        algo=algo,
        agent=agent,
        sampler=sampler,
        n_steps=1000
    )
    runner.train()


import gym


class DummyEnv(gym.Env):
    """
    Runs env for 100 steps returning 0 reward except the last step returns 1
    """
    def __init__(self):
        self.n = 100
        self.action_space = gym.spaces.Discrete(2)
        self.observation_space = gym.spaces.Discrete(10)

    # From rlpyt/envs/gym
    @property
    def spaces(self):
        """Returns the rlpyt spaces for the wrapped env."""
        return EnvSpaces(
            observation=self.observation_space,
            action=self.action_space,
        )

    def reset(self):
        self.n = 100
        return 0

    def step(self, action):
        obs = 1
        self.n -= 1
        done = self.n <= 0
        rew = 1 if done else 0
        return obs, rew, done, None


if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--run_ID', help='run identifier (logging)', type=int, default=0)
    parser.add_argument('--cuda_idx', help='gpu to use', type=int, default=None)
    args = parser.parse_args()
    main(
        run_ID=args.run_ID,
        cuda_idx=args.cuda_idx
    )
Hello, I'm trying to get the recent release of the NetHack gym environment by Facebook working in the rlpyt framework, but I'm having issues as well. Wondering if anyone has experimented with this env. Thanks!
Actually, if we modify example_3.py to work with custom gym envs, we'll see that only the serial sampler works as expected. All parallel samplers fail, since they initialize the 'info' globals in the base process but try to receive them from the child processes' globals, which are empty.
Bump
I have the same problem with a simple example based on one of the examples in the repo. It'd be good to have more documentation on how to do this (if it works). I've tried various other combinations of methods without success, such as using gym_make directly.
from rlpyt.samplers.serial.sampler import SerialSampler
from rlpyt.algos.dqn.dqn import DQN
from rlpyt.agents.dqn.catdqn_agent import CatDqnAgent
from rlpyt.runners.minibatch_rl import MinibatchRlEval
import gym
from rlpyt.envs.gym import GymEnvWrapper


def make_env(game):
    return GymEnvWrapper(gym.make(game))


sampler = SerialSampler(
    EnvCls=make_env,
    env_kwargs={'game': 'CartPole-v1'},
    batch_T=1,
    batch_B=1,
)
algo = DQN(min_steps_learn=1e3)
agent = CatDqnAgent()
runner = MinibatchRlEval(
    algo=algo,
    agent=agent,
    sampler=sampler,
    n_steps=500,
)
config = dict(game=game)
runner.train()
2020-06-17 16:50:34.878147 | dqn_pong_0 dqn_pong_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 Runner master CPU affinity: [0, 1, 2, 3, 4, 5].
2020-06-17 16:50:34.880999 | dqn_pong_0 dqn_pong_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 dqn_CartPole-v1_0 Runner master Torch threads: 3.
using seed 3474
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-53-6a9a56aa8b66> in <module>
38 )
39 config = dict(game=game)
---> 40 runner.train()
~/anaconda3/lib/python3.7/site-packages/rlpyt/runners/minibatch_rl.py in train(self)
299 specified log interval.
300 """
--> 301 n_itr = self.startup()
302 with logger.prefix(f"itr #0 "):
303 eval_traj_infos, eval_time = self.evaluate_agent(0)
~/anaconda3/lib/python3.7/site-packages/rlpyt/runners/minibatch_rl.py in startup(self)
79 traj_info_kwargs=self.get_traj_info_kwargs(),
80 rank=rank,
---> 81 world_size=world_size,
82 )
83 self.itr_batch_size = self.sampler.batch_spec.size * world_size
~/anaconda3/lib/python3.7/site-packages/rlpyt/samplers/serial/sampler.py in initialize(self, agent, affinity, seed, bootstrap_value, traj_info_kwargs, rank, world_size)
49 env_ranks = list(range(rank * B, (rank + 1) * B))
50 agent.initialize(envs[0].spaces, share_memory=False,
---> 51 global_B=global_B, env_ranks=env_ranks)
52 samples_pyt, samples_np, examples = build_samples_buffer(agent, envs[0],
53 self.batch_spec, bootstrap_value, agent_shared=False,
~/anaconda3/lib/python3.7/site-packages/rlpyt/agents/dqn/catdqn_agent.py in initialize(self, env_spaces, share_memory, global_B, env_ranks)
21 def initialize(self, env_spaces, share_memory=False,
22 global_B=1, env_ranks=None):
---> 23 super().initialize(env_spaces, share_memory, global_B, env_ranks)
24 # Overwrite distribution.
25 self.distribution = CategoricalEpsilonGreedy(dim=env_spaces.action.n,
~/anaconda3/lib/python3.7/site-packages/rlpyt/agents/dqn/dqn_agent.py in initialize(self, env_spaces, share_memory, global_B, env_ranks)
35 environment instance."""
36 super().initialize(env_spaces, share_memory,
---> 37 global_B=global_B, env_ranks=env_ranks)
38 self.target_model = self.ModelCls(**self.env_model_kwargs,
39 **self.model_kwargs)
~/anaconda3/lib/python3.7/site-packages/rlpyt/agents/base.py in initialize(self, env_spaces, share_memory, **kwargs)
82 self.env_model_kwargs = self.make_env_to_model_kwargs(env_spaces)
83 self.model = self.ModelCls(**self.env_model_kwargs,
---> 84 **self.model_kwargs)
85 if share_memory:
86 self.model.share_memory()
TypeError: 'NoneType' object is not callable
Having the same issue as above: 'NoneType' object is not callable
This issue is a month old. It would benefit new folks like me who are interested in adopting rlpyt and using non-Atari gym envs. So, can someone please give a simple, yet complete, CartPole-v1 example?
Thanks!
Hi! Sorry for the long absence...let me try to help sort through these...
@benman1 The problem in your case is that when the agent tries to initialize the model (neural net), it doesn't have a self.ModelCls to call. The CatDqnAgent doesn't come with one of these, but the AtariCatDqnAgent is an example that has the model specific to the Atari environment.
@drozzy @LecJackS If you are making a custom env, it is better to just use the rlpyt base env class (https://github.com/astooke/rlpyt/blob/master/rlpyt/envs/base.py) and follow that interface. No need to go through gym. The main difference is that the env_info that your environment returns should be a namedtuple, not a dict, and the entries should be scalars or numpy arrays which are the same dtype and shape at every environment step (even if you just have to fill with zeros when not using it).
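As a rough, untested sketch of that interface (the EnvInfo fields, IntBox bounds, and horizon below are illustrative choices of mine, not requirements):

from collections import namedtuple

import numpy as np

from rlpyt.envs.base import Env, EnvStep
from rlpyt.spaces.int_box import IntBox

# Fixed-field env_info, defined once for the whole env.
EnvInfo = namedtuple("EnvInfo", ["timeout"])


class MyDummyEnv(Env):
    """Custom env written against rlpyt's native interface (no gym)."""

    def __init__(self, horizon=100):
        self._horizon = horizon
        self._step_count = 0
        self._action_space = IntBox(low=0, high=2)        # two discrete actions
        self._observation_space = IntBox(low=0, high=10)  # scalar obs in [0, 10)

    def reset(self):
        self._step_count = 0
        return np.array(0, dtype="int32")

    def step(self, action):
        self._step_count += 1
        done = self._step_count >= self._horizon
        obs = np.array(1, dtype="int32")
        reward = 1.0 if done else 0.0
        # Same fields and dtypes at every step.
        return EnvStep(obs, reward, done, EnvInfo(timeout=False))

    @property
    def horizon(self):
        return self._horizon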
If you have an environment that's already registered in gym, then you can use the wrapper as provided in https://github.com/astooke/rlpyt/blob/master/rlpyt/envs/gym.py and shown in example_2.py, where you use the gym_make factory function as the EnvCls argument:
https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/examples/example_2.py#L23
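In other words, the wiring looks roughly like this (a sketch of mine; note the agent still needs a model class that matches the env's spaces, which is the ModelCls issue above):

from rlpyt.envs.gym import make as gym_make
from rlpyt.samplers.serial.sampler import SerialSampler

# gym_make calls gym.make(id=...) and wraps the result in GymEnvWrapper.
sampler = SerialSampler(
    EnvCls=gym_make,
    env_kwargs=dict(id="CartPole-v1"),
    eval_env_kwargs=dict(id="CartPole-v1"),
    batch_T=1,
    batch_B=1,
)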
Hopefully that helps?
@frankie4fingers that's an unexpected problem! Could you provide more details? The environment should be instantiated separately within each child process in the parallel samplers.
A simple, yet complete, CartPole code example would be welcome!
"Sample code is worth a thousand responses."
@astooke that's very helpful, thanks for that!
Hello,
I've been trying to modify example_2.py to work with Facebook's NetHack. I am able to load the 'NetHack-v0' environment via the gym wrapper (gym_make in the SerialSampler), but it seems that the structures returned from NetHack are not in the correct form. Is the correct approach here to go into the NetHack code and/or the gym wrapper code and adjust how the data is returned to the wrapper? Please see the attached screenshot; you can see that much of the structure is missing.
Thanks for any ideas.
@benman1 The problem in your case is that when the agent tries to initialize the model (neural net), it doesn't have a self.ModelCls to call. The CatDqnAgent doesn't come with one of these, but the AtariCatDqnAgent is an example that has the model specific to the Atari environment.
Does this mean there is currently no way to use the CatDqnAgent with a non-Atari environment?
If so, what additional files do we need to get a C51 agent to work with a custom (non-Atari) gym environment? (Presumably we have to write our own ModelCls class; anything else?)
@astooke sorry for the delay. I used code from example_3.py and the sampler is GpuSampler. It works as expected in serial mode and fails in parallel. As I remember, the main process does the first step, makes the info namedtuple, and saves the other related stuff in its module globals; then all the child parallel instances, which do not have this content, try to get it and stop in build_info_tuples(info) in the GymEnvWrapper constructor (since ntc = globals().get(name) exists only in the main process). So my workaround for now is to disable # build_info_tuples(info) and provide an info example separately to each parallel instance: Sampler(EnvCls=gym_make, env_kwargs=dict(id=env_id, info_example=dict(timeout=2000))). It works fine for me.
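Put together, the workaround looks roughly like this (my sketch; env_id and the timeout value are just placeholders):

from rlpyt.envs.gym import make as gym_make
from rlpyt.samplers.parallel.gpu.sampler import GpuSampler

env_id = "CartPole-v1"  # placeholder; any registered gym id

# info_example tells the gym wrapper which env_info fields to expect, so each
# child process builds the same namedtuple instead of relying on globals.
sampler = GpuSampler(
    EnvCls=gym_make,
    env_kwargs=dict(id=env_id, info_example=dict(timeout=2000)),
    batch_T=5,
    batch_B=8,
)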
@im-ant Yes, the main thing to get C51 working with a custom environment is just to write your own model class. Or maybe your environment has the same observation and action spaces as Atari, in which case you could just use the same model, but maybe you want different default conv hyperparameters or something like that.
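As a very rough, untested sketch of what that could look like for a flat vector observation (the class names, layer sizes, and the make_env_to_model_kwargs mapping below are my own guesses at the pattern the Atari agents follow, not code from rlpyt):

import torch

from rlpyt.agents.dqn.catdqn_agent import CatDqnAgent
from rlpyt.utils.tensor import infer_leading_dims, restore_leading_dims


class MlpCatDqnModel(torch.nn.Module):
    """MLP that outputs a softmax over n_atoms for each action (C51-style)."""

    def __init__(self, input_size, output_size, n_atoms=51, hidden_size=256):
        super().__init__()
        self._output_size = output_size
        self._n_atoms = n_atoms
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(input_size, hidden_size),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden_size, output_size * n_atoms),
        )

    def forward(self, observation, prev_action, prev_reward):
        obs = observation.float()
        # Handle possible leading Time/Batch dims, as the rlpyt models do.
        lead_dim, T, B, _ = infer_leading_dims(obs, 1)
        logits = self.mlp(obs.view(T * B, -1))
        p = torch.softmax(logits.view(T * B, self._output_size, self._n_atoms), dim=-1)
        return restore_leading_dims(p, lead_dim, T, B)


class MyCatDqnAgent(CatDqnAgent):
    """Plugs the model in and maps env spaces to model constructor kwargs."""

    def __init__(self, ModelCls=MlpCatDqnModel, **kwargs):
        super().__init__(ModelCls=ModelCls, **kwargs)

    def make_env_to_model_kwargs(self, env_spaces):
        # Assumes a flat (1-D) observation space and a discrete action space.
        return dict(
            input_size=int(env_spaces.observation.shape[0]),
            output_size=env_spaces.action.n,
        )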
@frankie4fingers OK, thanks for explaining the problem and the quick workaround. I'm still a bit surprised by this, because I've run gym envs in parallel before. And when the child process looks for ntc = globals().get(name) and it's not there, it should end up with ntc = None and build it within its own module globals... hmm, OK, I'll give example_3.py a run and see.