Pre-Training Problem
When I try to run the code below, I get the following error in the `pretrain` function:
Error
File "C:\Users\fabio\Desktop\wetransfer-08d028\Rope_ex_v1.5\RL_Training\behaviour_cloning.py", line 40, in <module>
model.pretrain(dataset, n_epochs=1000)
File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\common\base_class.py", line 346, in pretrain
expert_obs, expert_actions = dataset.get_next_batch('train')
File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\gail\dataset\dataset.py", line 152, in get_next_batch
dataloader.start_process()
File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\gail\dataset\dataset.py", line 231, in start_process
self.process.start()
File "C:\Python\Python37\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
PicklingError: Can't pickle <function rebuild_pipe_connection at 0x0000024B185C2168>: it's not the same object as multiprocessing.connection.rebuild_pipe_connection
Code
import gym
from stable_baselines.gail import generate_expert_traj

env = gym.make("CartPole-v1")

def dummy_expert(_obs):
    return env.action_space.sample()

generate_expert_traj(dummy_expert, 'expert_cartpole', env, n_episodes=10)

from stable_baselines import PPO2
from stable_baselines.gail import ExpertDataset

dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                        traj_limitation=1, batch_size=128)

model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
model.pretrain(dataset, n_epochs=1000)
model.learn(int(1e5))

env = model.get_env()
obs = env.reset()
reward_sum = 0.0
for _ in range(1000):
    action, _ = model.predict(obs)
    obs, reward, done, _ = env.step(action)
    reward_sum += reward
    env.render()
    if done:
        print(reward_sum)
        reward_sum = 0.0
        obs = env.reset()
env.close()
Please check the issue template and fill in the necessary parts; also paste the full traceback of the exception and place the code into a code block like ``` this ```.
I am sorry. I hope the description of the issue is clearer now.
The full traceback of all processes reveals the issue: the code tries to load the file expert_cartpole.npz when creating the ExpertDataset, but the data is stored in dummy_expert_cartpole.npz. Fixing this fixes the issue.
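For reference, a minimal sketch of that fix, assuming the archive really was written as dummy_expert_cartpole.npz as the traceback suggests (only the file name changes; the rest of the call is taken from the code above):
from stable_baselines.gail import ExpertDataset

# Point ExpertDataset at the file that generate_expert_traj actually wrote
# (assumed here to be `dummy_expert_cartpole.npz`), or rename the archive so
# that both names match.
dataset = ExpertDataset(expert_path='dummy_expert_cartpole.npz',
                        traj_limitation=1, batch_size=128)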
Sorry, my fault again. That was a typo in my post; the actual file name is correct. Additionally, if I use the following code to generate the expert trajectories, I get the same error:
Code
from stable_baselines import DQN
from stable_baselines.gail import generate_expert_traj
model = DQN('MlpPolicy', 'CartPole-v1', verbose=1)
# Train a DQN agent for 1e5 timesteps and generate 10 trajectories
# data will be saved in a numpy archive named `expert_cartpole.npz`
generate_expert_traj(model, 'expert_cartpole', n_timesteps=int(1e5), n_episodes=10)
For me the code runs as expected (Ubuntu 18.04, Python 3.6, stable-baselines 2.10) once I fixed the filenames. You need to study the full traceback printed by the code; the one you pasted is only a side effect of the multiple processes running.
I am using Windows 10, Python 3.7.6, stable-baselines 2.10. OK, I will investigate more thoroughly. I managed to fix the problem somehow: if I open a new console in Spyder, it works fine. I have another issue at the moment: is it normal that the behaviour cloning process takes so long for the code above? It has been running for 15 minutes now.
I think I found a bug. If I run the pretrain function with the sequential parameter of the DataLoader object set to False, i.e. using multiprocessing to manage data extraction, the program gets stuck in these lines of code inside dataset.py:
try:
    val = self.queue.get_nowait()
    break
except queue.Empty:
    time.sleep(0.001)
    continue
Instead, if I do not use subprocesses to process the data (sequential = True), everything works fine.
Additionally, if I am using the multiprocessing mode to process the data and I interrupt the loop shown above with Ctrl-C, then when I try to run the code again I get the error reported in the original question. I found that a workaround for this issue is to open a new console; another possible workaround is sketched below.
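A sketch of that workaround at the user-code level, assuming the `sequential_preprocessing` argument of ExpertDataset in stable-baselines 2.10 is forwarded to the DataLoader's `sequential` flag:
from stable_baselines.gail import ExpertDataset

# Ask ExpertDataset to preprocess the data in the main process instead of
# spawning a worker process (assumes `sequential_preprocessing` is available
# in this version and maps to the DataLoader's `sequential` parameter).
dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                        traj_limitation=1, batch_size=128,
                        sequential_preprocessing=True)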
Did you try putting your code in an `if __name__ == "__main__":` section? (cf. the docs: https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html)
This is required to use multiprocessing on Windows.
I tried, but I get the same error as in the main question.
import gym
from stable_baselines.gail import generate_expert_traj
from stable_baselines import PPO2
from stable_baselines.gail import ExpertDataset

if __name__ == "__main__":
    env = gym.make("CartPole-v1")

    # Here the expert is a random agent
    # but it can be any python function, e.g. a PID controller
    def dummy_expert(_obs):
        """
        Random agent. It samples actions randomly
        from the action space of the environment.

        :param _obs: (np.ndarray) Current observation
        :return: (np.ndarray) action taken by the expert
        """
        return env.action_space.sample()

    # Data will be saved in a numpy archive named `expert_cartpole.npz`
    # when using something different than an RL expert,
    # you must pass the environment object explicitly
    generate_expert_traj(dummy_expert, 'expert_cartpole', env, n_episodes=10)

    # Using only one expert trajectory
    # you can specify `traj_limitation=-1` for using the whole dataset
    dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                            traj_limitation=1, batch_size=128)

    model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
    # Pretrain the PPO2 model
    model.pretrain(dataset, n_epochs=1)

    # As an option, you can train the RL agent
    # model.learn(int(1e5))

    # Test the pre-trained model
    env = model.get_env()
    obs = env.reset()
    reward_sum = 0.0
    for _ in range(1000):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        reward_sum += reward
        env.render()
        if done:
            print(reward_sum)
            reward_sum = 0.0
            obs = env.reset()
    env.close()