Pre-Training Problem
When I try to run the code below, I get the following error in the `pretrain` function:
Error
File "C:\Users\fabio\Desktop\wetransfer-08d028\Rope_ex_v1.5\RL_Training\behaviour_cloning.py", line 40, in <module>
model.pretrain(dataset, n_epochs=1000)
File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\common\base_class.py", line 346, in pretrain
expert_obs, expert_actions = dataset.get_next_batch('train')
File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\gail\dataset\dataset.py", line 152, in get_next_batch
dataloader.start_process()
File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\gail\dataset\dataset.py", line 231, in start_process
self.process.start()
File "C:\Python\Python37\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)
File "C:\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
PicklingError: Can't pickle <function rebuild_pipe_connection at 0x0000024B185C2168>: it's not the same object as multiprocessing.connection.rebuild_pipe_connection
Code
import gym
from stable_baselines.gail import generate_expert_traj

env = gym.make("CartPole-v1")

def dummy_expert(_obs):
    return env.action_space.sample()

generate_expert_traj(dummy_expert, 'expert_cartpole', env, n_episodes=10)

from stable_baselines import PPO2
from stable_baselines.gail import ExpertDataset

dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                        traj_limitation=1, batch_size=128)

model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
model.pretrain(dataset, n_epochs=1000)
model.learn(int(1e5))

env = model.get_env()
obs = env.reset()
reward_sum = 0.0
for _ in range(1000):
    action, _ = model.predict(obs)
    obs, reward, done, _ = env.step(action)
    reward_sum += reward
    env.render()
    if done:
        print(reward_sum)
        reward_sum = 0.0
        obs = env.reset()
env.close()
Please check the issue template and fill in the necessary parts; also paste the full traceback of the exception and place the code into a code block like ``` this ```.
I am sorry. I hope the description of the issue is clearer now.
The full traceback of all processes reveals the issue: the code tries to load the file expert_cartpole.npz when creating the ExpertDataset, but the data is stored in dummy_expert_cartpole.npz. Fixing this fixes the issue.
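For reference, a minimal sketch of that fix, assuming the archive really was written as dummy_expert_cartpole.npz as the traceback suggests (only the file name changes; the rest of the call is taken from the code above):
from stable_baselines.gail import ExpertDataset

# Point ExpertDataset at the file that generate_expert_traj actually wrote
# (assumed here to be `dummy_expert_cartpole.npz`), or rename the archive so
# that both names match.
dataset = ExpertDataset(expert_path='dummy_expert_cartpole.npz',
                        traj_limitation=1, batch_size=128)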
Sorry, my fault again. That was a typo in my post; the actual file name is correct. Additionally, if I use the following code to generate the expert trajectories, I get the same error:
Code
from stable_baselines import DQN
from stable_baselines.gail import generate_expert_traj
model = DQN('MlpPolicy', 'CartPole-v1', verbose=1)
# Train a DQN agent for 1e5 timesteps and generate 10 trajectories
# data will be saved in a numpy archive named `expert_cartpole.npz`
generate_expert_traj(model, 'expert_cartpole', n_timesteps=int(1e5), n_episodes=10)
For me the code runs as expected (Ubuntu 18.04, Python 3.6, stable-baselines 2.10) once I fixed the filenames. You need to study the full traceback printed by the code; the one you pasted is only a side effect of the multiple processes running.
I am using Windows 10, Python 3.7.6, stable-baselines 2.10. OK, I will investigate more thoroughly. I managed to fix the problem somehow: if I open a new console in Spyder, it works fine. I have another issue at the moment: is it normal that the behaviour cloning process takes so long for the code above? It has been running for 15 minutes now.
I think I found a bug. If I run the pretrain function with the sequential parameter of the DataLoader object set to False, i.e. using multiprocessing to manage data extraction, the program gets stuck in these lines of code inside dataset.py:
try:
    val = self.queue.get_nowait()
    break
except queue.Empty:
    time.sleep(0.001)
    continue
Instead, if I do not use subprocesses to process the data (sequential = True), everything works fine.
Additionally, if I am using the multiprocessing mode to process the data and I interrupt the loop shown above with Ctrl-C, then when I try to run the code again I get the error reported in the original question. I found that a workaround for this issue is to open a new console; another possible workaround is sketched below.
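A sketch of that workaround at the user-code level, assuming the `sequential_preprocessing` argument of ExpertDataset in stable-baselines 2.10 is forwarded to the DataLoader's `sequential` flag:
from stable_baselines.gail import ExpertDataset

# Ask ExpertDataset to preprocess the data in the main process instead of
# spawning a worker process (assumes `sequential_preprocessing` is available
# in this version and maps to the DataLoader's `sequential` parameter).
dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                        traj_limitation=1, batch_size=128,
                        sequential_preprocessing=True)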
Did you try putting your code in an `if __name__ == "__main__":` section? (cf. the docs: https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html)
This is required to use multiprocessing on Windows.
I tried, but I get the same error as in the main question.
import gym
from stable_baselines.gail import generate_expert_traj
from stable_baselines import PPO2
from stable_baselines.gail import ExpertDataset

if __name__ == "__main__":
    env = gym.make("CartPole-v1")

    # Here the expert is a random agent
    # but it can be any python function, e.g. a PID controller
    def dummy_expert(_obs):
        """
        Random agent. It samples actions randomly
        from the action space of the environment.

        :param _obs: (np.ndarray) Current observation
        :return: (np.ndarray) action taken by the expert
        """
        return env.action_space.sample()

    # Data will be saved in a numpy archive named `expert_cartpole.npz`
    # when using something different than an RL expert,
    # you must pass the environment object explicitly
    generate_expert_traj(dummy_expert, 'expert_cartpole', env, n_episodes=10)

    # Using only one expert trajectory
    # you can specify `traj_limitation=-1` for using the whole dataset
    dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                            traj_limitation=1, batch_size=128)

    model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
    # Pretrain the PPO2 model
    model.pretrain(dataset, n_epochs=1)

    # As an option, you can train the RL agent
    # model.learn(int(1e5))

    # Test the pre-trained model
    env = model.get_env()
    obs = env.reset()
    reward_sum = 0.0
    for _ in range(1000):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        reward_sum += reward
        env.render()
        if done:
            print(reward_sum)
            reward_sum = 0.0
            obs = env.reset()
    env.close()