Got an unexpected keyword argument 'use_sde' when passing behavioural cloning policy to PPO from SB3
Bug description
Hello,
I want to pass the policy learned from behavioural cloning in the imitation library to PPO. I thought this would work since both are instances of the ActorCriticPolicy class, but it doesn't work as I expected.
Steps to reproduce
from stable_baselines3 import PPO
from imitation.algorithms import bc

bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device='cuda',
    policy=bc.reconstruct_policy(policy_path, device='cuda'),
    rng=rng,
)
model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')
The error is:
Traceback (most recent call last):
File "agent/main.py", line 142, in
Environment
- Operating system and version: Ubuntu 20.04.6 LTS
- Python version: 3.8.0
- PyTorch version: 1.13.0
- imitation version: 0.4.0
- Stable-Baselines3 version: 1.8.0
- Gym version: 0.21.0
Hi, I had the same error trying to retrain a policy with PPO after behavioural cloning. The problem is actually here:
model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')
because the policy argument expects a class (or alternatively a string). The error message is unclear since it is raised by PyTorch.
So when instantiating PPO with SB3, you should pass the policy class you want to use (which should inherit from ActorCriticPolicy). For example:
from stable_baselines3.common.policies import ActorCriticPolicy

model = PPO(policy=ActorCriticPolicy, env=env, verbose=1, device='cuda')
This should work for instantiating PPO. However, I am not sure how you should load the pre-trained policy; I could not find the right way to do it in stable-baselines3 (I tried model.policy = bc_trainer.policy, but I am not sure it works properly).
Hope it helps somehow. Let me know if you find the right way to load a pre-trained policy with the PPO algorithm 👍
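One possible way to do that (a sketch, not verified here; it assumes the BC policy and the PPO policy were built with identical architectures) is to instantiate PPO with the policy class as above and then copy the trained weights over, since both policies are standard torch modules:

model = PPO(policy=ActorCriticPolicy, env=env, verbose=1, device='cuda')
# Overwrite the randomly initialised PPO weights with the BC-trained ones.
# This only works if both policies have identical parameter shapes.
model.policy.load_state_dict(bc_trainer.policy.state_dict())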
For those who stumble across this issue, the load_from_vector method seems to work:
# Load the policy saved by BC.
pretrained_policy = ActorCriticPolicy.load("/path/")
# Create a fresh PPO model with the same policy class.
model = PPO(ActorCriticPolicy, env)
# Copy the pretrained weights into PPO's freshly built policy.
model.policy.load_from_vector(pretrained_policy.parameters_to_vector())
model.learn(total_timesteps=100_000, reset_num_timesteps=False)
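A quick sanity check (a sketch, assuming env and the objects above are in scope) is to confirm that both policies now return the same deterministic action for a sample observation:

import numpy as np

obs = env.observation_space.sample()
action_bc, _ = pretrained_policy.predict(obs, deterministic=True)
action_ppo, _ = model.policy.predict(obs, deterministic=True)
assert np.allclose(action_bc, action_ppo)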
I had issues with saving and loading BC models, and the following worked for me:
from imitation.algorithms import bc
from stable_baselines3.common import policies

# Saving
bc_model = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=transitions_custom,
    rng=rng,
)
bc_model.policy.save('models/test/model.zip')

# Loading
pretrained_policy = policies.ActorCriticPolicy.load('models/test/model.zip')
bc_model_reloaded = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=transitions_custom,
    rng=rng,
    policy=pretrained_policy,
)
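To confirm the reloaded policy behaves like the saved one, one option (a sketch, assuming venv is the vectorised environment used above) is SB3's evaluate_policy helper:

from stable_baselines3.common.evaluation import evaluate_policy

# Roll out the reloaded policy for a few episodes and report its return.
mean_reward, std_reward = evaluate_policy(bc_model_reloaded.policy, venv, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")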
I followed the method mentioned above by @yojul to load a BC model in SB3. But when I retrain the model in SB3, save it, and then try to reload it using PPO.load as below, I get a shape-mismatch error when copying weights. I am guessing this is due to the difference between imitation.policies.base.FeedForward32Policy and the stable_baselines3 ActorCriticPolicy.
Can @AlexGisi, @yojul or @JkAcktuator share how you overcame this issue?
pretrained_policy = policies.ActorCriticPolicy.load("imitation_bc_model.zip")
model = PPO(policy=policies.ActorCriticPolicy, env=env)
model.policy = pretrained_policy
model.learn(total_timesteps=100_000)
model.save("sb3_model.zip")
del model
model = PPO.load("sb3_model.zip")  # this throws an error
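A diagnostic sketch (assuming env and imitation_bc_model.zip from the snippet above) to see where the shapes disagree is to compare the state dicts of the BC-trained policy and the policy PPO builds by default; note that imitation's FeedForward32Policy uses two 32-unit hidden layers while SB3's default net_arch uses 64-unit layers, so a mismatch there is plausible:

from stable_baselines3 import PPO
from stable_baselines3.common import policies

bc_policy = policies.ActorCriticPolicy.load("imitation_bc_model.zip")
default_ppo = PPO(policy=policies.ActorCriticPolicy, env=env)

bc_state = bc_policy.state_dict()
ppo_state = default_ppo.policy.state_dict()
# Print every parameter whose shape differs (or that exists on one side only).
for name in sorted(set(bc_state) | set(ppo_state)):
    bc_shape = tuple(bc_state[name].shape) if name in bc_state else None
    ppo_shape = tuple(ppo_state[name].shape) if name in ppo_state else None
    if bc_shape != ppo_shape:
        print(name, bc_shape, ppo_shape)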
Hi @saeed349! Here is my code.
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

dense_rollouts = rollout.rollout(
    dense_expert,
    DummyVecEnv([lambda: RolloutInfoWrapper(dense_env)]),
    rollout.make_sample_until(min_timesteps=None, min_episodes=250),
    rng=dense_rng,
)
dense_transitions = rollout.flatten_trajectories(dense_rollouts)

dense_bc = CustomBC(
    observation_space=dense_env.observation_space,
    action_space=dense_env.action_space,
    policy=dense_expert.policy,
    demonstrations=dense_transitions,
    rng=dense_rng,
    device='cuda',
    tensorboard_log=f'/home/cai/Desktop/PILRnav/runs/dense_bc',
)
dense_bc.train(n_epochs=10)
dense_bc.policy.save("/home/cai/Desktop/PILRnav/weight/dense_bc")

dense_ppo = PPO(
    policy='MlpPolicy',
    env=dense_env,
    policy_kwargs=policy_kwargs,
    use_sde=False,
    batch_size=64,
    n_epochs=7,
    learning_rate=0.0004,
    tensorboard_log=f'/home/cai/Desktop/PILRnav/runs/dense_ppo',
    verbose=1,
)
dense_ppo.policy = dense_ppo.policy.load("/home/cai/Desktop/PILRnav/weight/dense_bc")
dense_ppo.learn(MAX_ITER)
dense_ppo.policy.save("/home/cai/Desktop/PILRnav/weight/dense_ppo")
I hope it helps.
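One alternative worth noting, tying back to the load_from_vector suggestion above (a sketch, assuming the BC policy and the PPO policy share the same architecture, e.g. via matching policy_kwargs): copy only the weights into the policy PPO built itself instead of replacing the policy object, so the model PPO saves stays consistent with its constructor arguments.

from stable_baselines3.common.policies import ActorCriticPolicy

bc_policy = ActorCriticPolicy.load("/home/cai/Desktop/PILRnav/weight/dense_bc")
# Copy the pretrained weights into the existing PPO policy in place.
dense_ppo.policy.load_from_vector(bc_policy.parameters_to_vector())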