PantheonRL
Overcooked and OffPolicyAgent
Hi,
I adapted the simple example to the following, just to test `OffPolicyAgent`:

```python
import gym
from overcookedgym.overcooked_utils import LAYOUT_LIST
from pantheonrl.common.agents import OnPolicyAgent, OffPolicyAgent
from stable_baselines3 import PPO, DQN

layout = "simple"
assert layout in LAYOUT_LIST
print(f"Using layout: {layout} from {LAYOUT_LIST}")

env = gym.make("OvercookedMultiEnv-v0", layout_name=layout)

# Partner uses an off-policy algorithm (DQN) wrapped in OffPolicyAgent
partner = OffPolicyAgent(DQN("MlpPolicy", env, verbose=1))
env.add_partner_agent(partner)

# Ego is also a DQN learner
ego = DQN("MlpPolicy", env, verbose=1)
ego.learn(total_timesteps=1000)
```
But I keep getting:
```
Traceback (most recent call last):
  File "/projects/ruhdorfer/msc2023_constantin/src/scripts/train_simple_overcooked.py", line 31, in <module>
    ego.learn(total_timesteps=1000)
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/dqn/dqn.py", line 269, in learn
    return super().learn(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 311, in learn
    rollout = self.collect_rollouts(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 543, in collect_rollouts
    new_obs, rewards, dones, infos = env.step(actions)
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 163, in step
    return self.step_wait()
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/vec_env/dummy_vec_env.py", line 54, in step_wait
    obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/monitor.py", line 95, in step
    observation, reward, done, info = self.env.step(action)
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/gym/wrappers/order_enforcing.py", line 11, in step
    observation, reward, done, info = self.env.step(action)
  File "/projects/ruhdorfer/PantheonRL/pantheonrl/common/multiagentenv.py", line 195, in step
    acts = self._get_actions(self._players, self._obs, action)
  File "/projects/ruhdorfer/PantheonRL/pantheonrl/common/multiagentenv.py", line 157, in _get_actions
    actions.append(agent.get_action(ob))
  File "/projects/ruhdorfer/PantheonRL/pantheonrl/common/agents.py", line 263, in get_action
    self.model._store_transition(
  File "/projects/ruhdorfer/msc2023_constantin/venv/lib/python3.10/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 455, in _store_transition
    for i, done in enumerate(dones):
TypeError: 'bool' object is not iterable
```
This seems to be because SB3 expects a vectorized `dones` from `env.step` in `stable_baselines3/common/off_policy_algorithm.py:544` (`new_obs, rewards, dones, infos = env.step(actions)`), whereas Overcooked only returns a single `done` in `overcookedgym/overcooked.py:80`.
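To illustrate the mismatch, here is a standalone sketch (not PantheonRL code; `iterate_dones` is just a stand-in for the loop inside SB3's `_store_transition`):

```python
import numpy as np

def iterate_dones(dones):
    # Mimics SB3's `for i, done in enumerate(dones):` in _store_transition
    for i, done in enumerate(dones):
        print(f"env {i}: done={done}")

iterate_dones(np.array([False]))  # per-env array, as SB3 expects: works
iterate_dones([False])            # one-element list: also works
try:
    iterate_dones(False)          # bare bool, as Overcooked returns it
except TypeError as err:
    print(err)                    # 'bool' object is not iterable
```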
Are off-policy algorithms not supported? Is there a good way of fixing this, i.e. by changing line 80 from

```python
return (ego_obs, alt_obs), (reward, reward), done, {}#info
```

to

```python
return (ego_obs, alt_obs), (reward, reward), [done], {}#info
```

?
Thank you!
Cheers, Constantin
Hi, I can confirm that simply changing line 80 in `multi_step` in `overcookedgym/overcooked.py` from

```python
return (ego_obs, alt_obs), (reward, reward), done, {}#info
```

to

```python
return (ego_obs, alt_obs), (reward, reward), [done], {}#info
```

fixes the issue and still works with `OnPolicyAgent` and PPO. I will open a PR; could you comment on whether this has any other implications? Thanks!
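For reference, a minimal sanity check along these lines (assuming the patched `overcookedgym/overcooked.py`; the step count and hyperparameters are arbitrary):

```python
import gym
from overcookedgym.overcooked_utils import LAYOUT_LIST
from pantheonrl.common.agents import OnPolicyAgent, OffPolicyAgent
from stable_baselines3 import PPO, DQN

layout = "simple"
assert layout in LAYOUT_LIST

# Off-policy pairing: DQN ego with a DQN partner
env = gym.make("OvercookedMultiEnv-v0", layout_name=layout)
env.add_partner_agent(OffPolicyAgent(DQN("MlpPolicy", env, verbose=0)))
DQN("MlpPolicy", env, verbose=0).learn(total_timesteps=1000)

# On-policy pairing: PPO ego with a PPO partner (behaviour should be unchanged)
env = gym.make("OvercookedMultiEnv-v0", layout_name=layout)
env.add_partner_agent(OnPolicyAgent(PPO("MlpPolicy", env, verbose=0)))
PPO("MlpPolicy", env, verbose=0).learn(total_timesteps=1000)
```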
The PR is here: #14