ACKTR hangs on Atari and runs very slowly on a custom env
**Describe the bug**
Basically I wanted to check how ACKTR would perform on my custom env. It performs the first 10 updates quite fast, but after that each iteration takes very long on my env (with `async_eigen_decomp=True` it takes even longer), and it hangs completely on Atari. During the first 10 updates it uses all of my CPU cores and a significant part of my GPU; after that it uses only one core at 100% and nothing else. What's interesting is that the same thing happens when I use an env created with `make_atari`, while an env created with `make_atari_env` seems to perform better (still slow and only one core after the 10th update, but it doesn't hang completely like the `make_atari` env does).
**Code example**
```python
from datetime import datetime

from stable_baselines.common.atari_wrappers import make_atari
from stable_baselines.common.callbacks import BaseCallback
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.policies import CnnPolicy
from stable_baselines import ACKTR


class OnUpdate(BaseCallback):
    """Prints the wall-clock time elapsed between rollout updates."""

    def __init__(self):
        super().__init__()
        self.update_num = 0
        self.last_update_timestamp = datetime.now()

    def _on_step(self) -> bool:
        # Required by BaseCallback; nothing to do per step.
        return True

    def _on_rollout_end(self) -> None:
        self.update_num += 1
        diff = datetime.now() - self.last_update_timestamp
        diff = f', {int(diff.total_seconds() * 1000)}ms since prev update'
        print(f'starting update {self.update_num}{diff if self.update_num > 1 else ""}')
        self.last_update_timestamp = datetime.now()


callback = OnUpdate()

# this one performs slowly
env = make_atari_env('BreakoutNoFrameskip-v4', num_env=1, seed=0)
# but this one hangs completely
# env = make_atari('BreakoutNoFrameskip-v4')

model = ACKTR(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=50000, callback=callback)
```
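For reference, the relevant ACKTR options can also be set explicitly; a minimal sketch, assuming `async_eigen_decomp` and `kfac_update` behave as their names suggest (the value 4 is illustrative, not a tuned recommendation):

```python
# Sketch only: async_eigen_decomp=True made updates even slower (see above);
# kfac_update is assumed to control how often the K-FAC Fisher statistics
# are refreshed, and 4 is an illustrative value.
model = ACKTR(CnnPolicy, env, verbose=1,
              async_eigen_decomp=False,
              kfac_update=4)
model.learn(total_timesteps=50000, callback=callback)
```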
For the `make_atari_env` example, the output from the callback looks like this:

```
starting update 1
starting update 2, 758ms since prev update
starting update 3, 74ms since prev update
starting update 4, 73ms since prev update
starting update 5, 77ms since prev update
starting update 6, 72ms since prev update
starting update 7, 78ms since prev update
starting update 8, 86ms since prev update
starting update 9, 71ms since prev update
starting update 10, 76ms since prev update
starting update 11, 15836ms since prev update
starting update 12, 18616ms since prev update
starting update 13, 17480ms since prev update
starting update 14, 18779ms since prev update
starting update 15, 17008ms since prev update
```
For `make_atari`:

```
starting update 1
starting update 2, 2100ms since prev update
starting update 3, 731ms since prev update
starting update 4, 744ms since prev update
starting update 5, 737ms since prev update
starting update 6, 739ms since prev update
starting update 7, 756ms since prev update
starting update 8, 740ms since prev update
starting update 9, 735ms since prev update
starting update 10, 742ms since prev update
```

... then no further output for at least 10 minutes.
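For completeness: as far as I understand, `make_atari` returns a single, unvectorized gym env without the DeepMind wrappers that `make_atari_env` adds. A minimal sketch of building roughly the equivalent env by hand (my assumption of what `make_atari_env` does internally):

```python
from stable_baselines.common.atari_wrappers import make_atari, wrap_deepmind
from stable_baselines.common.vec_env import DummyVecEnv

# Apply the DeepMind preprocessing wrappers and vectorize the env,
# so ACKTR sees a VecEnv like the one make_atari_env produces.
env = DummyVecEnv([lambda: wrap_deepmind(make_atari('BreakoutNoFrameskip-v4'))])
```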
For my custom env (4 workers wrapped in `SubprocVecEnv`, observation shape is `Box(0, 255, (90, 120, 5), uint8)`; a sketch of this setup follows the log):

```
starting update 1
starting update 2, 1140ms since prev update
starting update 3, 440ms since prev update
starting update 4, 456ms since prev update
starting update 5, 412ms since prev update
starting update 6, 437ms since prev update
starting update 7, 462ms since prev update
starting update 8, 457ms since prev update
starting update 9, 437ms since prev update
starting update 10, 433ms since prev update
starting update 11, 83691ms since prev update
starting update 12, 59344ms since prev update
starting update 13, 72802ms since prev update
starting update 14, 61118ms since prev update
starting update 15, 67188ms since prev update
```
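The custom-env setup is essentially the following; `MyCustomEnv` is a hypothetical stand-in for my actual env:

```python
from stable_baselines.common.vec_env import SubprocVecEnv

# MyCustomEnv is a hypothetical placeholder for the actual custom env;
# its observation space is Box(0, 255, (90, 120, 5), uint8) as noted above.
env = SubprocVecEnv([lambda: MyCustomEnv() for _ in range(4)])
model = ACKTR(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=50000, callback=callback)
```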
**System Info**
- stable_baselines 2.10.1 installed from pip
- GPU: GTX 1070Ti, RAM: 24GB, CPU: i7 3770
- Python 3.7 (miniconda)
- TensorFlow GPU 1.15 installed from conda (also tried 1.14 and the CPU version)
Hello, this is probably a duplicate of https://github.com/hill-a/stable-baselines/issues/196. Which OS are you using?
I would recommend using PPO2 (or even Stable-Baselines3 PPO) instead, as it also supports multiprocessing and usually gives comparable results to ACKTR.
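A minimal sketch of that swap, reusing the script above with only the model construction changed:

```python
from stable_baselines import PPO2

# Drop-in replacement for the ACKTR model in the script above;
# PPO2 also supports vectorized (multiprocessed) envs.
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=50000, callback=callback)
```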
Hi, thanks for the response @araffin. I'm making a comparison of how different RL algorithms perform on the problem simulated by my custom env, which is why I wanted to test ACKTR. I've seen #196, but I don't think memory is an issue in my case. My OS is openSUSE Linux 15.2.