run_dqn demo fails with distributed training: ValueError: TrajectoryColumns cannot contain any None data references
Ubuntu 20.04, CUDA 11.4, on a node with 4 GPUs / 4 CPU cores
Setup (from fresh VM):
> apt-get update && apt-get install -y --no-install-recommends \
libgl1-mesa-glx libosmesa6 libglew-dev
> pip install --upgrade pip setuptools wheel
> git clone https://github.com/deepmind/acme.git acme_repo
> cd acme_repo
> pip install .[jax,tf,testing,envs]
> pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
> pip install ale-py
Install the Atari ROMs into ale-py using your favorite method (one possibility sketched below).
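One possibility, assuming the third-party AutoROM helper package (any other ROM installation method should work just as well):
> pip install "autorom[accept-rom-license]"
> AutoROM --accept-license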
Then:
> cd examples/baselines/rl_discrete
> python run_dqn.py --run_distributed
Produces this error:
I0525 17:21:19.286271 139675943556864 terminal.py:91] [Actor] Actor Episodes = 1 | Actor Steps = 186 | Episode Length = 186 | Episode Return = -5.0 | Steps Per Second = 1.972
Node ThreadWorker(thread=<Thread(actor, stopped daemon 139661187999488)>, future=<Future at 0x7f181df5fa60 state=finished raised ValueError>) crashed:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/launchpad/launch/worker_manager.py", line 474, in _check_workers
worker.future.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "/usr/local/lib/python3.8/dist-packages/launchpad/launch/worker_manager.py", line 250, in run_inner
future.set_result(f())
File "/usr/local/lib/python3.8/dist-packages/launchpad/nodes/python/node.py", line 75, in _construct_function
return functools.partial(self._function, *args, **kwargs)()
File "/usr/local/lib/python3.8/dist-packages/launchpad/nodes/courier/node.py", line 130, in run
instance.run()
File "/usr/local/lib/python3.8/dist-packages/acme/environment_loop.py", line 176, in run
result = self.run_episode()
File "/usr/local/lib/python3.8/dist-packages/acme/environment_loop.py", line 109, in run_episode
self._actor.observe(action, next_timestep=timestep)
File "/usr/local/lib/python3.8/dist-packages/acme/agents/jax/actors.py", line 94, in observe
self._adder.add(
File "/usr/local/lib/python3.8/dist-packages/acme/adders/reverb/transition.py", line 133, in add
super().add(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/acme/adders/reverb/base.py", line 207, in add
self._write()
File "/usr/local/lib/python3.8/dist-packages/acme/adders/reverb/transition.py", line 169, in _write
reward, discount = tree.map_structure(
I0525 17:21:20.443459 139678827484928 lp_utils.py:95] StepsLimiter: Reached 186 recorded steps
File "/usr/local/lib/python3.8/dist-packages/tree/__init__.py", line 430, in map_structure
[func(*args) for args in zip(*map(flatten, structures))])
File "/usr/local/lib/python3.8/dist-packages/tree/__init__.py", line 430, in <listcomp>
[func(*args) for args in zip(*map(flatten, structures))])
File "/usr/local/lib/python3.8/dist-packages/acme/adders/reverb/transition.py", line 151, in <lambda>
get_all_np = lambda x: x[self._first_idx:self._last_idx].numpy()
File "/usr/local/lib/python3.8/dist-packages/reverb/trajectory_writer.py", line 604, in __getitem__
return TrajectoryColumn(self._slice(val), path=path)
File "/usr/local/lib/python3.8/dist-packages/reverb/trajectory_writer.py", line 629, in __init__
raise ValueError('TrajectoryColumns cannot contain any None data '
ValueError: TrajectoryColumns cannot contain any None data references: TrajectoryColumn at path ('reward', slice(187, 188, None)) got [None].
Without the --run_distributed flag, it works fine.
Digging a little, it appears that sometimes the next_timestep variable has None values: https://github.com/deepmind/acme/blob/2871e3216d2ffc2bc0ffea8b6a0e3071897608b9/acme/agents/jax/actors.py#L95
TimeStep(step_type=<StepType.FIRST: 0>, reward=None, discount=None, observation=<some nonzero array>)
I think I've tracked down where things are going wrong, at least in the environment loop:
https://github.com/deepmind/acme/blob/2871e3216d2ffc2bc0ffea8b6a0e3071897608b9/acme/environment_loop.py#L106
With the --run_distributed option, the environment step call sometimes returns a timestep with a None reward and discount, as if it had called reset. I don't know enough about the backend to understand how to fix this.
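For reference, this matches the dm_env convention: a timestep produced by reset() is a FIRST step and carries no reward or discount, which is exactly the None the adder later fails to slice. A minimal sketch of that convention (plain dm_env, not acme code):

import dm_env
import numpy as np

# dm_env.restart() is what environments typically use inside reset():
# it builds a StepType.FIRST timestep whose reward and discount are None.
first = dm_env.restart(np.zeros((84, 84, 4), dtype=np.uint8))
print(first.step_type, first.reward, first.discount)  # StepType.FIRST None None

# Mid-episode timesteps built with dm_env.transition() carry real values.
mid = dm_env.transition(reward=-1.0, observation=np.zeros((84, 84, 4), dtype=np.uint8))
print(mid.step_type, mid.reward, mid.discount)  # StepType.MID -1.0 1.0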
I was playing around with the number of actors here:
https://github.com/deepmind/acme/blob/2871e3216d2ffc2bc0ffea8b6a0e3071897608b9/examples/baselines/rl_discrete/run_dqn.py#L80
Reducing the number of actors seems to make the error appear later in training, but it still appears. With 1 actor (still using the --run_distributed flag) I have not seen the error.
I think I know the issue: the environment factory in that example (in fact, in all the examples) returns the same environment instance every time it is called, so there is some sort of concurrency problem where one actor calls reset on the environment right after another actor has called step. I switched the factory to one that constructs a new environment instance on each call (roughly as sketched below), and things appear to work fine.
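Roughly, the change looks like this (just a sketch; make_env stands in for whatever helper the example actually uses to build the Atari environment):

# Problematic pattern: one environment instance shared by every actor.
# environment = make_env()
# environment_factory = lambda seed: environment

# Fix: construct a fresh environment on every call, so each distributed
# actor gets its own instance.
def environment_factory(seed: int):
  del seed  # or pass it through, if the helper accepts a seed
  return make_env()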
Is this the correct solution to this?
@rdevon The environment_factory should create a new environment every time it is called.
Therefore, the examples in rl_discrete/ are indeed all incorrect.
After fixing the previous issue by making the environment factory in run_dqn.py construct a new environment on each call, the ExperimentConfig becomes non-serializable, so it is no longer possible to run the program in a distributed manner with launch_type="local_mp".
Is there anything I'm missing, or is there a possible solution to this problem?
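For context, the usual serialization constraint behind launch_type="local_mp" failures looks like this (a minimal, generic sketch, not acme-specific): a factory defined as a lambda or closure generally cannot be pickled for child processes, while a module-level function can.

import pickle

def make_env(seed: int = 0):
  """Stand-in for the real environment constructor."""
  return object()

# A lambda generally cannot be pickled by the standard library, which is
# what multiprocessing-based launches rely on:
factory_lambda = lambda seed: make_env(seed)
try:
  pickle.dumps(factory_lambda)
except Exception as e:  # typically pickle.PicklingError or AttributeError
  print("lambda factory fails to pickle:", e)

# A module-level function is pickled by reference and round-trips fine:
def environment_factory(seed: int):
  return make_env(seed)

pickle.dumps(environment_factory)  # OK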