rlpyt
Buffer cannot be saved while using 1-D batch_T
In example_1, if instead of a stack of 4 images per time step for Pong we use only one, the program crashes the first time the buffer is updated:
Traceback (most recent call last):
  File "example_1_bug.py", line 65, in <module>
    cuda_idx=args.cuda_idx,
  File "example_1_bug.py", line 52, in build_and_train
    runner.train()
  File "/home/jack/rlpyt/rlpyt/runners/minibatch_rl.py", line 314, in train
    opt_info = self.algo.optimize_agent(itr, samples)
  File "/home/jack/rlpyt/rlpyt/algos/dqn/dqn.py", line 169, in optimize_agent
    self.replay_buffer.append_samples(samples_to_buffer)
  File "/home/jack/rlpyt/rlpyt/replays/frame.py", line 61, in append_samples
    self.samples_frames[:fm1] = self.samples_frames[-fm1:]
ValueError: could not broadcast input array from shape (1000,1,104,80) into shape (0,1,104,80)
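For reference, the failing assignment can be reproduced with plain NumPy: with num_img_obs=1, fm1 = n_frames - 1 is zero, and a slice like [-0:] selects the whole array instead of nothing. The shapes below just mirror the traceback; this is only an illustration, not rlpyt code:

import numpy as np

n_frames = 1
fm1 = n_frames - 1             # 0 when each observation is a single frame
frames = np.zeros((1000, 1, 104, 80))
print(frames[:fm1].shape)      # (0, 1, 104, 80)    -- empty destination
print(frames[-fm1:].shape)     # (1000, 1, 104, 80) -- -0 slices from index 0
try:
    frames[:fm1] = frames[-fm1:]   # same assignment as frame.py line 61
except ValueError as e:
    print(e)                       # could not broadcast input array ...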
I worked around it by changing line 49 of frame.py, but I'm not sure it keeps the expected behaviour:
def append_samples(self, samples):
    """Appends all samples except for the `observation` as normal.
    Only the new frame in each observation is recorded."""
    t, fm1 = self.t, self.n_frames - 1 + 1  # < so it's not zero
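If I read frame.py right, this only avoids the zero-length slice; I have not checked whether the extra copied frame affects how observations are reconstructed afterwards.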
To reproduce, modify example_1 with:
SerialSampler(...
    env_kwargs=dict(game=game, num_img_obs=1),       # < only 1
    eval_env_kwargs=dict(game=game, num_img_obs=1),  # < only 1
    batch_T=1,  # < only 1
    batch_B=1,
    ...)
And a smaller buffer to crash it faster:
algo = DQN(min_steps_learn=1e3,
           replay_size=int(1e3))  # < smaller
I found this while working with a custom environment with 1-D observations, so maybe the frame buffer is not meant to be used like that.
I should have read the description of the FrameBufferMixin class more carefully:
https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/rlpyt/replays/frame.py#L12
So I think I should use BaseNStepReturnBuffer from n_step.py.
Will check where to set that, sorry about the noise.
I found a way to do it, but maybe it is not entirely implemented yet? Or maybe I'm doing it wrong.
from rlpyt.replays.frame import FrameBufferMixin
from rlpyt.replays.n_step import BaseNStepReturnBuffer

algo = DQN(...,
           ReplayBufferCls=BaseNStepReturnBuffer)
Which returns:
~/rlpyt/rlpyt/replays/base.py in sample_batch(self, batch_B, batch_T)
11 def sample_batch(self, batch_B, batch_T=None):
12 """Returns a data batch, e.g. for training."""
---> 13 raise NotImplementedError
NotImplementedError:
As sample_batch is not implemented here: https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/rlpyt/replays/base.py#L13
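For context, this is just the abstract-method pattern: the base class defines the interface and leaves sample_batch to subclasses, so passing it directly as ReplayBufferCls gives exactly the error above. A minimal sketch of the idea (class names below are hypothetical, not rlpyt's):

import numpy as np

class BaseBuffer:
    # Mirrors the signature shown in the traceback above.
    def sample_batch(self, batch_B, batch_T=None):
        """Returns a data batch, e.g. for training."""
        raise NotImplementedError

class UniformBuffer(BaseBuffer):  # hypothetical concrete subclass
    def __init__(self, size):
        self.data = np.zeros(size)
        self.t = 0  # write cursor
    def sample_batch(self, batch_B, batch_T=None):
        # Uniformly sample previously written entries.
        idxs = np.random.randint(0, max(self.t, 1), size=batch_B)
        return self.data[idxs]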
Good catch! I also ran into this problem recently and will patch in a fix shortly. The correct update is in frame.py, but at a later line:
https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/rlpyt/replays/frame.py#L57
should be changed to:
elif self.t < t and fm1 > 0:
The problem is that when you get to the end of the buffer, it copies some duplicate frames back to the beginning of the buffer; but when you only have 1 frame per step, there are no duplicates to copy. :)
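For anyone following along, here is a runnable sketch of that wraparound copy with the suggested guard (the flat buffer below is illustrative, not rlpyt's exact layout):

import numpy as np

def wrap_copy(frames, fm1):
    # When the write cursor wraps, frame.py copies the last fm1 frames to the
    # front so frame stacks stay contiguous; with fm1 == 0 there is nothing
    # to copy, and skipping the assignment avoids the broadcast error.
    if fm1 > 0:
        frames[:fm1] = frames[-fm1:]
    return frames

print(wrap_copy(np.arange(10), 3))  # n_frames = 4: last 3 entries duplicated at the front
print(wrap_copy(np.arange(10), 0))  # n_frames = 1: unchanged, and no crash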