
Buffer cannot be saved while using 1-D batch_T

Open LecJackS opened this issue 4 years ago • 3 comments

On example_1, if instead of seeing a minibatch of 4 images per time step for Pong we see only one, the program crashes the first time the buffer is updated:

Traceback (most recent call last):
  File "example_1_bug.py", line 65, in <module>
    cuda_idx=args.cuda_idx,
  File "example_1_bug.py", line 52, in build_and_train
    runner.train()
  File "/home/jack/rlpyt/rlpyt/runners/minibatch_rl.py", line 314, in train
    opt_info = self.algo.optimize_agent(itr, samples)
  File "/home/jack/rlpyt/rlpyt/algos/dqn/dqn.py", line 169, in optimize_agent
    self.replay_buffer.append_samples(samples_to_buffer)
  File "/home/jack/rlpyt/rlpyt/replays/frame.py", line 61, in append_samples
    self.samples_frames[:fm1] = self.samples_frames[-fm1:]
ValueError: could not broadcast input array from shape (1000,1,104,80) into shape (0,1,104,80)
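The root cause is visible in the failing line: with num_img_obs=1, fm1 = n_frames - 1 is 0, and since -0 == 0 in Python, the source slice [-fm1:] selects the whole array while the destination [:fm1] is empty. A minimal numpy reproduction (shapes borrowed from the traceback above; samples_frames is a stand-in array, not the real buffer object):

```python
import numpy as np

# Stand-in for self.samples_frames with replay_size=1000 and 104x80 frames.
samples_frames = np.zeros((1000, 1, 104, 80))

fm1 = 0  # n_frames - 1 when each observation holds a single frame
try:
    # In Python, -0 == 0, so [-fm1:] selects the ENTIRE array, while the
    # destination [:fm1] is an empty slice -> the shapes cannot broadcast.
    samples_frames[:fm1] = samples_frames[-fm1:]
except ValueError as e:
    print(e)
```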

I solved it by changing line 49 of frame.py, but I'm not sure it keeps the expected behaviour:

def append_samples(self, samples):
        """Appends all samples except for the `observation` as normal.
        Only the new frame in each observation is recorded."""
        t, fm1 = self.t, self.n_frames - 1 + 1  # < so it's not zero

To reproduce, modify example_1 with:

SerialSampler(...
    env_kwargs=dict(game=game, num_img_obs=1),         # < only 1 
    eval_env_kwargs=dict(game=game, num_img_obs=1),    # < only 1 
    batch_T=1,  # < only 1 
    batch_B=1,
...)

And a smaller buffer to make it crash faster:

algo = DQN(min_steps_learn=1e3,
                replay_size=int(1e3))    # < smaller

I found this while working with a custom environment with 1-D observations, so maybe the buffer is not expected to be used like that.

LecJackS avatar May 23 '20 03:05 LecJackS

I should have read the description of the FrameBufferMixin class more carefully:

https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/rlpyt/replays/frame.py#L12

So I think I should use BaseNStepReturnBuffer from n_step.py.

I'll check where to set that; sorry about the noise.

LecJackS avatar May 23 '20 03:05 LecJackS

I found the way to do it, but maybe it's not entirely implemented yet? Or maybe I'm doing it wrong.

from rlpyt.replays.frame import FrameBufferMixin
from rlpyt.replays.n_step import BaseNStepReturnBuffer

algo = DQN(...,
      ReplayBufferCls=BaseNStepReturnBuffer)

Which returns:

~/rlpyt/rlpyt/replays/base.py in sample_batch(self, batch_B, batch_T)
     11     def sample_batch(self, batch_B, batch_T=None):
     12         """Returns a data batch, e.g. for training."""
---> 13         raise NotImplementedError

NotImplementedError: 

As shown here, it's not implemented: https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/rlpyt/replays/base.py#L13
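For context, sample_batch is only declared on the base class as an interface, which is why passing the abstract BaseNStepReturnBuffer as ReplayBufferCls fails as soon as the algorithm tries to sample. Roughly (a simplified sketch of the class hierarchy, not the rlpyt source):

```python
class BaseReplayBuffer:
    def sample_batch(self, batch_B, batch_T=None):
        """Interface only; concrete subclasses must override this."""
        raise NotImplementedError

class BaseNStepReturnBuffer(BaseReplayBuffer):
    """Handles storage and n-step returns in rlpyt, but still leaves
    sample_batch unimplemented, so it cannot be used directly."""

# A concrete subclass must supply sample_batch; the abstract one raises
# the moment training tries to draw a batch:
try:
    BaseNStepReturnBuffer().sample_batch(32)
except NotImplementedError:
    print("abstract buffer cannot sample")
```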

LecJackS avatar May 23 '20 04:05 LecJackS

Good catch! I also ran into this problem recently and will patch in the fix shortly. The correct update is in frame.py, but at a later line:

https://github.com/astooke/rlpyt/blob/85d4e018a919118c6e42fac3e897aa346d84b9c5/rlpyt/replays/frame.py#L57

should be changed to: elif self.t < t and fm1 > 0:

The problem is when you get to the end of the buffer, it copies some duplicate frames back to the beginning of the buffer, but when you only have 1 frame per step, there are no duplicates to copy. :)
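The copy in question can be sketched with a small numpy stand-in (a simplified illustration of the circular-buffer wraparound, not the actual rlpyt code; wraparound_copy is a hypothetical helper name). With the fm1 > 0 guard, the single-frame case simply skips the duplicate-frame copy:

```python
import numpy as np

def wraparound_copy(buf, fm1):
    """Copy the last fm1 frames to the front of a circular frame buffer.

    With multi-frame observations, the first fm1 slots hold duplicates of
    the buffer's tail so frame stacks stay contiguous across the wrap.
    When fm1 == 0 there are no duplicates, so the copy must be skipped.
    """
    if fm1 > 0:  # the guard added by the fix
        buf[:fm1] = buf[-fm1:]

frames = np.arange(10.0)    # toy 1-D "buffer" of 10 frames
wraparound_copy(frames, 3)  # frames[:3] now holds the old frames[-3:]
wraparound_copy(frames, 0)  # no-op: nothing to duplicate, and no crash
```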

astooke avatar Jun 30 '20 16:06 astooke