
Some agents with online updates fail when used with step_offset of train_agent

Open · toslunar opened this issue 7 years ago • 1 comment

The PCL implementation raises an error when it is used with a nonzero step_offset: train_agent overwrites self.t with step_offset, but self.t_start is left unchanged, so the condition

if self.t - self.t_start == self.t_max:

in pcl.py may never become true. To be precise, assert self.t_max is None or self.t - self.t_start <= self.t_max fails at the beginning of PCL.update_on_policy (https://github.com/toslunar/chainerrl/commit/f8d07b385d11cd63aea03558cfc4eb1db632d370).
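For illustration, here is a minimal sketch of the counter mismatch; the step_offset and t_max values are hypothetical, while the names mirror pcl.py and train_agent:

# Minimal sketch of the mismatch; the concrete numbers are made up.
t_max = 50      # PCL performs an on-policy update every t_max steps
t_start = 0     # set at construction time, not adjusted by train_agent
t = 1000        # train_agent has overwritten self.t with step_offset=1000

t += 1                      # one step in act_and_train
if t - t_start == t_max:    # 1001 == 50 is False, and never True later
    pass                    # the on-policy update would run here

# The update keeps being skipped, so by the time update_on_policy runs
# at the end of the episode, the invariant below is violated:
assert t_max is None or t - t_start <= t_max  # raises AssertionError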

The implementations of A3C and ACER seem to have the same issue if trained by train_agent instead of train_agent_async.
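For what it's worth, one conceivable workaround, sketched under the assumption that train_agent keeps setting agent.t = step_offset as described above, would be to resync t_start at the same time (the hasattr guard is hypothetical, not existing chainerrl code):

# Hypothetical resync inside train_agent; agent and step_offset are the
# existing arguments, t_start is the counter pcl.py compares against.
agent.t = step_offset
if hasattr(agent, 't_start'):
    agent.t_start = step_offset  # restore t - t_start == 0 on resume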

toslunar · Aug 14 '17

Good catch. The problem is that resuming agent training via step_offset is not well tested.

muupan · Aug 17 '17