rlpyt
Delayed sampler_model update gets better results?
Hi, in async_rl, with a small change I get a better result (about 3x faster convergence):

```python
opt_info = self.algo.optimize_agent(itr, sampler_itr=self.ctrl.sampler_itr.value)
if itr % 5 == 0:  # added this line
    self.agent.send_shared_memory()  # To sampler.
```

I don't understand how it works...
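To illustrate the effect of the guard, here is a hypothetical minimal sketch in plain Python (not rlpyt's actual classes): the optimizer takes a step every iteration, but only publishes to the sampler's shared-memory copy every `SYNC_PERIOD` iterations. A version counter stands in for the real weight copy that `send_shared_memory()` performs.

```python
SYNC_PERIOD = 5  # assumed period, matching the `itr % 5 == 0` guard above

class SharedParams:
    """Stands in for the shared-memory weights the sampler reads."""
    def __init__(self):
        self.version = 0  # which optimizer step the sampler currently sees

class Optimizer:
    def __init__(self, shared):
        self.shared = shared
        self.local_version = 0

    def optimize(self, itr):
        self.local_version += 1           # one optimization step per iteration
        if itr % SYNC_PERIOD == 0:        # the added guard
            self.shared.version = self.local_version  # "send_shared_memory()"

shared = SharedParams()
opt = Optimizer(shared)
syncs = 0
for itr in range(1, 101):
    before = shared.version
    opt.optimize(itr)
    syncs += shared.version != before

print(syncs, shared.version)  # 20 syncs over 100 iterations instead of 100
```

So the sampler keeps acting with a slightly stale policy between syncs, while the optimizer still takes every gradient step; the observed speedup would then come from the cheaper/less frequent weight broadcast and the lower policy churn on the sampler side.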
Ok, this is pretty interesting! Could you please explain: what algorithm? 3x convergence speed in wall-clock time or in environment samples? What environment? What agent network? What kind of computer? Could you post learning curves?
Sure @astooke: the R2D1 algorithm, wall-clock time, the Atari up_n_down environment, the R2D1 agent, on an 8-core/16-hyperthread machine with 2 GPUs, config (T=30, B=24). I don't know how to get the learning curves; I'll post some numbers below if that's ok.
Original code result:

| Samp_itr | CumTime (s) | ScoreAvg | ScoreMax |
|---------:|------------:|---------:|---------:|
| 99991    | 16332       | 28011    | 119800   |
| 199982   | 32703       | 86090    | 367880   |
| 239976   | 39250       | 114387   | 391640   |
Edited code result:

| Samp_itr | CumTime (s) | ScoreAvg | ScoreMax |
|---------:|------------:|---------:|---------:|
| 99992    | 15656       | 46793    | 136070   |
| 199981   | 30859       | 225480   | 394330   |
| 239976   | 36925       | 241154   | 411870   |
The best score in this game should be around avg 350000 / max 411870. The edited code took 57124 s to reach that; with the original code it would have taken much longer, so I didn't finish that run.
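As a rough quantification (not a proper learning curve), one can linearly interpolate from the numbers above the wall-clock time at which each run first crosses a common ScoreAvg threshold. The helper below is just a back-of-envelope sketch over the three reported points per run:

```python
def time_to_reach(times, scores, target):
    """Linearly interpolate the wall-clock time at which ScoreAvg crosses target."""
    for (t0, s0), (t1, s1) in zip(zip(times, scores), zip(times[1:], scores[1:])):
        if s0 <= target <= s1:
            return t0 + (t1 - t0) * (target - s0) / (s1 - s0)
    return None  # target not reached within the reported points

# Data copied from the tables above (CumTime in seconds, ScoreAvg).
orig_t = time_to_reach([16332, 32703, 39250], [28011, 86090, 114387], 100000)
edit_t = time_to_reach([15656, 30859, 36925], [46793, 225480, 241154], 100000)

print(orig_t, edit_t)  # edited run crosses ScoreAvg 100000 substantially earlier
```

At the 100000 ScoreAvg threshold this gives roughly a 1.8x wall-clock speedup; the gap widens at higher scores, since the original run never reaches the edited run's final average within the reported window.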