rlpyt
Delayed sampler_model update gets better results?
Hi, in async_rl, with a small change I get a better result (about 3x faster convergence):

```python
opt_info = self.algo.optimize_agent(itr, sampler_itr=self.ctrl.sampler_itr.value)
if itr % 5 == 0:  # added this line
    self.agent.send_shared_memory()  # To sampler.
```

I don't understand how it works...
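To illustrate the effect of the guard, here is a hypothetical minimal sketch in plain Python (not rlpyt's actual classes): the optimizer takes a step every iteration, but only publishes to the sampler's shared-memory copy every `SYNC_PERIOD` iterations. A version counter stands in for the real weight copy that `send_shared_memory()` performs.

```python
SYNC_PERIOD = 5  # assumed period, matching the `itr % 5 == 0` guard above

class SharedParams:
    """Stands in for the shared-memory weights the sampler reads."""
    def __init__(self):
        self.version = 0  # which optimizer step the sampler currently sees

class Optimizer:
    def __init__(self, shared):
        self.shared = shared
        self.local_version = 0

    def optimize(self, itr):
        self.local_version += 1           # one optimization step per iteration
        if itr % SYNC_PERIOD == 0:        # the added guard
            self.shared.version = self.local_version  # "send_shared_memory()"

shared = SharedParams()
opt = Optimizer(shared)
syncs = 0
for itr in range(1, 101):
    before = shared.version
    opt.optimize(itr)
    syncs += shared.version != before

print(syncs, shared.version)  # 20 syncs over 100 iterations instead of 100
```

So the sampler keeps acting with a slightly stale policy between syncs, while the optimizer still takes every gradient step; the observed speedup would then come from the cheaper/less frequent weight broadcast and the lower policy churn on the sampler side.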
Ok, this is pretty interesting! Could you please explain: what algorithm? 3x convergence speed in wall-clock time or in environment samples? What environment? What agent network? What kind of computer? Could you post learning curves?
Sure @astooke: the R2D1 algorithm, wall-clock time, the Atari up_n_down environment, the R2D1 agent, on an 8-core/16-hyperthread machine with 2 GPUs, config (T=30, B=24). I don't know how to get the learning curves; I'll post some numbers below if that's ok.
Original code result:

| Samp_itr | CumTime (s) | ScoreAvg | ScoreMax |
|---------:|------------:|---------:|---------:|
| 99991    | 16332       | 28011    | 119800   |
| 199982   | 32703       | 86090    | 367880   |
| 239976   | 39250       | 114387   | 391640   |
Edited code result:

| Samp_itr | CumTime (s) | ScoreAvg | ScoreMax |
|---------:|------------:|---------:|---------:|
| 99992    | 15656       | 46793    | 136070   |
| 199981   | 30859       | 225480   | 394330   |
| 239976   | 36925       | 241154   | 411870   |
The best score in this game should be around avg 350000 / max 411870. The edited code took 57124 s to reach that; with the original code it would have taken much longer, so I didn't finish that run.
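As a rough quantification (not a proper learning curve), one can linearly interpolate from the numbers above the wall-clock time at which each run first crosses a common ScoreAvg threshold. The helper below is just a back-of-envelope sketch over the three reported points per run:

```python
def time_to_reach(times, scores, target):
    """Linearly interpolate the wall-clock time at which ScoreAvg crosses target."""
    for (t0, s0), (t1, s1) in zip(zip(times, scores), zip(times[1:], scores[1:])):
        if s0 <= target <= s1:
            return t0 + (t1 - t0) * (target - s0) / (s1 - s0)
    return None  # target not reached within the reported points

# Data copied from the tables above (CumTime in seconds, ScoreAvg).
orig_t = time_to_reach([16332, 32703, 39250], [28011, 86090, 114387], 100000)
edit_t = time_to_reach([15656, 30859, 36925], [46793, 225480, 241154], 100000)

print(orig_t, edit_t)  # edited run crosses ScoreAvg 100000 substantially earlier
```

At the 100000 ScoreAvg threshold this gives roughly a 1.8x wall-clock speedup; the gap widens at higher scores, since the original run never reaches the edited run's final average within the reported window.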