Reinforcement-learning-with-tensorflow

Questions about A3C

Open icesit opened this issue 6 years ago • 3 comments

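Hello Morvan, I have recently been using your A3C implementation and have a few questions about the code:

  1. On line 150 of A3C_RNN.py, `buffer_r.append((r+8)/8)`: why is the reward transformed this way?
  2. On line 186, `GLOBAL_RUNNING_R.append(0.9 * GLOBAL_RUNNING_R[-1] + 0.1 * ep_r)`: why is the total reward used for display computed this way?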

icesit avatar Mar 15 '19 12:03 icesit

I also noticed that in your ddpg_update2 the reward `r` is divided by 10. What effect do these operations on the raw reward have on training?
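To make the two transformations concrete, here is a minimal sketch of what they do to the range of the reward. It assumes a Pendulum-style reward bounded roughly in [-16.27, 0]; that bound is an assumption, since the thread does not name the environment. The names `R_MIN`, `R_MAX`, `scale_a3c` and `scale_ddpg` are hypothetical and only used for illustration.

```python
import math

# Assumption: Pendulum-style reward -(theta^2 + 0.1*theta_dot^2 + 0.001*u^2),
# with |theta| <= pi, |theta_dot| <= 8 and |u| <= 2, so r lies in about [-16.27, 0].
R_MIN = -(math.pi ** 2 + 0.1 * 8.0 ** 2 + 0.001 * 2.0 ** 2)
R_MAX = 0.0


def scale_a3c(r):
    """The transformation quoted from A3C_RNN.py: shift by 8, then divide by 8."""
    return (r + 8) / 8


def scale_ddpg(r):
    """The transformation quoted from ddpg_update2: divide by 10."""
    return r / 10


for name, scale in (("(r+8)/8", scale_a3c), ("r/10", scale_ddpg)):
    print(f"{name}: [{scale(R_MIN):.3f}, {scale(R_MAX):.3f}]")
```

Under that assumption, `(r+8)/8` maps the reward to roughly [-1, 1] and `r/10` maps it to roughly [-1.6, 0]. In both cases the value targets stay at order 1, which is a common way to keep the critic's loss and gradients at a moderate scale.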

icesit avatar Mar 22 '19 03:03 icesit

I only forked his repository; I am not Morvan.

qin-you avatar Mar 22 '19 08:03 qin-you

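Hello Morvan, I have recently been using your A3C implementation and have a few questions about the code:

  1. On line 150 of A3C_RNN.py, `buffer_r.append((r+8)/8)`: why is the reward transformed this way?
  2. On line 186, `GLOBAL_RUNNING_R.append(0.9 * GLOBAL_RUNNING_R[-1] + 0.1 * ep_r)`: why is the total reward used for display computed this way?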

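My understanding is that this makes the plotted curve smoother. Otherwise the curve jumps up and down like a sawtooth, which is hard to read.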
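The smoothing in question 2 is an exponential moving average of the episode reward. A minimal sketch follows; the function name `running_reward`, the sample rewards, and the handling of the first episode (stored unsmoothed) are assumptions for illustration, not taken from the repository.

```python
def running_reward(episode_rewards, decay=0.9):
    """Exponentially smooth per-episode rewards for plotting.

    Each new point is decay * previous_smoothed + (1 - decay) * ep_r,
    which matches the quoted update when decay is 0.9.
    """
    smoothed = []
    for ep_r in episode_rewards:
        if not smoothed:
            # Assumption: the first episode has nothing to average with,
            # so it is stored as-is.
            smoothed.append(ep_r)
        else:
            smoothed.append(decay * smoothed[-1] + (1 - decay) * ep_r)
    return smoothed


# Hypothetical, deliberately noisy episode rewards.
raw = [-1200.0, -300.0, -1100.0, -250.0, -900.0]
smooth = running_reward(raw)

print("raw spread:     ", max(raw) - min(raw))
print("smoothed spread:", max(smooth) - min(smooth))
```

Because each new episode contributes only 10% of the displayed value, a single unusually good or bad episode barely moves the curve. The smoothing affects only what is plotted, since `GLOBAL_RUNNING_R` is a display list and not a training input.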

hyc6668378 avatar Sep 22 '19 12:09 hyc6668378