Reinforcement-learning-with-tensorflow: Questions about A3C

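Hello Morvan, I have recently been using your A3C implementation, and I have a few questions about the code:

On line 150 of A3C_RNN.py, buffer_r.append((r+8)/8): why is the reward transformed this way?
On line 186, GLOBAL_RUNNING_R.append(0.9 * GLOBAL_RUNNING_R[-1] + 0.1 * ep_r): why is the total reward used for display computed this way?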

Mar 15 '19 12:03 icesit

I also noticed that your ddpg_update2 divides r by 10. What effect do these operations on the raw reward have on training?
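For readers following along, here is a minimal sketch of what the two transforms do numerically. Nothing in this thread confirms why they were chosen; one common reading is that they simply rescale the reward to a smaller range so that value targets stay moderate. The raw reward values and the helper names `shift_scale` and `divide` are hypothetical, and the assumed reward range of roughly [-16, 0] is an assumption about the environment.

```python
def shift_scale(r, shift=8.0, scale=8.0):
    """The (r + 8) / 8 transform quoted from A3C_RNN.py."""
    return (r + shift) / scale


def divide(r, factor=10.0):
    """The r / 10 transform mentioned for ddpg_update2."""
    return r / factor


# Hypothetical raw per-step rewards, assumed to lie in roughly [-16, 0].
raw_rewards = [-16.0, -8.0, -4.0, 0.0]

scaled_a3c = [shift_scale(r) for r in raw_rewards]
scaled_ddpg = [divide(r) for r in raw_rewards]

print(scaled_a3c)   # [-1.0, 0.0, 0.5, 1.0]
print(scaled_ddpg)  # [-1.6, -0.8, -0.4, 0.0]
```

Under that assumed range, the first transform maps rewards to about [-1, 1] and the second to about [-1.6, 0]. Both keep the ordering of rewards unchanged.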

Mar 22 '19 03:03 icesit

I only forked his repository; I am not Morvan.

Mar 22 '19 08:03 qin-you

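> Hello Morvan, I have recently been using your A3C implementation, and I have a few questions about the code:
>
> On line 150 of A3C_RNN.py, buffer_r.append((r+8)/8): why is the reward transformed this way?
>
> On line 186, GLOBAL_RUNNING_R.append(0.9 * GLOBAL_RUNNING_R[-1] + 0.1 * ep_r): why is the total reward used for display computed this way?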

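My understanding is that this makes the curve smoother to look at. Otherwise the curve jumps up and down like a sawtooth, which is hard to read.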
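The smoothing described here is an exponential moving average. A minimal sketch, using the 0.9 / 0.1 weights from the quoted line: the function name `running_average`, the sample episode rewards, and the handling of the first episode (stored unchanged) are assumptions for illustration, not taken from the repository.

```python
def running_average(episode_rewards, decay=0.9):
    """Exponentially smooth a list of per-episode total rewards."""
    smoothed = []
    for ep_r in episode_rewards:
        if not smoothed:
            # Assumption: the first episode has no history, so store it as-is.
            smoothed.append(ep_r)
        else:
            # Same form as 0.9 * GLOBAL_RUNNING_R[-1] + 0.1 * ep_r
            smoothed.append(decay * smoothed[-1] + (1 - decay) * ep_r)
    return smoothed


# Hypothetical, deliberately jagged episode rewards.
noisy = [-100.0, 0.0, -100.0, 0.0, -100.0]
smooth = running_average(noisy)

print([round(x, 2) for x in smooth])  # [-100.0, -90.0, -91.0, -81.9, -83.71]
```

The raw values swing by 100 between episodes, while the smoothed values move by at most about 10 per step, so the plotted curve shows the trend and hides most of the noise.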

Sep 22 '19 12:09 hyc6668378
