Hongming Zhang issues

Results: 8 issues by Hongming Zhang

A question about the degree of exploration

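The original AlphaZero paper says that for the first 30 moves tau=1, i.e., actions are sampled at random in proportion to their probabilities. After that, tau is set to approach 0, and actions are chosen from the probabilities plus Dirichlet noise. The implementation here appears to use tau=1 and also add Dirichlet noise. Is there a theoretical or intuitive difference between these two approaches?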
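The two move-selection schemes described in the question can be compared with a small sketch. This is only an illustration under stated assumptions, not code from the repository or the paper: the function names, the mixing weight `epsilon`, and the Dirichlet parameter `alpha` are hypothetical, and the visit counts are assumed to be non-negative with at least one positive entry.

```python
import random

def apply_temperature(visit_counts, tau):
    """Map visit counts N(a) to probabilities proportional to N(a)^(1/tau)."""
    if tau < 1e-3:
        # tau -> 0: put all probability on the most-visited move (greedy).
        best = max(range(len(visit_counts)), key=lambda i: visit_counts[i])
        return [1.0 if i == best else 0.0 for i in range(len(visit_counts))]
    # Divide by the maximum first so large exponents cannot overflow.
    top = max(visit_counts)
    scaled = [(n / top) ** (1.0 / tau) for n in visit_counts]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_dirichlet(alpha, size, rng):
    """Draw one symmetric Dirichlet(alpha) sample from normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def mix_with_noise(probs, epsilon, alpha, rng):
    """Return (1 - epsilon) * probs + epsilon * Dirichlet noise."""
    noise = sample_dirichlet(alpha, len(probs), rng)
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(probs, noise)]

def select_move(visit_counts, tau, epsilon, alpha, rng):
    """Sample a move index; epsilon = 0 disables the noise (assumed knobs)."""
    probs = apply_temperature(visit_counts, tau)
    if epsilon > 0:
        probs = mix_with_noise(probs, epsilon, alpha, rng)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)
counts = [10, 30, 60]  # made-up visit counts for three legal moves
early = select_move(counts, tau=1.0, epsilon=0.0, alpha=0.3, rng=rng)
late = select_move(counts, tau=1e-4, epsilon=0.25, alpha=0.3, rng=rng)
both = select_move(counts, tau=1.0, epsilon=0.25, alpha=0.3, rng=rng)
```

In this sketch, tau=1 samples in proportion to the visit counts, while tau near 0 with noise keeps at least `1 - epsilon` of the probability on the most-visited move and spreads only the remaining `epsilon` across moves. Combining tau=1 with noise flattens an already proportional distribution further, so it explores more than either scheme alone.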

the concrete meaning of obs

Hi, I'm running experiments on the RedBlueDoors environment with FullyObsWrapper and ImgObsWrapper. The observation contains three 2D arrays that represent the features. I think the first array represents object IDs, the second...

BreakoutNoFrameskip-v4 does not converge

The final mean reward is only around 40, and it oscillates a lot.

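How many games did you train for?

Training time?

May I ask how long you trained, what your hardware configuration was, and roughly how many games were played in total?

Are there any key points to watch out for during training? Reaching this level of play in only 15,000 games is quite impressive.

As the title says. I have also been experimenting with a large board and trained for more than 15,000 games, but did not achieve results like this. Did the author use any tricks during training?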

data for the learning curve

Hi, thanks for the interesting work. Could you please also provide the data used to generate the learning curve presented in the paper?

Join group

Hi, thanks for this great work! I wanted to join the groups, but the Slack link is no longer active and the QQ groups are full. Could you please open another one?...