Hongming Zhang

Results 8 issues of Hongming Zhang

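The original AlphaZero paper says that for the first 30 moves tau is set to 1, i.e. actions are sampled at random in proportion to their probabilities. After that, tau is set to approach 0, and actions are selected using the probabilities plus Dirichlet noise. The implementation here seems to use tau=1 plus Dirichlet noise. Is there any theoretical or intuitive difference between the two approaches?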
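To make the two schemes in this question concrete, here is a minimal sketch that follows the question's own description. The function names and the values `eps=0.25` and `alpha=0.3` are assumptions chosen for illustration; they are not taken from the repository being discussed, and the sketch does not claim to reproduce either implementation.

```python
import random


def apply_temperature(visit_counts, tau):
    """Turn visit counts N(a) into probabilities proportional to N(a)^(1/tau)."""
    if tau < 1e-3:
        # tau -> 0: put all probability mass on the most-visited move(s).
        best = max(visit_counts)
        weights = [1.0 if n == best else 0.0 for n in visit_counts]
    else:
        weights = [n ** (1.0 / tau) for n in visit_counts]
    total = sum(weights)
    return [w / total for w in weights]


def dirichlet(alpha, size, rng):
    """Sample a symmetric Dirichlet(alpha) vector by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def mix_noise(probs, eps, alpha, rng):
    """Blend probabilities with Dirichlet noise: (1 - eps) * p + eps * noise."""
    noise = dirichlet(alpha, len(probs), rng)
    return [(1 - eps) * p + eps * n for p, n in zip(probs, noise)]


def select_move(visit_counts, tau, rng, eps=0.25, alpha=0.3):
    """Sample a move index from temperature-scaled, noise-mixed probabilities.

    eps and alpha are illustrative assumptions, not values from the source.
    """
    probs = mix_noise(apply_temperature(visit_counts, tau), eps, alpha, rng)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = [10, 30, 60]
    # Scheme A (as described for later moves): tau -> 0, then add noise.
    print(mix_noise(apply_temperature(counts, 1e-4), 0.25, 0.3, rng))
    # Scheme B (as described for the implementation): tau = 1, then add noise.
    print(mix_noise(apply_temperature(counts, 1.0), 0.25, 0.3, rng))
```

In this sketch, with tau close to 0 the most-visited move always keeps at least `1 - eps` of the probability mass, so the noise is the only source of exploration. With tau=1 the visit-count proportions themselves already spread probability across moves, and the noise adds further randomness on top.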

Hi, I'm running experiments on the RedBlueDoors env with FullyObsWrapper and ImgObsWrapper. There are three 2D arrays that represent the features. I think the first array represents the object IDs, the second...

The final mean reward is only around 40, and it oscillates a lot.

May I ask how long you trained for, what your computer's configuration is, and roughly how many games were played in total?

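As the title says. I have also been experimenting with a large board and trained for more than 15000, but did not get results this good. Did the author use any tricks during training?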

Hi, thanks for the interesting work. Could you please also provide the data used to generate the learning curve presented in the paper?

Hi, thanks for this great work! I want to join the groups, but the Slack link is no longer active and the QQ groups are full. Could you please open another one?...