Ray
Ray copied to clipboard
Question about Progressive Widening
Yuki Kobayashi:
According to your paper, I try to add the Progressive Widening in the MCTS. But I find the PV string is too long. Here is it. It is just only around 5000 playouts. I think it is weird result (The most visits child is PV node). The PV string length of Ray in the same condition is 2~4 (use lz-analyze with Sabaki).

Here is my simple implementation.
Progressive Widening
The recursive formula is
$t_{n+1} = t_n + 40 \times 1.4 n,\ t_0 = 0$
Assume $k$ is opened children number of current node. The $v$ is visits number of current node. It must be satisfied
$t_{k-1} < v < t_{k}$
UCT Selection
The UCT formula is borrowed from Ray.
$UCT_i = Q_i + C_{param} \sqrt{ \frac{ \Sigma n_a }{ n_i + 1 } } + 0.35 \times rate \times \sqrt{ \frac{1000}{1000 + \Sigma n_a} }$
- $\Sigma n_a$ means the visits of current node.
- $rate$ means the MM priority (already be normalized).
Search ordering
I order the child by MM priority. The best priority will be opened first.
I have no ideal what's wrong. What should I doubt first? Thanks!
Length of PV depends on the number of expanded nodes. If your program expands a node when a leaf node visit is only one time, length of PV will be long.
Ray expands a node every 40 visits on 19x19 (visit count depends on EXPAND_THRESHOLD parameter). When you use value network, EXPAND_THRESHOLD must be 1. On the other hand, when you use only Monte-Carlo simulation, EXPAND_THRESHOLD must be adjusted.
Thank! The PV string looks better now.
Do you have any ideal why we apply the EXPAND_THRESHOLD? Why is this trick work? I want to write the theory in my log. Thanks!
When we used only Monte-Carlo simulation for evaluating positions, evaluations are unstable. Each simulation has a mix of wins and loses. This is because Monte-Carlo simulation uses random number for match simulations.
EXPAND_THRESHOLD is used for waiting for evaluations converge. The number of EXPAND_THRESHOLD depends on style of Monte-Carlo simulation. Pachi and Fuego have much smaller EXPAND_THRESHOLD. Maybe the use of RAVE is one of factors.