KataGo How to design LCB algorithm?

How to design LCB algorithm?

Open CGLemon opened this issue 2 years ago • 5 comments

In Leela Zero, the team implemented LCB by computing quantile of student's T pdf and Welford's online algorithm. It is easy to understand. But LCB algorithm is complex in KataGo. I am confused about some issues.

First, What is the 'normToTApprox' in the fancymath file? The comments says that it is approximation function for student's T. But it seemd that it is large different with Leela Zero. The computing quantile of student's algorithm in Leela Zero only give it degrees of freedom. 'normToTApprox' need to give it addition parameter, approximation value. How does the function work?

Second, The 'getSelfUtilityLCBAndRadius' and updating algorithm are very complex. But I didn't any comment for the algorithm. May you talk about the main ideal for the algorithm and explaining why the algorithm is working?

Thanks.

Dec 02 '21 05:12 CGLemon

You can check here, in the .h file there is a comment in mathematically precise language: https://github.com/lightvector/KataGo/blob/master/cpp/core/fancymath.h#L21
Instead of Welford, KataGo just computes E(utility^2) and E(utility) and then says Var(utility) = E(utility^2) - E(utility)^2 Welford gives you more numerical precision but there is no need for it because doubles have more than enough precision for our purposes - tracking the 1st and 2nd moment is much easier given that we have weighted data.

The reason it might look complex is:

KataGo has to handle weighted data. In KataGo, different visits have different weights instead of all visits having weight 1.
It makes an adjustment to the utility values at the root node based on downweighting some endgame moves slightly to encourage good and human-friendly endgame behavior.

For mathematical understanding, you can eliminate these. If all visits have equal weight of 1 and we have N visits, then weightSum (sum of the weights of visits) will simply be N, and weightSqSum (sum of the squares of weights of visits) will also be N.

And assume that endingScoreBonus = 0, therefore utilityDiff = 0.

So try plugging in N for weightSum and weightSqSum , and 0 for endingScoreBonus and utilityDiff and you can observe what the code collapses down to, it should now be pretty simple.

Last detail is that KataGo doesn't specify a quantile. It specifies a desired number of standard deviations of a normal distribution. For example, in a normal distribution, about 2.5% of it is beyond +2 standard deviations. So instead of specifying a quantile like 0.025, KataGo would specify "2" for lcbStdevs, and then as you see in the code it gets multiplied by a correction factor because if the degrees of freedom is low, then actually you need more than 2 stdevs to achieve that significance level.

Dec 02 '21 08:12 lightvector

Hope that helps! Reply back if you have any questions after seeing how the code simplifies with the suggested substitutions of N and 0, which should make the structure much more apparent.

Dec 02 '21 08:12 lightvector

Thanks!

So Kata Go use the Naïve algorithm instead of Welford?
I still can not understand how to compute new weightSum. Seem that Kata Go update weightSum by addition new computeWeightFromNNOutput(nnOutput). What algorithm does the Kata Go use in it?

Jul 30 '22 13:07 CGLemon

Yes. The naive algorithm is simple and the amount of numerical precision in practice is more than enough.

computeWeightFromNNOutput is a completely arbitrary heuristic based on tuning some parameters and seeing what produced the best play. You should treat it as a black box that says how reliable that neural net output is by declaring how much weight it should have.

Jul 30 '22 13:07 lightvector

We can write the LCB algorithm as following

Q = getAverageWinrate()
Var = getVariance()
Std = sqrt(Var) / visits
Z = getNormToTApproxForLCB(visits)
LCB = Q - Z * Std

It is a little different with Kata Go. But I guess they are same ideals. My question is that the stand deviation is not mathematics stand deviation. The mathematics stand deviation should be?

Std = sqrt(Var)

Why do we must divide Std again?

Aug 12 '22 17:08 CGLemon

KataGo KataGo copied to clipboard

How to design LCB algorithm?

KataGo
KataGo copied to clipboard