Brian Lee comments

Results: 34 comments by Brian Lee

same game again and again

Try using 'randompolicy' as an option instead of 'policy'. 'policy' always takes the most likely move, whereas 'randompolicy' selects a move at random, weighted by the NN's output probabilities.
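A minimal sketch of the difference, not the project's actual code: the function names and the three-move probability list are made up for illustration. It shows why a greedy pick replays the same game while a weighted draw does not.

```python
import random

def pick_move_greedy(probs):
    """Index of the single most likely move (the behaviour described for 'policy')."""
    return max(range(len(probs)), key=lambda i: probs[i])

def pick_move_sampled(probs, rng):
    """Index drawn at random, weighted by probs (the behaviour described for 'randompolicy')."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)          # fixed seed so the sketch is repeatable
probs = [0.1, 0.6, 0.3]         # made-up network output over three moves

# The greedy pick is identical every time; the sampled pick varies.
greedy_moves = [pick_move_greedy(probs) for _ in range(1000)]
sampled_moves = [pick_move_sampled(probs, rng) for _ in range(1000)]
```

Given the same position, the greedy version always returns move 1, so two deterministic players repeat the same game; the sampled version visits all three moves in proportion to their probabilities.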

Is it possible for you to comment each line of the code for pedagogical purposes?

What sort of commentary are you looking for? I don't necessarily want to be in the business of writing python tutorials or tensorflow tutorials, but stuff like MCTS or stuff...

Is it possible for you to comment each line of the code for pedagogical purposes?

A lot of it is already pretty well commented at the top of each file, as to how the whole thing is arranged. Is there something in particular that you're...

handicap=int(sgf_prop(props.get('HA', [0]))) ValueError: invalid literal for int() with base 10: '吴受先'

Can you give me an example of the sgf file that it's running into issues on? I suspect it's an sgf file that violates the standards, so having the file...

handicap=int(sgf_prop(props.get('HA', [0]))) ValueError: invalid literal for int() with base 10: '吴受先'

Oh.. ugh, this makes me sad. So, the SGF file should declare that its encoding is GB18030; I can't just assume it. Most Western-generated SGFs assume UTF-8, so putting in...
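As a rough illustration of honouring a declared encoding rather than assuming one: the sketch below looks for the SGF charset property (CA) in the raw bytes and decodes with it. The helper name decode_sgf, the UTF-8 fallback, and the sample record are assumptions for illustration, not the project's code.

```python
import re

def decode_sgf(raw, default_encoding="utf-8"):
    """Decode SGF bytes using the CA[...] charset property when present.

    The CA value itself is plain ASCII, so it can be located before the
    rest of the file has been decoded. The fallback encoding is an assumption.
    """
    match = re.search(rb"CA\[([^\]]+)\]", raw)
    encoding = match.group(1).decode("ascii").strip() if match else default_encoding
    return raw.decode(encoding)

# Made-up sample record that declares its own encoding.
raw = "(;CA[gb18030]PB[吴受先]HA[2])".encode("gb18030")
text = decode_sgf(raw)
```

A file that omits CA still has to be guessed at, which is the situation the comment describes.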

Reduce history length requirement

One concrete idea: instead of selecting 2% flat from the last 50 generations, select 4%->0% over the last 50 generations, with some sort of exponentially decaying curve, and also make...
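One possible reading of that curve, sketched under assumptions: the newest generation is sampled at 4% and the rate decays exponentially with age. The function name, the decay constant of 10 generations, and the exact shape are hypothetical; the comment only says "some sort of exponentially decaying curve".

```python
import math

def sampling_rates(num_generations=50, peak=0.04, decay=10.0):
    """Per-generation sampling fraction, newest generation first.

    age 0 is the most recent generation; 'decay' (in generations) is an
    assumed tuning knob, not a value taken from the discussion.
    """
    return [peak * math.exp(-age / decay) for age in range(num_generations)]

rates = sampling_rates()
newest_rate = rates[0]    # 4% for the latest generation
oldest_rate = rates[-1]   # close to 0% for the 50th-oldest generation
```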

Consider active learning approach

This general class of idea is called a 'baseline', and existing examples work by subtracting a baseline (I expected to win with 0.95 probability) from the eventual result (I won...
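The arithmetic of that subtraction can be sketched as follows, assuming results are encoded as 1.0 for a win and 0.0 for a loss (the encoding and the function name are illustrative assumptions):

```python
def baseline_adjusted(result, expected):
    """Training signal after subtracting the baseline from the eventual result."""
    return result - expected

# Expected to win with probability 0.95:
expected_win = baseline_adjusted(1.0, 0.95)   # small signal: the win was no surprise
surprise_loss = baseline_adjusted(0.0, 0.95)  # large negative signal: the loss was a surprise
```

Surprising outcomes therefore carry most of the weight, while outcomes the network already predicted contribute little.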

Consider active learning approach

The word 'baseline' is what you'll want to search to actually find previous literature on the topic.

Consider active learning approach

I see what you're saying. I think the baseline approach effectively does what you want. The gradients per example are linearly correlated to the final error, so if you train...

Consider active learning approach

fwiw, instead of adding additional examples to the stream, you can just drop them with probability (1-p) or something. The drawback is that you can't reuse that stream anymore since it hard-codes...
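A minimal sketch of that kind of filter, assuming p is a per-example keep probability supplied by some scoring function (both the name keep_prob and the generator shape are hypothetical):

```python
import random

def subsample(examples, keep_prob, rng):
    """Yield each example with probability keep_prob(example); drop it otherwise."""
    for example in examples:
        if rng.random() < keep_prob(example):
            yield example

rng = random.Random(0)
stream = list(range(100))                      # stand-in for training examples

kept_all = list(subsample(stream, lambda ex: 1.0, rng))   # p = 1: nothing dropped
kept_none = list(subsample(stream, lambda ex: 0.0, rng))  # p = 0: everything dropped
kept_some = list(subsample(stream, lambda ex: 0.5, rng))  # roughly half survive
```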