[Ideas] Open ideas
Seth's ideas
- Virtual batching ideas (re: evaluating the effects of virtual loss, https://github.com/tensorflow/minigo/issues/427)
- Only add X from a batch of Y to the tree, and put the rest in the NNCache to use if they get needed later (this is basically a different version of Speculative Execution; see the sketch after this list)
- Supervised Eval Doc
- Learning-rate cut in SL experiments
- SL training with a learning-rate cut and more steps...
- Train some smaller distilled models and test on time parity
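
A rough sketch of the "only add X from a batch of Y" idea above, assuming a dict-like NN cache and tree nodes with hypothetical `expand`, `backup_value`, and `position_hash` methods (none of this is minigo's actual API):

```python
# Partial batch expansion sketch: after one batched NN evaluation, attach only
# the top-scoring leaves to the search tree; park the remaining evaluations in
# a position -> (policy, value) cache so they come free if the search reaches
# those positions later. Node methods here are hypothetical, not minigo's API.
def expand_partial_batch(leaves, policies, values, nn_cache, num_to_expand):
    """leaves: MCTS leaf nodes evaluated in one NN batch (length Y).
    policies/values: per-leaf NN outputs, same order as leaves.
    num_to_expand: X, how many of the Y leaves actually join the tree."""
    # Rank leaves by the value the net assigned them (one possible criterion).
    order = sorted(range(len(leaves)), key=lambda i: values[i], reverse=True)
    for rank, i in enumerate(order):
        leaf, policy, value = leaves[i], policies[i], values[i]
        if rank < num_to_expand:
            leaf.expand(policy)       # normal expansion with NN priors
            leaf.backup_value(value)  # propagate the value toward the root
        else:
            # The other Y - X results are cached instead of discarded; a later
            # selection pass can consume them without another NN call.
            nn_cache[leaf.position_hash()] = (policy, value)
```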
Ideas inspired by @lightvector and KataGo
- NNCache
- Turning off Tree Reuse
- Ownership head
- Score distribution head
- Score Maximization ("Score Utility")
- Playout oscillation
- Forking games (early for diversity, late for Komi, ...)
Ideas inspired by LZ
- SWA: initial proof of concept in #283, but more work needed
- Visits "time management" (stopping the search when the 2nd-most-visited move can no longer overtake the first; see the sketch after this list)
Ideas from AG/AGZ/AZ papers
- Gating: write-up in #459
- Playing any move within 1% of the best early in the game (see the sketch after this list)
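
A small sketch of the "within 1% of best" idea, assuming "within 1%" refers to root visit counts and "early" means before some move-number cutoff (both are assumptions; the cutoff and tolerance values are made up):

```python
import random

def pick_move(visit_counts, move_num, early_cutoff=30, tolerance=0.01):
    """visit_counts: dict mapping move -> root visit count."""
    best_count = max(visit_counts.values())
    if move_num < early_cutoff:
        # Any move whose visits are within `tolerance` of the best is eligible,
        # adding opening diversity at little expected cost in strength.
        near_best = [m for m, n in visit_counts.items()
                     if n >= (1.0 - tolerance) * best_count]
        return random.choice(near_best)
    # After the opening, fall back to the usual argmax-by-visits selection.
    return max(visit_counts, key=visit_counts.get)
```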
Ideas from elsewhere
- Changing the value target to 0.95 (i.e. 1 - false-positive rate) for self-play with resignation (idea from lightvector)
- Active learning (needs more details) @brilee (https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a)
- Cyclic learning rate (see the sketch after this list)
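
For the cyclic LR item, a minimal triangular schedule written as a plain function of the global step, so it could be plugged into any optimizer setup; the min/max rates and cycle length are placeholder values, not tuned for minigo:

```python
def cyclic_lr(step, min_lr=1e-4, max_lr=1e-2, cycle_steps=10000):
    """Learning rate oscillates linearly between min_lr and max_lr,
    completing one up-and-down cycle every cycle_steps steps."""
    # Position within the current cycle, in [0, 1): 0 -> min, 0.5 -> max.
    phase = (step % cycle_steps) / cycle_steps
    # Triangular wave: rise for the first half of the cycle, fall for the second.
    frac = phase * 2 if phase < 0.5 else (1.0 - phase) * 2
    return min_lr + (max_lr - min_lr) * frac
```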
Done
- [x] Slow window, not super successful (inspired by Oracle's Connect Four work)
- [x] Q=0 (Sorta tested in V11)
- [x] FPU (Handled by https://github.com/tensorflow/minigo/pull/629)
- Check for dead neurons: https://stackoverflow.com/questions/42362542/how-to-monitorise-dead-relus/48782899
- Value-target (z) adjustment ideas (see the sketch below):
  - z = z * move_num / length
  - z = z / 2 + q / 2
  - z = z * false_positive_rate in resign-disabled games
- Higher learning rate early
- Add notes about distillation and Seth's ideas
- Checking whether eval games have enough diversity, and using this opening panel: https://github.com/leela-zero/leela-zero/issues/2104
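
Spelling out the z-adjustment bullets above as plain functions, just so the three variants are unambiguous (q is the search value at the position, z is the game result; these are the ideas as written, not tested minigo code):

```python
def z_scaled_by_progress(z, move_num, game_length):
    # Late moves keep most of the game result; early moves are discounted.
    return z * move_num / game_length

def z_blended_with_q(z, q):
    # Average the game outcome with the search's own value estimate.
    return z / 2.0 + q / 2.0

def z_scaled_by_false_positive_rate(z, false_positive_rate):
    # For resign-disabled games: scale z by the measured resign
    # false-positive rate, as written in the note above.
    return z * false_positive_rate
```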
@sethtroisi Re "time management" from LZ: I'm concerned it might be detrimental for self-play and RL, since it amounts to a sort of policy sharpening: cutting the search early means low-policy moves won't get any visits and will be trained towards 0. That may hinder the learning of new stuff.
IMHO, the key to saving compute budget might truly be KataGo's variable-visits scheme, which uses fewer visits for ordinary game-move search than for the searches used as policy training targets.
And both types of KataGo search could benefit from the KLD-threshold trick from LC0, which sounds very appealing for the policy, though much more complex to implement ;-) (rough sketch below)
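
A rough sketch of the KLD idea as I understand it (an assumption about the mechanism, not LC0's or minigo's actual implementation): periodically compare the root's visit distribution with a snapshot taken N playouts earlier, and stop once the KL divergence gained per playout falls below a threshold, i.e. extra playouts are no longer changing the search's opinion.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kld_gain_too_small(prev_visits, cur_visits, playouts_between, threshold=1e-5):
    """prev_visits / cur_visits: root child visit counts N playouts apart."""
    prev_total = sum(prev_visits) or 1
    cur_total = sum(cur_visits) or 1
    p = [v / prev_total for v in prev_visits]
    q = [v / cur_total for v in cur_visits]
    # Average KLD gained per playout since the last snapshot; a tiny gain
    # means more search is barely moving the root distribution.
    return kl_divergence(q, p) / max(playouts_between, 1) < threshold
```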
From Brian Lee:
One concrete idea: instead of selecting 2% flat from the last 50 generations, select 4%->0% over the last 50 generations, with some sort of exponentially decaying curve, and also make this parameter configurable. Early on, we might want to have 10% -> 0% over the last ~10 generations of data, but later on we might want to flatten that curve to select 2% -> 0% over the last 100 generations.
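
One way to express that schedule, assuming a configurable starting rate and window and an exponential decay shape (the constant in the exponent is arbitrary; "4% -> 0% over 50 generations" or "10% -> 0% over ~10" map to start_rate/window):

```python
import math

def sample_rate(generations_ago, start_rate=0.04, window=50):
    """Fraction of positions to sample from a generation `generations_ago`
    behind the newest one (0 = most recent generation)."""
    if generations_ago >= window:
        return 0.0
    # Exponential decay: ~start_rate for the newest data, falling to under 1%
    # of start_rate by the end of the window (exp(-5) ~= 0.007).
    return start_rate * math.exp(-5.0 * generations_ago / window)
```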