[Ideas] Open ideas
Seth's ideas
- Virtual batching ideas (re: evaluating the effects of virtual loss, https://github.com/tensorflow/minigo/issues/427)
- Only add X from a batch of Y to the tree, and put the rest in the NNCache to use if they get needed later (this is basically a different version of Speculative Execution; see the sketch after this list)
- Supervised Eval Doc
- Learning-rate cut in SL experiments
- SL training with a learning-rate cut and more steps...
- Train some smaller distilled models and test on time parity
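
A rough sketch of the "only add X from a batch of Y" idea above, assuming a dict-like NN cache and tree nodes with hypothetical `expand`, `backup_value`, and `position_hash` methods (none of this is minigo's actual API):

```python
# Partial batch expansion sketch: after one batched NN evaluation, attach only
# the top-scoring leaves to the search tree; park the remaining evaluations in
# a position -> (policy, value) cache so they come free if the search reaches
# those positions later. Node methods here are hypothetical, not minigo's API.
def expand_partial_batch(leaves, policies, values, nn_cache, num_to_expand):
    """leaves: MCTS leaf nodes evaluated in one NN batch (length Y).
    policies/values: per-leaf NN outputs, same order as leaves.
    num_to_expand: X, how many of the Y leaves actually join the tree."""
    # Rank leaves by the value the net assigned them (one possible criterion).
    order = sorted(range(len(leaves)), key=lambda i: values[i], reverse=True)
    for rank, i in enumerate(order):
        leaf, policy, value = leaves[i], policies[i], values[i]
        if rank < num_to_expand:
            leaf.expand(policy)       # normal expansion with NN priors
            leaf.backup_value(value)  # propagate the value toward the root
        else:
            # The other Y - X results are cached instead of discarded; a later
            # selection pass can consume them without another NN call.
            nn_cache[leaf.position_hash()] = (policy, value)
```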
Ideas inspired by @lightvector and KataGo
- NNCache
- Turning off Tree Reuse
- Ownership head
- Score distribution head
- Score Maximization ("Score Utility")
- Playout oscillation
- Forking games (early for diversity, late for Komi, ...)
Ideas inspired by LZ
- SWA: initial proof of concept in #283, but more work needed
- Visits "time management" (stopping the search when the 2nd-most-visited move can no longer overtake the first; see the sketch after this list)
Ideas from AG/AGZ/AZ papers
- Gating: write-up in #459
- Playing any move within 1% of the best early in the game (see the sketch after this list)
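
A small sketch of the "within 1% of best" idea, assuming "within 1%" refers to root visit counts and "early" means before some move-number cutoff (both are assumptions; the cutoff and tolerance values are made up):

```python
import random

def pick_move(visit_counts, move_num, early_cutoff=30, tolerance=0.01):
    """visit_counts: dict mapping move -> root visit count."""
    best_count = max(visit_counts.values())
    if move_num < early_cutoff:
        # Any move whose visits are within `tolerance` of the best is eligible,
        # adding opening diversity at little expected cost in strength.
        near_best = [m for m, n in visit_counts.items()
                     if n >= (1.0 - tolerance) * best_count]
        return random.choice(near_best)
    # After the opening, fall back to the usual argmax-by-visits selection.
    return max(visit_counts, key=visit_counts.get)
```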
Ideas from elsewhere
- Changing the value target to 0.95 (i.e. 1 - false-positive rate) for self-play with resignation (idea from lightvector)
- Active learning (needs more details) @brilee (https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a)
- Cyclic learning rate (see the sketch after this list)
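
For the cyclic LR item, a minimal triangular schedule written as a plain function of the global step, so it could be plugged into any optimizer setup; the min/max rates and cycle length are placeholder values, not tuned for minigo:

```python
def cyclic_lr(step, min_lr=1e-4, max_lr=1e-2, cycle_steps=10000):
    """Learning rate oscillates linearly between min_lr and max_lr,
    completing one up-and-down cycle every cycle_steps steps."""
    # Position within the current cycle, in [0, 1): 0 -> min, 0.5 -> max.
    phase = (step % cycle_steps) / cycle_steps
    # Triangular wave: rise for the first half of the cycle, fall for the second.
    frac = phase * 2 if phase < 0.5 else (1.0 - phase) * 2
    return min_lr + (max_lr - min_lr) * frac
```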
Done
- [x] Slow window, not super successful (inspired by Oracle's Connect Four work)
- [x] Q=0 (Sorta tested in V11)
- [x] FPU (Handled by https://github.com/tensorflow/minigo/pull/629)
- Check for dead neurons: https://stackoverflow.com/questions/42362542/how-to-monitorise-dead-relus/48782899
- Value-target (z) adjustment ideas (see the sketch below):
  - z = z * move_num / length
  - z = z / 2 + q / 2
  - z = z * false_positive_rate in resign-disabled games
- Higher learning rate early
- Add notes about distillation and Seth's ideas
- Checking whether eval games have enough diversity, and using this opening panel: https://github.com/leela-zero/leela-zero/issues/2104
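
Spelling out the z-adjustment bullets above as plain functions, just so the three variants are unambiguous (q is the search value at the position, z is the game result; these are the ideas as written, not tested minigo code):

```python
def z_scaled_by_progress(z, move_num, game_length):
    # Late moves keep most of the game result; early moves are discounted.
    return z * move_num / game_length

def z_blended_with_q(z, q):
    # Average the game outcome with the search's own value estimate.
    return z / 2.0 + q / 2.0

def z_scaled_by_false_positive_rate(z, false_positive_rate):
    # For resign-disabled games: scale z by the measured resign
    # false-positive rate, as written in the note above.
    return z * false_positive_rate
```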
@sethtroisi Re "time management" from LZ: I'm concerned it might be detrimental for self-play and RL, since it amounts to a sort of policy sharpening: cutting the search early means low-policy moves won't get any visits and will be trained towards 0. That may hinder the learning of new stuff.
IMHO, the key to saving compute budget might truly be KataGo's variable-visits scheme, which uses fewer visits for ordinary game-move search than for the searches used as policy training targets.
And both types of KataGo search could benefit from the KLD-threshold trick from LC0, which sounds very appealing for the policy, though much more complex to implement ;-) (rough sketch below)
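
A rough sketch of the KLD idea as I understand it (an assumption about the mechanism, not LC0's or minigo's actual implementation): periodically compare the root's visit distribution with a snapshot taken N playouts earlier, and stop once the KL divergence gained per playout falls below a threshold, i.e. extra playouts are no longer changing the search's opinion.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def kld_gain_too_small(prev_visits, cur_visits, playouts_between, threshold=1e-5):
    """prev_visits / cur_visits: root child visit counts N playouts apart."""
    prev_total = sum(prev_visits) or 1
    cur_total = sum(cur_visits) or 1
    p = [v / prev_total for v in prev_visits]
    q = [v / cur_total for v in cur_visits]
    # Average KLD gained per playout since the last snapshot; a tiny gain
    # means more search is barely moving the root distribution.
    return kl_divergence(q, p) / max(playouts_between, 1) < threshold
```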
From Brian Lee:
One concrete idea: instead of selecting 2% flat from the last 50 generations, select 4%->0% over the last 50 generations, with some sort of exponentially decaying curve, and also make this parameter configurable. Early on, we might want to have 10% -> 0% over the last ~10 generations of data, but later on we might want to flatten that curve to select 2% -> 0% over the last 100 generations.
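
One way to express that schedule, assuming a configurable starting rate and window and an exponential decay shape (the constant in the exponent is arbitrary; "4% -> 0% over 50 generations" or "10% -> 0% over ~10" map to start_rate/window):

```python
import math

def sample_rate(generations_ago, start_rate=0.04, window=50):
    """Fraction of positions to sample from a generation `generations_ago`
    behind the newest one (0 = most recent generation)."""
    if generations_ago >= window:
        return 0.0
    # Exponential decay: ~start_rate for the newest data, falling to under 1%
    # of start_rate by the end of the window (exp(-5) ~= 0.007).
    return start_rate * math.exp(-5.0 * generations_ago / window)
```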