[Ideas] Open ideas

Open sethtroisi opened this issue 6 years ago • 7 comments

Seth ideas

  • Virtual Batching ideas (re eval effects of virtual loss https://github.com/tensorflow/minigo/issues/427)
    • Only add X from batch of Y to the tree, put the rest in NNCache to use if they get needed later (this is basically a different version of Speculative Execution)
  • Supervised Eval Doc
    • Rate cut in SL experiments
  • SL training with rate cut and more steps...
  • Train some smaller distilled models and test on time parity

Ideas inspired by @lightvector and KataGo

  • NNCache
    • Turning off Tree Reuse
  • Ownership head
  • Score distribution Head
  • Score Maximization ("Score Utility")
  • Playout oscillation
  • Forking games (early for diversity, late for Komi, ...)

Ideas inspired by LZ

  • SWA: Initial Proof Of Concept in #283 but more work needed
  • Visits "timemanagement" (stopping when 2nd move can't overtake first)
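The visits "timemanagement" check above can be sketched as follows. This is a minimal illustration, not LZ's or Minigo's actual code; the function name and its arguments are hypothetical, and it assumes the move is chosen by highest visit count.

```python
def can_stop_early(visit_counts, remaining_visits):
    """Return True if the runner-up can no longer overtake the top move.

    visit_counts: visit counts of the root's children (hypothetical input).
    remaining_visits: visits left in the search budget for this move.
    """
    if len(visit_counts) < 2:
        # With fewer than two candidates there is nothing to overtake.
        return True
    top, second = sorted(visit_counts, reverse=True)[:2]
    # Even if every remaining visit went to the runner-up, it would
    # still end with strictly fewer visits than the current top move.
    return second + remaining_visits < top
```

For example, with counts `[500, 200, 100]` and 250 visits left, the search could stop, since 200 + 250 is still below 500.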

Ideas from AG/AGZ/AZ papers

  • Gating: Write up in #459
  • Playing anything early within 1% of best

Ideas from elsewhere

  • Changing value to 0.95 (AKA 1 - false positive) for selfplay with resign (idea from lightvector)
  • Active learning (Needs more details) @brilee (https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a)
  • Cyclic LR

Done

  • [x] Slow window, not super successful (inspired by Oracle's Connect Four work)
  • [x] Q=0 (Sorta tested in V11)
  • [x] FPU (Handled by https://github.com/tensorflow/minigo/pull/629)

sethtroisi avatar Sep 24 '18 22:09 sethtroisi

Check for dead neurons: https://stackoverflow.com/questions/42362542/how-to-monitorise-dead-relus/48782899
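One rough way to run such a check, assuming post-ReLU activations for a batch of positions have already been collected into a list of rows (one row per input, one column per unit). The function name and input layout are hypothetical and not tied to any Minigo or TensorFlow API.

```python
def dead_unit_fraction(activations):
    """Fraction of units whose post-ReLU output is zero on every input.

    activations: non-empty list of equal-length rows, one row per input
    sample and one column per unit (hypothetical layout).
    """
    if not activations:
        raise ValueError("need at least one row of activations")
    num_units = len(activations[0])
    dead = sum(
        1
        for unit in range(num_units)
        if all(row[unit] <= 0.0 for row in activations)
    )
    return dead / num_units
```

A unit counted here is only "dead" with respect to the sampled inputs, so a larger and more varied batch gives a more trustworthy number.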

sethtroisi avatar Oct 02 '18 05:10 sethtroisi

  • z = z * move_num/length
  • z = z/2 + q/2
  • z = z * false_positive_rate in resign disabled games
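The three value-target variants above, written out as a small sketch. The function names are hypothetical; `z` is taken to be the game outcome in [-1, 1] and `q` the search value at that position, which is an assumption about the notation rather than something the comment states.

```python
def scaled_by_progress(z, move_num, game_length):
    """z = z * move_num/length: early moves get a target closer to 0."""
    return z * move_num / game_length


def blended_with_q(z, q):
    """z = z/2 + q/2: average of the game outcome and the search value."""
    return z / 2 + q / 2


def scaled_for_resign_disabled(z, false_positive_rate):
    """z = z * false_positive_rate, applied only to resign-disabled games.

    The parameter name follows the comment as written; the issue body
    describes a similar idea with a value of 0.95.
    """
    return z * false_positive_rate
```

For instance, `scaled_by_progress(1.0, 50, 200)` gives 0.25, so a win seen at move 50 of a 200-move game would be trained toward 0.25 rather than 1.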


higher learning rate early

sethtroisi avatar Oct 02 '18 09:10 sethtroisi

research on network size

sethtroisi avatar Oct 18 '18 23:10 sethtroisi

Adding stuff about distillation and Seth ideas

sethtroisi avatar Mar 09 '19 01:03 sethtroisi

Checking if eval games have enough diversity and using this opening panel

https://github.com/leela-zero/leela-zero/issues/2104

sethtroisi avatar Mar 19 '19 06:03 sethtroisi

@sethtroisi Re "timemanagement" from LZ, I'm concerned it might be detrimental for self-play and RL, as it amounts to some sort of policy sharpening: cutting the search early means low-policy moves won't get any visits and will be trained towards 0. That may hinder the learning of new stuff.

IMHO, the key to saving compute budget might truly be KataGo's variable visits scheme, for game move search vs policy training target search.

And both types of KataGo's search could benefit from the KLD threshold trick from LC0, which sounds very appealing for policy, though much more complex to implement ;-)

Ishinoshita avatar Jun 16 '19 16:06 Ishinoshita

From Brian Lee:

One concrete idea: instead of selecting 2% flat from the last 50 generations, select 4%->0% over the last 50 generations, with some sort of exponentially decaying curve, and also make this parameter configurable. Early on, we might want to have 10% -> 0% over the last ~10 generations of data, but later on we might want to flatten that curve to select 2% -> 0% over the last 100 generations.
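A possible shape for that curve, as a sketch only: the function name, the `steepness` parameter, and the exact formula are assumptions, since the comment only asks for "some sort of exponentially decaying curve". The curve is shifted and rescaled so it starts at the newest-generation fraction and reaches exactly 0 just past the end of the window.

```python
import math


def sample_fractions(newest_fraction, window, steepness=3.0):
    """Per-generation sampling fractions, newest generation first.

    newest_fraction: fraction sampled from the newest generation (e.g. 0.04).
    window: number of generations to draw from (e.g. 50).
    steepness: hypothetical knob; larger values concentrate sampling on
        recent generations. Must be > 0.
    """
    if window <= 0 or steepness <= 0:
        raise ValueError("window and steepness must be positive")
    floor = math.exp(-steepness)
    return [
        # Rescaled so age 0 maps to newest_fraction and age == window maps to 0.
        newest_fraction * (math.exp(-steepness * age / window) - floor) / (1.0 - floor)
        for age in range(window)
    ]
```

Under these assumptions, `sample_fractions(0.10, 10)` would correspond to the early "10% -> 0% over ~10 generations" setting and `sample_fractions(0.02, 100)` to the later, flatter one.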

sethtroisi avatar Jan 22 '20 00:01 sethtroisi