leela-chess
Experiment with alternate fpu definition
Looking at the new ELF system for go, it seems they use a very subtly different FPU definition. It's kind of like get_raw_eval(), but rather than being the weighted average of the NN val of the current node and the NN vals of all visited children, it's the weighted average of the NN vals of all visited children and the FPU value that was used when the parent was first selected for expansion. (With an initial value of 0 for the tree root.) (Assuming I've managed to read their code correctly.)
I'm not exactly sure how the difference plays out in practice, but it seems interesting and maybe worth some benchmarking.
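To make the contrast concrete, here is a rough sketch of the two parent-value definitions as I understand them. All names (fpu_parent_value_lz, fpu_at_expansion, etc.) are my own illustrations, not identifiers from either codebase, and the ELF variant reflects my possibly-wrong reading of their code:

```python
# Hypothetical sketch of the two FPU parent-value definitions.
# In both, the parent contributes one "visit" worth of value; the
# difference is what that one value is.

def fpu_parent_value_lz(parent_net_eval, child_evals, child_visits):
    """Leela-style: weighted average of the parent's own NN eval
    (counted as one visit) and the NN evals of all visited children."""
    total = parent_net_eval + sum(e * v for e, v in zip(child_evals, child_visits))
    visits = 1 + sum(child_visits)
    return total / visits

def fpu_parent_value_elf(fpu_at_expansion, child_evals, child_visits):
    """ELF-style (per my reading): the parent's own NN eval is replaced
    by the FPU value in effect when the parent was first selected for
    expansion (0 for the tree root)."""
    total = fpu_at_expansion + sum(e * v for e, v in zip(child_evals, child_visits))
    visits = 1 + sum(child_visits)
    return total / visits
```

With the same children, the two can diverge noticeably early on (few child visits), and converge as child visits dominate the average.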
I also had two related ideas about FPU. Maybe they have already been tried, tested and discarded, but just to throw them in here:
(1) If the current search increases the eval so that it is larger than m_init_eval, everything seems fine in the explored region and FPU can do its normal job: penalizing unvisited nodes. But what if the opposite is happening and the eval is sinking below m_init_eval? Then the opposite might be helpful, and exploration of the remaining policy should be prioritized. So in the latter case the FPU term could be added to the score instead of subtracted (maybe even asymmetrically).
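A minimal sketch of idea (1), with hypothetical names (first_play_value, fpu_reduction) and a simple symmetric flip rather than the asymmetric variant mentioned above:

```python
def first_play_value(parent_eval, init_eval, fpu_reduction):
    """Asymmetric FPU idea: while the search confirms the position
    (eval at or above the net's initial eval), penalize unvisited
    children as usual; when the eval is sinking below it, flip the
    sign so the rest of the policy gets explored sooner."""
    if parent_eval >= init_eval:
        return parent_eval - fpu_reduction  # normal FPU penalty
    return parent_eval + fpu_reduction      # flipped: encourage exploration
```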
(2) This might be more useful for chess than for go. Basically, the FPU penalty would only be applied to our own moves, skipping the opponent's moves. That way we would not risk missing any "surprising" traps hiding in the small remaining percentage of the opponent's policy. We might not find some great low-policy moves for ourselves either, but then the position probably has other advantages, or we would not have played into it. Actually, this has been tried successfully in chess before with traditional search algorithms (being more selective on one's own moves than on the opponent's).
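Idea (2) could be sketched like this; again the names are hypothetical and this is not how either codebase structures its search:

```python
def first_play_value_by_side(parent_eval, fpu_reduction, our_move):
    """One-sided FPU idea: apply the penalty only at nodes where we
    pick the move. At opponent nodes, leave unvisited children at the
    raw parent eval, so low-policy 'trap' replies are not pruned away
    by the FPU reduction."""
    if our_move:
        return parent_eval - fpu_reduction
    return parent_eval
```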
P.S.: AFAIK Leela Chess no longer knows whether it is her turn or the opponent's and treats both identically; if that were reversed, Leela might even learn behaviour like (2) on her own.