fishtest icon indicating copy to clipboard operation
fishtest copied to clipboard

Stochastic optimization for parameters

Open unaiic opened this issue 4 years ago • 62 comments

https://github.com/official-stockfish/Stockfish/issues/2915#issuecomment-678112885

Here some interesting methods were proposed. But regarding the first one, and after having mentioned to @nodchip, he told me about the implementation he used. I took his scripts and created a repo (https://github.com/unaiic/optimizer) where we can adapt them to SF and see how it goes. The scripts make use of Hyperopt, although we could also use Optune; we should see what is best in this case. I think you could help with this :)

unaiic avatar Aug 22 '20 08:08 unaiic

Let me explain what the scirpt does. The script executes the following steps:

  1. Read search parameters from a config file.
    • This config file is formated as a header file with definitions for search parameters.
    • Each search parameter has the following data.
      • name
      • default value
      • min value
      • max value
  2. Select a search parameter set.
    • By the default settings, the first 20 search parameter sets are selected randomly, and the other parameter sets are selected with Tree of Parzen Estimators (TPE).
  3. Write the selected search parameters to a header file.
  4. Build an engine binary with the header file created in 3.
  5. Measure the elo difference between a baseline binary and the binary compiled in 4.
  6. Return the elo measured in 5 to Hyperopt.
  7. Repeat 2-6 specific times.
  8. Apply Gaussian Process to select the best search parameter set.

A point is the elo measurement doesn't need to be accurate. Because TPE assumes that the samples (the measured elo) contain noises. I measured elo with only 48 games.

Another point is that 2-6 can be done in parallel. Hyperopt supports parallel search using MongoDB or other DB. We could be able to distribute tasks between fishtest nodes.

There is one concern. If we measure elo with STC, the engine will get weak in LTC. This happened in computer shogi engine. We should use LTC to measure elo.

nodchip avatar Aug 22 '20 10:08 nodchip

See also: https://github.com/kiudee/chess-tuning-tools.

It’s what is used by Leela developers with good success. It generates very pretty diagrams.

gonzalezjo avatar Aug 22 '20 13:08 gonzalezjo

@gonzalezjo Good one. I think we could also try it, compare it with the others and we'll see how it goes...

unaiic avatar Aug 22 '20 15:08 unaiic

@gonzalezjo I took a look at it and it seems that it tunes UCI options. We should then convert the parameters (from search) to UCI options I guess, right? Then take the resulting values from the tune and test them to see if they're good enough. Please correct me if I'm wrong :)

unaiic avatar Aug 22 '20 21:08 unaiic

@gonzalezjo I took a look at it and it seems that it tunes UCI options. We should then convert the parameters (from search) to UCI options I guess, right? Then take the resulting values from the tune and test them to see if they're good enough. Please correct me if I'm wrong :)

Correct. That said, it’s not as bad as it sounds! The “TUNE” macro used for SPSA tuning does that for you, so the code is technically already written.

gonzalezjo avatar Aug 22 '20 21:08 gonzalezjo

@gonzalezjo I tried a simple tune, just for test, but didn't work for me (white always disconnects). I tuned the first engine like this:

T1,510,460,560,5,0.0020 T2,223,210,236,1.3,0.0020

I have two executables (with the names engine1 (tuned) and engine2, both compiled with 'make build ARCH=x86-64-modern'), the network file and an opening book in the same folder. I finally have the simple_tune.json file. It looks like this:

 {
       "engines": [
           {
               "command": "engine1",
               "fixed_parameters": {
                   "Threads": 1
               }
           },
           {
               "command": "engine2",
               "fixed_parameters": {
                   "Threads": 1
               }
           }
       ],
       "parameter_ranges": {
           "T1": "Real(460, 560)",
           "T2": "Real(210, 236)"
       },
       "engine1_tc": "60+0.6",
       "engine2_tc": "60+0.6",
       "opening_file": "noob_3moves.pgn"
   }

I think this is okay, but when I run 'tune local -c simple_tune.json' I get the following 'out.pgn':

[Event "?"] [Site "?"] [Date "2020.08.23"] [Round "1"] [White "engine1"] [Black "engine2"] [Result "0-1"] [FEN "rn1qkbnr/ppp1p1pp/4b3/1P1p1p2/8/B7/P1PPPPPP/RN1QKBNR w KQkq - 0 1"] [PlyCount "0"] [SetUp "1"] [Termination "abandoned"] [TimeControl "60+0.6"]

{White disconnects} 0-1

[Event "?"] [Site "?"] [Date "2020.08.23"] [Round "1"] [White "engine2"] [Black "engine1"] [Result "0-1"] [FEN "rn1qkbnr/ppp1p1pp/4b3/1P1p1p2/8/B7/P1PPPPPP/RN1QKBNR w KQkq - 0 1"] [PlyCount "0"] [SetUp "1"] [Termination "abandoned"] [TimeControl "60+0.6"]

{White disconnects} 0-1

And so on...

What is wrong with this? I think it may be something related to the TUNE macro, but I'm not sure. Thanks :)

unaiic avatar Aug 23 '20 06:08 unaiic

@unaiic I found your issue here by accident and would like to help. I just recently converted my collection of scripts into a proper library and there might still be rough edges. Don’t hesitate to report bugs to the Issue Tracker. For the next release I am planning to allow more verbose output of cutechess-cli to be forwarded.

What happens if you run the following from the same directory?

cutechess-cli -engine conf=engine1 tc=1 -engine conf=engine2 tc=1 -debug

kiudee avatar Aug 23 '20 14:08 kiudee

@kiudee Fine, I'll report it. BTW, this is the output I get:

Warning: Unknown engine configuration: "engine1" Warning: Invalid value for option "-engine": "conf=engine1 tc=1"

unaiic avatar Aug 23 '20 14:08 unaiic

Hm, is there a engines.json file in the folder? That should be created automatically by the chess-tuning-tools.

kiudee avatar Aug 23 '20 14:08 kiudee

Yeah, my fault (I deleted them when I saw it didn't work). This is the actual output:

7 >engine1(0): setoption name Threads value 1
8 >engine1(0): setoption name T1 value 523.6574654907396
8 >engine1(0): setoption name T2 value 221.84373184658273
8 >engine1(0): uci
8 >engine2(1): setoption name Threads value 1
8 >engine2(1): uci
9 <engine1(0): Stockfish 220820 by the Stockfish developers (see AUTHORS file)
9 <engine1(0): T1,510,460,560,5,0.0020
9 <engine1(0): T2,223,210,236,1.3,0.0020
9 <engine2(1): Stockfish 220820 by the Stockfish developers (see AUTHORS file)
152 <engine2(1): id name Stockfish 220820
152 <engine2(1): id author the Stockfish developers (see AUTHORS file)
152 <engine2(1): option name Debug Log File type string default 
152 <engine2(1): option name Contempt type spin default 24 min -100 max 100
152 <engine2(1): option name Analysis Contempt type combo default Both var Off var White var Black var Both
152 <engine2(1): option name Threads type spin default 1 min 1 max 512
152 <engine2(1): option name Hash type spin default 16 min 1 max 33554432
152 <engine2(1): option name Clear Hash type button
153 <engine2(1): option name Ponder type check default false
153 <engine2(1): option name MultiPV type spin default 1 min 1 max 500
153 <engine2(1): option name Skill Level type spin default 20 min 0 max 20
153 <engine2(1): option name Move Overhead type spin default 10 min 0 max 5000
153 <engine2(1): option name Slow Mover type spin default 100 min 10 max 1000
153 <engine2(1): option name nodestime type spin default 0 min 0 max 10000
153 <engine2(1): option name UCI_Chess960 type check default false
153 <engine2(1): option name UCI_AnalyseMode type check default false
153 <engine2(1): option name UCI_LimitStrength type check default false
153 <engine2(1): option name UCI_Elo type spin default 1350 min 1350 max 2850
153 <engine2(1): option name UCI_ShowWDL type check default false
153 <engine2(1): option name SyzygyPath type string default <empty>
153 <engine2(1): option name SyzygyProbeDepth type spin default 1 min 1 max 100
153 <engine2(1): option name Syzygy50MoveRule type check default true
153 <engine2(1): option name SyzygyProbeLimit type spin default 7 min 0 max 7
153 <engine2(1): option name Use NNUE type check default true
153 <engine2(1): option name EvalFile type string default nn-82215d0fd0df.nnue
153 <engine2(1): uciok
153 >engine2(1): isready
153 <engine2(1): readyok
153 <engine1(0): id name Stockfish 220820
153 <engine1(0): id author the Stockfish developers (see AUTHORS file)
153 <engine1(0): option name Debug Log File type string default 
153 <engine1(0): option name Contempt type spin default 24 min -100 max 100
153 <engine1(0): option name Analysis Contempt type combo default Both var Off var White var Black var Both
153 <engine1(0): option name Threads type spin default 1 min 1 max 512
153 <engine1(0): option name Hash type spin default 16 min 1 max 33554432
153 <engine1(0): option name Clear Hash type button
153 <engine1(0): option name Ponder type check default false
153 <engine1(0): option name MultiPV type spin default 1 min 1 max 500
153 <engine1(0): option name Skill Level type spin default 20 min 0 max 20
153 <engine1(0): option name Move Overhead type spin default 10 min 0 max 5000
153 <engine1(0): option name Slow Mover type spin default 100 min 10 max 1000
153 <engine1(0): option name nodestime type spin default 0 min 0 max 10000
153 <engine1(0): option name UCI_Chess960 type check default false
153 <engine1(0): option name UCI_AnalyseMode type check default false
153 <engine1(0): option name UCI_LimitStrength type check default false
153 <engine1(0): option name UCI_Elo type spin default 1350 min 1350 max 2850
153 <engine1(0): option name UCI_ShowWDL type check default false
154 <engine1(0): option name SyzygyPath type string default <empty>
154 <engine1(0): option name SyzygyProbeDepth type spin default 1 min 1 max 100
154 <engine1(0): option name Syzygy50MoveRule type check default true
154 <engine1(0): option name SyzygyProbeLimit type spin default 7 min 0 max 7
154 <engine1(0): option name Use NNUE type check default true
154 <engine1(0): option name EvalFile type string default nn-82215d0fd0df.nnue
154 <engine1(0): option name T1 type spin default 510 min 460 max 560
154 <engine1(0): option name T2 type spin default 223 min 210 max 236
154 <engine1(0): uciok
154 >engine1(0): isready
154 <engine1(0): readyok
Started game 1 of 1 (engine1 vs engine2)
154 >engine1(0): ucinewgame
154 >engine1(0): setoption name Ponder value false
154 >engine1(0): position startpos
154 >engine2(1): ucinewgame
154 >engine2(1): setoption name Ponder value false
154 >engine2(1): position startpos
154 >engine1(0): isready
156 <engine1(0): readyok
156 >engine1(0): go wtime 1000 btime 1000
157 <engine1(0): info string ERROR: NNUE evaluation used, but the network file nn-82215d0fd0df.nnue was not loaded successfully.
157 <engine1(0): info string ERROR: The UCI option EvalFile might need to specify the full path, including the directory/folder name, to the file.
157 <engine1(0): info string ERROR: The default net can be downloaded from: https://tests.stockfishchess.org/api/nn/nn-82215d0fd0df.nnue
157 <engine1(0): info string ERROR: If the UCI option Use NNUE is set to true, network evaluation parameters compatible with the program must be available.
157 <engine1(0): info string ERROR: The engine will be terminated now.
Terminating process of engine engine1(0)
159 >engine2(1): isready
159 <engine2(1): readyok
Elo difference: -inf +/- nan
Finished match
Finished game 1 (engine1 vs engine2): 0-1 {White disconnects}
Score of engine1 vs engine2: 0 - 1 - 0  [0.000] 1
Elo difference: -inf +/- nan
Finished match
159 >engine2(1): quit

unaiic avatar Aug 23 '20 14:08 unaiic

Ok, there are a few things you can try.

  1. Add "EvalFile": "nn-9931db908a9b.nnue" or the full absolute path to your simple_tune.json file.
  2. Add "Use NNUE": "false" instead.

edit: I could try to make the library a bit more robust by allowing the user to set the working directory.

kiudee avatar Aug 23 '20 14:08 kiudee

Okay, now it seems to be playing games. I understand that this behaviour (from log.txt) is normal?

2020-08-23 16:47:44,956 INFO Starting iteration 0 2020-08-23 16:47:44,956 INFO Testing {'T1': 474.6329987400783, 'T2': 225.4754269784815} 2020-08-23 16:47:44,956 INFO Start experiment 2020-08-23 16:50:15,376 INFO Experiment finished (150.419198s elapsed). 2020-08-23 16:50:16,006 INFO Got score: -0.4660222762857485 +- 0.26871653930371303 2020-08-23 16:50:16,007 INFO Updating model 2020-08-23 16:50:16,007 INFO GP sampling finished (0.000237s) 2020-08-23 16:50:16,105 INFO Starting iteration 1 2020-08-23 16:50:16,106 INFO Testing {'T1': 499.14523211540876, 'T2': 210.65957941253217} 2020-08-23 16:50:16,107 INFO Start experiment

unaiic avatar Aug 23 '20 14:08 unaiic

Yes, that is the expected output.

kiudee avatar Aug 23 '20 14:08 kiudee

@kiudee Okay, thank you so much. I guess there is no need to report any issue then (it only needed the path to the net to be specified). We'll continue to test things out and see how it goes :)

unaiic avatar Aug 23 '20 15:08 unaiic

@nodchip @gonzalezjo I suppose we should also consider this option and see its results. It's a simple option with an easy setup to work with. IMHO this is a good enough reason to test this out. Of course we also have @nodchip's implementation to work on and adapt it to SF. What are your thoughts on this?

unaiic avatar Aug 23 '20 15:08 unaiic

How about to compare three methods, SPSA, Hyperopt and chess-tuning-tools. The point will be computer resources vs elo improvements.

nodchip avatar Aug 23 '20 15:08 nodchip

@nodchip True. Computer resources are limited. The simple tune I started ~45mins ago has just started the 14th iteration (and I changed it to be TC 5'+0.05s), which shows that it's rather difficult to be able to compare them without external help.

unaiic avatar Aug 23 '20 15:08 unaiic

I would be interested in hearing @kiudee’s thoughts on his approach vs. Hyperopt and vs. SPSA, if he has the time and interest to share.

gonzalezjo avatar Aug 23 '20 15:08 gonzalezjo

@gonzalezjo I can write down a few general considerations which hold without having done extensive experiments of all three approaches.

The biggest problem we have in optimizing several parameters at the same time is the curse of dimensionality. It is clear that the amount of points we have to collect blows up exponentially in the number of (effective) parameters (we basically have to build a "wall" for the optimum). The noise resulting from chess games inflates the number of iterations in addition by a constant factor.

Now there are different ways to deal with this problem. With very strong regularity assumptions, you are able to optimize many parameters at the same time, but if those assumptions are violated you might only be able to find a local optimum (e.g. SPSA). Hyperopt and chess-tuning-tools use asymptotically non-parametric methods. That means that in principle if you run them for long enough and keep exploring, they will model the optimization landscape perfectly and be able to find the true global optimum. Hyperopt uses a tree of parzen estimators as the model which has problems accurately extrapolating the function and the uncertainty to unknown regions. chess-tuning-tools uses the Gaussian process implementation of bayes-skopt, which allows it to accurately quantify the uncertainty in each point. In addition we can use the properties of the Gaussian process to employ advanced optimization algorithms such as max-value entropy search (default in chess-tuning-tools) and predictive variance reduction search.

What are the downsides? Since Hyperopt and chess-tuning-tools use a very flexible model, they eventually also suffer from the curse of dimensionality. If you want to optimize more than say 10 parameters, it will be difficult to model the target function in few iterations (< 2000). For Gaussian processes there are methods which are able to fit a lower dimensional subspace, allowing them to generalize with less number of iterations, but they are not yet implemented. Another consideration is running time: The Gaussian process implementation is optimized to be accurate with few iterations, but its complexity is cubic in the number of iterations. Thus if you collect over 2000 iterations the overhead per iteration could be several minutes. That’s why I would recommend to increase the number of games per iteration to make each datapoint less noisy and to scale the approach that way.

What would be my recommendation?

  • For large numbers of parameters (>20): I don’t expect any algorithm to work well. I think it might even be hard to beat random search. SPSA could already work adequately.
  • For small numbers of parameters (<7): I would definitely use something like chess-tuning-tools here, since that’s what it’s optimized for. A compromise could also be to start with SPSA to optimize a large number of parameters and then switch to chess-tuning-tools for finetuning.
  • For intermediate numbers of parameters (≥7, ≤20): I really don’t know. More experimentation is needed.

kiudee avatar Aug 23 '20 16:08 kiudee

Could we say then that there are two groups: 1) SPSA and 2) Hyperopt and chess-tuning-tools? If both methods from the second group behaved similarly and if Hyperopt didn't add any irrelevant advantages (@nodchip might have more insights on this), we could try to implement chess-tuning-tools (as it is by far the easiest one) and compare it to SPSA. And taking into account what @kiudee said, we could even try to mix SPSA and chess-tuning-tools for big tunes and see how it goes. IMO we should try to implement it right on fishtest; otherwise we won't have enough resources to test this things out.

unaiic avatar Aug 23 '20 17:08 unaiic

Thanks a lot for the explanation, kiudee!

gonzalezjo avatar Aug 23 '20 18:08 gonzalezjo

Also let me know if you need any specific functionality implemented.

kiudee avatar Aug 23 '20 19:08 kiudee

BTW, let me plug an advertisement for a framework I wrote a while ago, which allows for methods in the nevergrad suite of optimization methods to be used. Right now hardwired to TBPSA

https://github.com/vondele/nevergrad4sf

I can't say it was hugely successful, the resources needed to fine tune parameters in SF is just very large.

vondele avatar Aug 24 '20 06:08 vondele

@kiudee @nodchip What are your thoughts on this option (compared to the others)?

unaiic avatar Aug 24 '20 10:08 unaiic

I think that the best way is to try each method one by one, and compare the results.

nodchip avatar Aug 24 '20 10:08 nodchip

I agree with nodchip. There is no data yet to definitively favor one method over another.

kiudee avatar Aug 24 '20 11:08 kiudee

Okay, then I guess we should implement them into fishtest; otherwise we won't have the required resources to test them.

unaiic avatar Aug 24 '20 14:08 unaiic

I want to add that usually people use chess-tuning-tools with 10-100 games per iteration. If you distribute each tested configuration on fishtest you can get a very accurate score for the optimizer and converge much faster. I would use slightly different settings than the default settings in that case.

kiudee avatar Aug 24 '20 15:08 kiudee

Okay, then I guess we should implement them into fishtest; otherwise we won't have the required resources to test them.

More realistic is to find some easy tunable parameters, set the starting points a bit far from the optima, and make a comparative test at very short time control (e.g. 1+0.01 or even 0.5+0.005). With SPSA worked fine.

ppigazzini avatar Aug 24 '20 16:08 ppigazzini

@ppigazzini Search parameters however don't fall within that definition (most times), as they're very TC sensitive. We could of course try it (also with classical eval terms; they may be more TC independent). Anyway, that's a good starting point :)

unaiic avatar Aug 24 '20 18:08 unaiic