
Shorter write_maxply and inaccurate result

fsmosca opened this issue 4 years ago • 2 comments

When a user sets write_maxply to 99, for example, either to terminate games more quickly than the default of 400 or to generate training positions at specific plies, the result is always zero because of this code.

I made a revision to solve this issue. I added a score threshold of 100: when write_maxply is reached, get the last move's score from the score history; if it is 100 or more, set the result to 1; if it is -100 or less, set the result to -1; otherwise set the result to 0.

According to the master WDL stats, a score of 100 corresponds to roughly a 70% winning probability.
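The adjudication rule described above can be sketched roughly as follows. This is an illustrative sketch, not the actual gensfen code; the function and constant names are made up:

```python
# Illustrative sketch of the revised write_maxply adjudication rule.
# The threshold of 100 centipawns is the value proposed in this issue.
ADJUDICATION_THRESHOLD = 100

def adjudicate_at_maxply(last_score: int) -> int:
    """Map the last search score (side to move's view) to a game result
    when the game is terminated at write_maxply."""
    if last_score >= ADJUDICATION_THRESHOLD:
        return 1    # likely win for the side to move
    if last_score <= -ADJUDICATION_THRESHOLD:
        return -1   # likely loss for the side to move
    return 0        # treat as a draw
```

The debug output above is consistent with this rule, e.g. score 105 gives result 1 and score 55 gives result 0.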

Example debug output from the gensfen command with the revised code:

4b1k1/6b1/p5p1/1p1p1p1p/3P1Q1P/PP3NPK/2q5/4R3 b - - 8 50, score: -524, result: -1
8/3n2k1/6p1/2p1p2r/8/3PN2P/1P4K1/8 b - - 0 50, score: 1286, result: 1
8/5p2/2p5/8/1P6/2k1K3/pr3p2/5R2 b - - 7 50, score: 1625, result: 1
8/8/r7/8/5Pk1/4P3/p3K3/R7 b - - 3 50, score: 55, result: 0
r7/5p1k/6pP/2PP1b2/5B2/1P4R1/3K4/8 b - - 0 50, score: -521, result: -1
8/5p2/7p/r4k2/5P2/6P1/1R6/6K1 b - - 1 50, score: 28, result: 0
8/6q1/7k/2Q3pp/4N3/5PK1/8/8 b - - 15 50, score: -533, result: -1
8/8/8/5KP1/3P4/8/2k5/8 b - - 0 50, score: -1204, result: -1
8/8/1p6/5P2/P7/4K3/2kp4/8 b - - 0 50, score: 1454, result: 1
8/6b1/7p/2p1k2P/2B5/p1P5/2K5/8 b - - 4 50, score: 84, result: 0
8/3P1k2/5p2/8/p1b3P1/1p3K1P/1B6/8 b - - 0 50, score: 76, result: 0
8/8/KP6/8/4k3/8/1r4p1/6R1 b - - 6 50, score: 47, result: 0
8/2p1n3/3p3k/1p1P1P1P/5P1K/1Bb5/8/8 b - - 6 50, score: 613, result: 1
r7/Pk3R2/8/8/2pB4/2P2P2/5K2/8 b - - 2 50, score: -1580, result: -1
8/5k2/5p2/6p1/2p4p/2P1PK2/2r5/R7 b - - 2 50, score: 896, result: 1
8/5p2/B3k3/p5K1/P7/8/8/8 b - - 12 50, score: -423, result: -1
8/8/2k5/8/7P/4PK2/3q1PP1/8 b - - 1 50, score: 956, result: 1
8/6pp/1P3k2/p3pb1P/P7/4B3/1r6/4K3 b - - 0 50, score: 1184, result: 1
8/8/P2p2p1/3K4/5P1p/1kp4P/6P1/8 b - - 0 50, score: 292, result: 1
2B5/2K5/7p/8/4n3/2kq4/8/8 b - - 3 50, score: 1643, result: 1
8/1p2r1k1/p4R2/P3P2P/5KP1/8/8/8 b - - 0 50, score: -1156, result: -1
8/8/8/5pp1/3K1Pk1/8/8/8 b - - 1 50, score: 582, result: 1
8/8/3k4/1P1n1p2/3p2pp/3N1P2/5K2/8 b - - 1 50, score: 708, result: 1
8/8/5ppp/R2nk3/3r4/5B1P/5PP1/2K5 b - - 3 50, score: 4, result: 0
3rr1k1/5p2/3q1b2/2p3pR/2Q1PpP1/PR1P4/4N1K1/8 b - - 41 50, score: 105, result: 1
6k1/r5p1/2n3P1/3K3P/p1p2N2/1p2P3/8/R7 b - - 1 50, score: 444, result: 1
8/5p2/5k2/p4Pp1/2p3P1/4P3/r2nK3/8 b - - 1 50, score: 2580, result: 1
8/R7/8/5k2/5p2/4r2p/5K2/8 b - - 3 50, score: 517, result: 1
8/8/8/1KP2k1p/1P3p2/3R4/3P4/2r5 b - - 3 50, score: -143, result: -1
8/8/p1n2p2/8/3k1PK1/3p4/r7/8 b - - 3 50, score: 2940, result: 1
8/8/2Q3pk/4p3/4Pq2/8/5PB1/6K1 b - - 3 50, score: -611, result: -1
8/8/2K2k1p/R7/8/8/8/8 b - - 6 50, score: -210, result: -1
8/5p2/4p1k1/1R6/6p1/P6r/4K3/7r b - - 3 50, score: 1675, result: 1
8/1k6/4R3/8/K1N1p3/1p1rP3/1P6/8 b - - 2 50, score: -911, result: -1
8/3r2p1/2k5/1pP2pBp/1P5P/6P1/2K5/8 b - - 13 50, score: 566, result: 1
8/2p1n3/7P/6P1/1k1P4/3K4/1p5r/5R2 b - - 1 50, score: 502, result: 1
8/8/8/1K4k1/PP1qp3/8/8/8 b - - 2 50, score: 1485, result: 1
8/8/8/3b3P/3kpRNK/8/8/6r1 b - - 1 50, score: 60, result: 0
8/8/4R1pk/p4p2/PbB1p3/1P2P2P/5K2/1r6 b - - 12 50, score: 149, result: 1
8/5p2/8/8/pK2k3/P4p2/4bB2/4N3 b - - 4 50, score: -113, result: -1
1b6/8/8/1pP5/1P1K1p2/p2Pp1rk/R7/5R2 b - - 0 50, score: -276, result: -1
8/p5k1/6p1/4p2p/3bP2P/8/q3B3/5Q1K b - - 1 50, score: 300, result: 1
8/8/6k1/PR4p1/4K2p/3P3P/r5P1/8 b - - 6 50, score: -605, result: -1
8/7P/1p4k1/2p5/7r/1r6/6K1/5R2 b - - 0 50, score: 1310, result: 1
8/4k1p1/4p2p/2NnN2P/6P1/8/3n1PK1/8 b - - 2 50, score: 1, result: 0
8/6Q1/2R1kp2/4p3/p2r4/7P/6P1/7K b - - 2 50, score: -1501, result: -1
8/8/7p/3p1p2/4k3/6pP/4K1P1/8 b - - 1 50, score: 876, result: 1

This change can probably improve learning when lambda is below 1, i.e. when there is a component that also learns from the game result.
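As a rough illustration of why lambda matters, the training target blends the search eval with the game result. This is a minimal sketch under assumptions: the sigmoid scaling constant of 600 is illustrative, not necessarily the trainer's actual value:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def blended_target(score_cp: float, game_result: int, lam: float,
                   scaling: float = 600.0) -> float:
    """Interpolate between the search score and the game outcome.

    lam = 1.0 trains purely on the eval; lam = 0.0 purely on the result.
    `scaling` (centipawns -> win probability) is an assumed constant.
    """
    eval_term = sigmoid(score_cp / scaling)
    result_term = (game_result + 1) / 2.0   # map {-1, 0, 1} -> {0.0, 0.5, 1.0}
    return lam * eval_term + (1.0 - lam) * result_term
```

With lambda below 1, a wrong adjudicated result (always 0 at write_maxply) pulls the target toward a draw even in clearly won positions, which is what the revision avoids.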

Training data samples in text format

Saving at ply 98.

write_minply=98
write_maxply=99
fen 4k3/p5r1/2P5/1P1p4/1K1P1RP1/1R6/8/2r5 w - - 1 50
move b3e3
score 695
ply 98
result 1
e
fen 8/4P3/5P1p/3k4/8/7P/p5K1/8 w - - 0 50
move f6f7
score 515
ply 98
result 1
e
fen 8/8/6p1/8/3P2k1/1b2Pp2/1p6/1NrBK3 w - - 1 50
move e1f2
score -2076
ply 98
result -1
e
fen 8/1b6/7p/P3p3/7k/3NP3/5K2/8 w - - 3 50
move d3c5
score 209
ply 98
result 1
e
fen 2k5/8/3R2p1/1PB2p2/1K1Nb3/3rp3/8/8 w - - 2 50
move d6e6
score 406
ply 98
result 1
e
fen 8/6b1/3kp3/7P/4KPP1/8/8/8 w - - 1 50
move g4g5
score 132
ply 98
result 1
e
fen 8/8/8/4k3/1P3R2/1r3KP1/5P2/8 w - - 8 50
move f3g4
score 615
ply 98
result 1
e
fen 8/8/4K1p1/6k1/P3P3/1P6/8/4r3 w - - 2 50
move e4e5
score -596
ply 98
result -1
e
fen 7r/7r/1p1p1kp1/p1pPp1p1/PnP1Bp2/1P1P3P/4KPP1/2R1R3 w - - 40 50
move e2f1
score -53
ply 98
result 0
e
fen 7R/2r4P/4p3/2k2p2/5P1K/6P1/8/8 w - - 3 50
move h4g5
score 692
ply 98
result 1
e
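For reference, records in this plain text format can be read back with a small parser like the following sketch. parse_plain_records is a hypothetical helper, not part of the toolchain:

```python
def parse_plain_records(text):
    """Parse plain-text training records: 'key value' lines, 'e' ends a record."""
    records, current = [], {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if line == "e":          # 'e' terminates one record
            records.append(current)
            current = {}
        elif line:
            key, _, value = line.partition(" ")
            # score, ply and result are integers; fen and move stay as strings
            current[key] = int(value) if key in ("score", "ply", "result") else value
    return records

# One record taken from the samples above
sample = """fen 8/4P3/5P1p/3k4/8/7P/p5K1/8 w - - 0 50
move f6f7
score 515
ply 98
result 1
e"""
records = parse_plain_records(sample)
```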

fsmosca avatar Apr 22 '21 09:04 fsmosca

Output net test

Generate training and validation data with both the old binary and the new binary (with the revision), then train with lambda=0. Training: 10M positions at depth 5; validation: 100K positions at depth 10.

gensfen command for training:

gensfen seed 100 random_move_count 15 write_minply 30 write_maxply 160 random_multi_pv 3 random_multi_pv_diff 200 set_recommended_uci_options book noob_3moves.epd depth 5 loop 10000000 output_file_name ...

gensfen command for validation:

gensfen seed 100 random_move_count 15 write_minply 30 write_maxply 160 random_multi_pv 3 random_multi_pv_diff 200 set_recommended_uci_options book noob_3moves.epd depth 10 loop 100000 output_file_name ...

learn command:

learn targetdir trainingdata seed 100 epochs 1000000 batchsize 1000000 validation_count 100000 use_draw_in_training 1 use_draw_in_validation 1 lr 1.0 lambda 0 nn_batch_size 1000 newbob_decay 0.5 eval_save_interval 5000000 loss_output_interval 1000000 max_grad 0.3 newbob_num_trials 3 smart_fen_skipping smart_fen_skipping_for_validation set_recommended_uci_options validation_set_file_name ...

The training, validation, and learn commands were identical, except that the old net was built from the old binary's data and the new net from the new binary's data.

Game test conditions

TC: 10s+50ms, book: Noomen_3move.pgn

Result

The new net won the match.

Score of new vs old: 428 - 414 - 158  [0.507] 1000
...      new playing White: 206 - 219 - 75  [0.487] 500
...      new playing Black: 222 - 195 - 83  [0.527] 500
...      White vs Black: 401 - 441 - 158  [0.480] 1000
Elo difference: 4.9 +/- 19.8, LOS: 68.5 %, DrawRatio: 15.8 %
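For reference, the reported Elo difference follows from the score fraction via the standard logistic model. A minimal sketch; elo_from_match is a hypothetical helper, and the confidence interval and LOS are not computed here:

```python
import math

def elo_from_match(wins: int, losses: int, draws: int) -> float:
    """Estimate the Elo difference from a match score (logistic model)."""
    games = wins + losses + draws
    p = (wins + 0.5 * draws) / games      # score fraction of the first player
    return -400.0 * math.log10(1.0 / p - 1.0)

elo = elo_from_match(428, 414, 158)       # the match result above
```

A score fraction of 0.507 over 1000 games maps to roughly +4.9 Elo, matching the line above.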

fsmosca avatar Apr 23 '21 01:04 fsmosca

I increased the maxply adjudication score threshold from 100 to 500 to make the win/loss result estimate more reliable than the draw result, since winning a game is harder than drawing it.

fsmosca avatar Apr 23 '21 07:04 fsmosca