Stockfish icon indicating copy to clipboard operation
Stockfish copied to clipboard

Fix viewer experience patch in thread voting

Open peregrineshahin opened this issue 9 months ago • 23 comments

https://github.com/official-stockfish/Stockfish/commit/310928e985a6d87bdd73542e2109f93c31e2cc41 was introduced to enhance viewer experience in tournaments and it was supposed to be non-functional this implements the intended and fixes the edge case where the patch can be functional if votes[th->rootMoves[0].pv[0]] == votes[bestThread->rootMoves[0].pv[0]] is true but th->rootMoves[0].pv[0] == bestThread->rootMoves[0].pv[0] is not, leading to changing the move played.

Passed non-reg SMP STC: https://tests.stockfishchess.org/tests/view/6640f6f970e85e1f2ce32460 LLR: 2.94 (-2.94,2.94) <-1.75,0.25> Total: 135096 W: 34589 L: 34482 D: 66025 Ptnml(0-2): 219, 15267, 36455, 15402, 205

Passed non-reg SMP LTC: https://tests.stockfishchess.org/tests/view/66417145f9f4e8fc783c9d26 LLR: 2.95 (-2.94,2.94) <-1.75,0.25> Total: 142906 W: 36093 L: 35999 D: 70814 Ptnml(0-2): 29, 15050, 41198, 15150, 26

No functional change

peregrineshahin avatar May 15 '24 04:05 peregrineshahin

clang-format 17 needs to be run on this PR. If you do not have clang-format installed, the maintainer will run it when merging. For the exact version please see https://packages.ubuntu.com/mantic/clang-format-17.

(execution 9091889136 / attempt 1)

github-actions[bot] avatar May 15 '24 04:05 github-actions[bot]

Worth noting that cj's attempt to fix this with a gainer yielded a green STC, but a yellow LTC and a second, different style LTC is nearly yellow:

https://tests.stockfishchess.org/tests/view/663ce00dc0b75d7f7b981e06 https://tests.stockfishchess.org/tests/view/663d1831507ebe1c0e91f150 https://tests.stockfishchess.org/tests/view/6640edea70e85e1f2ce321f9

dubslow avatar May 15 '24 04:05 dubslow

I wonder if it would make most sense to write

const bool betterVotingValue = thread_voting_value(th) > thread_voting_value(bestThread)
                                    || (newThreadPV[0] == bestThreadPV[0] && newThreadPV.size() > bestThreadPV.size());

It's simpler and the viewer experience may be even better.

mstembera avatar May 15 '24 05:05 mstembera

I'm not sure how reliable a longer PV is in all cases, I have the suspension that a better voting value with a better score and higher depth should output PVs that are more consistent and mature. note that the comparison based on the voting value is completly disabled with this code.

I wonder if it would make most sense to write

const bool betterVotingValue = thread_voting_value(th) > thread_voting_value(bestThread)
                                    || (newThreadPV[0] == bestThreadPV[0] && newThreadPV.size() > bestThreadPV.size());

It's simpler and the viewer experience may be even better.

peregrineshahin avatar May 15 '24 05:05 peregrineshahin

i.e. I don't think sacrificing the eval being more accurate is a good trade. changing the thread to that will be picking the lowest score to the viewer for such move if the PV is the longest.

peregrineshahin avatar May 15 '24 05:05 peregrineshahin

an example would be, if the best thread thinks that the position is +4 eval with some PV and all other move are 0.00, but there is a thread that thinks the position is -9 then we will be picking the -9 thread to the viewer just because it has a longer PV.

peregrineshahin avatar May 15 '24 05:05 peregrineshahin

assuming the last thread we talk about has the same bestMove.

peregrineshahin avatar May 15 '24 05:05 peregrineshahin

Yes my suggestion above does disable thread_voting_value which wasn't my intention so i retract it. However, I now realize that with this patch we are back to allowing truncated pv's if the truncated thread has the largest thread_voting_value. The explicit purpose of https://github.com/official-stockfish/Stockfish/commit/310928e985a6d87bdd73542e2109f93c31e2cc41 was to avoid truncated pv's even in that case. If two threads score the position differently but select the same move it's debatable which score is more correct. It's not debatable which has the non truncated pv.

mstembera avatar May 15 '24 05:05 mstembera

according to the logic in the current master, we still allow truncated PVs if a move has an unparalleled best votes and the best thread is truncated, While the example above is not the most practical, it becomes already vague as to where to draw the line when should we accept changing the move played which can change the game result just because we want to enhance the user experience. So while no truncated PV is a well-defined thing it is not obvious why it should change the whole move we play.

peregrineshahin avatar May 15 '24 05:05 peregrineshahin

according to the logic in the current master, we still allow truncated PVs if a move has an unparalleled best votes and the best thread is truncated,

In master the best thread will only be truncated if there is no other choice.

I think it would be ok to allow a truncated pv if the voting value is greater and it actually changes the move (which is the edge case you want to fix) i.e.,

votes[th->rootMoves[0].pv[0]] == votes[bestThread->rootMoves[0].pv[0]] thread_voting_value(th) > thread_voting_value(bestThread) th->rootMoves[0].pv[0] != bestThread->rootMoves[0].pv[0]

mstembera avatar May 15 '24 05:05 mstembera

Yes current master logic, sounds very reasonable to me, your argument is that there is no other choice according to the voting of the moves, but if someone wants to change/experiment on a different approach of picking bestThread which what drove me to make this patch non-functional as it was claimed, then one might argue as you had another choice to pick no truncated PV which was to pick a less strong move.. i.e. there is no well defined procedure for testing in such case. which what happened here, because the user experience patch is functional https://tests.stockfishchess.org/tests/view/663ce00dc0b75d7f7b981e06 https://tests.stockfishchess.org/tests/view/663d1831507ebe1c0e91f150 https://tests.stockfishchess.org/tests/view/6640edea70e85e1f2ce321f9

peregrineshahin avatar May 15 '24 06:05 peregrineshahin

Regarding the test links you posted I am not against anything that passes elo gaining bounds. This patch has not done that yet it reintroduces truncated pv's.

mstembera avatar May 15 '24 06:05 mstembera

Yes of course, my only argument is that we can't stop people's tests if this patch isn't non-functional and they try to tweak it, which is always nice to cut time. i.e. the patch being functional is already methodologically vulnerable to arguments of simplifying it all together.

peregrineshahin avatar May 15 '24 06:05 peregrineshahin

Regarding the test links you posted I am not against anything that passes elo gaining bounds. This patch has not done that yet it reintroduces truncated pv's.

If I understand the conversation correctly, then the patch that lengthened displayed PVs was merged without any tests on the promise it was non-functional? Then one way forward would be to revert it and and try a new "viewer experience improvement" patch on top? If the new one is non-functional, it would need no testing, of course.

robertnurnberg avatar May 15 '24 06:05 robertnurnberg

Which is what I already did with one shot here.

If I understand the conversation correctly, then the patch the lengthened displayed PVs was merged without any tests on the promise it was non-functional? Then one way forward would be to revert it and and try a new "viewer experience improvement" patch on top? If the new one is non-functional, it would need no testing, of course.

peregrineshahin avatar May 15 '24 06:05 peregrineshahin

@robertnurnberg Sorry I think you got confused because I wrote no functional change , I meant in the commit message that it is no functional change compared to the older Stockfish.

peregrineshahin avatar May 15 '24 06:05 peregrineshahin

This isn't a new "viewer experience improvement" patch though because it has truncated pv's. If someone wants the original patch(currently in master) tested for no regression I'm not opposed.

mstembera avatar May 15 '24 06:05 mstembera

As I replied to you your conclusion is flawed, the fact that current stockfish also can have truncated PV because some move has no other move to overcome it in votes proves that your fix in the first place was not complete, and points to the fact that the issue of PV truncation can only be mitigated rather than fixed in such style.

So the new patch if you would call it one, does not fundamentally do something less than the original patch did, because it never disallowed it 100%

peregrineshahin avatar May 15 '24 06:05 peregrineshahin

Lol We want to minimize truncated pv's as much as possible w/o losing elo. This patch is a step backwards toward that goal.

mstembera avatar May 15 '24 07:05 mstembera

if newThreadMoveVote > bestThreadMoveVote is true and newThreadMoveVote has all/one threads with truncated PVs, then we will have truncated PV in master. So why are you insisting to draw an imaginary line of when it is okay to truncate and when it is not.. rather than non-functional v functional

peregrineshahin avatar May 15 '24 07:05 peregrineshahin

As will this patch except it also will in additional cases. Nothing imaginary.

mstembera avatar May 15 '24 07:05 mstembera

Lol We want to minimize truncated pv's as much as possible w/o losing elo. This patch is a step backwards toward that goal.

Well this goal is easy to improve, you can have 700 patches that improves that good luck maintaining PRs on this topic if that was the case. If you think that's indeed a goal to improve on, then as I pointed out there is still the part of newThreadMoveVote > bestThreadMoveVote which can be ignored if the PV is truncated with some more care and doesn't lose Elo.

peregrineshahin avatar May 15 '24 07:05 peregrineshahin

@peregrineshahin Please re-open the PR and give maintainers time to make a decision. I would be in favour of merging it as is.

robertnurnberg avatar May 15 '24 07:05 robertnurnberg

https://tests.stockfishchess.org/tests/view/66446720bc537f561945306e In view of that test, it looks like removing the thread from voting if it has truncated PV loses 1 Elo which means master indeed maintains the the most "less non truncated PVs while not losing Elo" as of our latest knowledge.

peregrineshahin avatar May 30 '24 05:05 peregrineshahin