leela-chess
I'm telling you what you guys need to do to save this project right now!
I'm a Go fan, don't even play chess, but after closely monitoring the situation with Leela Zero Go I have the following directives and command the devs of this Chess project to immediately implement the following changes in this order:
- "Rollback" to network 253. Swallow the pride, consider it spilled milk and move on. Revert to a snapshot of a working network state (aka net 253) and move on from there. NO heroics, no trying to let it naturally dig itself out of a hole; that is a waste of precious time and takes your community's computing cycles for granted. This is technically the most sound way, but it comes down to an administrative decision of policy! Don't let pride get in the way of progress! This project is in dire straits and on life support right now, and instead of worrying about filling out paperwork and HIPAA rules, we must give CPR asap! Rollback is the first and easiest fix and an immediate boost to community morale, so that there is not a catastrophic loss of public interest and a runaway chain reaction of attrition that, once entrenched, will be impossible to reverse due to the 'network effect'. Precious time is being lost. As I stated here before, a 53% wr network is actually STRONGER than a 55% network if the 53% network came days or weeks before the 55% network. One must take into account the "time progression curve": how long it takes to get stronger matters just as much as, if not more than, getting stronger in and of itself! https://github.com/gcp/leela-zero/issues/1229
- Implement strict >50.0% gating. Only promote a network that scores above 50.00% wr over 400 games or more. "Always promote" does NOT make any logical sense. Just because it worked for AlphaGo doesn't mean it makes sense for a Chess project of this nature. There is no real benefit to hundreds of Elo of regression. None whatsoever. Period. Full stop. QED https://github.com/gcp/leela-zero/issues/1229#issuecomment-383311332 [[ I advocate promoting everything that is better, i.e. 55% at 400, 54% at 600, 53% at 1000 etc., you get the point. Even if it was just 400 games at 50.1% gating, on average over the long run the false positives and false negatives will cancel each other out, so there is no need to play an insane number of games, for example 10000 match games or more. Now of course I think this would be better than no gating, or even accepting a -50 Elo gating like the Chess guys did.]]
- Prepare for hybridization. We are the borg, resistance is futile, you will be assimilated. Look at what happened to the Go scene and to LZ Go in the past few weeks. Facebook open sourced OpenGo ELF and LZ had no choice but to adopt it via mixing (hybridization) if it were to have any chance of surviving or catching up. The same team is going to give you guys the Chess treatment soon, which will be a double-edged blade: a curse or a blessing in disguise depending on whether or not the project has prepared itself for mixing.
- Implement adaptive promotion as an exception of last resort. If (and only if) no new network above 50% wr has been promoted within, say, three days, then and only then pick the highest-scoring network from the list of all candidates of the past three days and officially promote it. This makes sure the project is not stuck on any one network for too long, which in any case is NOT healthy for the entire project.
- Assuming "adaptive promotion as an exception of last resort" doesn't work and strength starts sliding generally downhill for a few weeks, then at that point do another "rollback" in order to revert to the last known good state, i.e. the highest local maximum reached. In other words: if no above-50% wr net is promoted within three days, then for the sake of self-play diversity you promote the net highest or closest to 50% and use that going forward. If after another three days there are still no above-50% wr nets, repeat this cycle two more times, and then commit to reverting all the way back to the last highest peak and starting over with that old network.
- Community morale and engagement are vital; without them the entire project dies a sorry death. Find ways of boosting morale, of getting more traction, and of snowballing community engagement and the number of client volunteers. For example, start naming new network promotions after volunteers, or put future network names up for grabs by letting individuals in the community bid for the privilege of custom-naming future promoted nets. Think up other creative ways to get traction whilst also getting some crowd funding to go towards the long-term viability of the project.
- Copy Leela Zero Go (i.e. the Haylee vs Leela Zero Go series of games) and get a GM to agree to a series of matches once per week for two or three months. Pay him a bit if you have to, or crowd fund or raise money to get him to agree to the terms. This will generate significant long-term interest and rev up some much-needed positive publicity that the project sorely could use right now, in light of the recent Charlie Foxtrot situation that it seems to have dug itself into.
https://www.youtube.com/watch?v=vBwvMSN7h10
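For anyone who wants the gating and last-resort promotion rules above in concrete form, here is a minimal Python sketch. It is not code from the leela-chess server; the `Candidate` class, `passes_gate`, and `pick_promotion` are hypothetical names, draws are assumed to count as half a win, and "strict >50%" is read as strictly greater than 0.50 over at least 400 games.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical record of one candidate net's match against the current best."""
    net_id: int
    wins: int
    losses: int
    draws: int
    age_days: float  # days since this candidate's match finished

    @property
    def games(self) -> int:
        return self.wins + self.losses + self.draws

    @property
    def win_rate(self) -> float:
        # Assumption: a draw counts as half a win.
        return (self.wins + 0.5 * self.draws) / self.games


def passes_gate(c: Candidate, min_games: int = 400, threshold: float = 0.50) -> bool:
    """Strict gate: enough games played AND score strictly above the threshold."""
    return c.games >= min_games and c.win_rate > threshold


def pick_promotion(
    candidates: List[Candidate],
    days_since_last_promotion: float,
    window_days: float = 3.0,
    min_games: int = 400,
) -> Optional[Candidate]:
    """Return the net to promote, or None to keep the current best."""
    # Normal case: promote the best candidate that clears the gate.
    passing = [c for c in candidates if passes_gate(c, min_games)]
    if passing:
        return max(passing, key=lambda c: c.win_rate)

    # Last resort: nothing has been promoted for `window_days`, so take the
    # highest-scoring candidate from that window even though it is below 50%.
    if days_since_last_promotion >= window_days:
        recent = [
            c for c in candidates
            if c.age_days <= window_days and c.games >= min_games
        ]
        if recent:
            return max(recent, key=lambda c: c.win_rate)

    return None
```

The rollback step (reverting to the last peak after repeated last-resort promotions) is left out on purpose, since it depends on how the server stores network history.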
@hydrogenpi You're the second to open an issue on this 'issue', without any comment so far. Strange. Houston? ....
Just spotted this message at the top of lcz.org: "We are actively investigating the strength fluctuations, a few good leads have been found, and fixes are underway." Suppose that explains it. @hydrogenpi: have you turned into a fan of gating? ;-)
I believe we need to have GATING! Gating at 51% is a must have
Lucky for the project that you are not in charge, because there's tons of evidence that the problems began around id237 or so. Just because 253 was the peak self play elo doesn't mean it wasn't bugged af.
That very basic flaw combined with the sheer tone makes me want to ignore you entirely. And all the other half-reasonable suggestions are all continually discussed anyways.
Can you link me to literature about the issue starting back in id237? I would like to read up on it.
Ask around on Discord. To paste my own summarizing comments, which will give you enough to ask the right questions/people on discord (this is in response to one test pointing to 248 being the crossover):
```
[9:45 PM] Dubslow: well other heuristics indicate problems before 248
[9:45 PM] Dubslow: most notably the hanging queen-history bug started between 242-245
[9:45 PM] Dubslow: and AiledIMorn's skewness graph indicates problems started with 237=v0.8 release
[9:46 PM] Dubslow: im inclined to think that this particular measure just has a slightly different "onboarding curve" so to speak
```
bug in question: https://github.com/glinscott/leela-chess/issues/576
I'll say this much: I am running a very long match using v10 at 1m+1s with id223 vs id253, and id223 is leading (not by much, it is true) after 300 games.
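To judge whether a small lead after a few hundred games means anything, a rough sketch like the following can help. It converts a win/draw/loss record into an Elo difference with an approximate 95% interval using the standard logistic Elo formula and a normal approximation. The function names are made up for illustration, and the 110/100/90 record in the usage lines is invented, not the actual result of the match above.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic Elo model)."""
    # Clamp so a 0% or 100% score does not divide by zero or take log of zero.
    eps = 1e-9
    score = min(max(score, eps), 1.0 - eps)
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_summary(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (score, elo, elo_low, elo_high) for a finished match.

    Uses a normal approximation to the per-game score distribution
    (win = 1, draw = 0.5, loss = 0); z = 1.96 gives roughly a 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    variance = (
        wins * (1.0 - score) ** 2
        + draws * (0.5 - score) ** 2
        + losses * (0.0 - score) ** 2
    ) / n
    stderr = math.sqrt(variance / n)
    return (
        score,
        elo_from_score(score),
        elo_from_score(score - z * stderr),
        elo_from_score(score + z * stderr),
    )


# Hypothetical record, NOT the real id223 vs id253 numbers.
score, elo, elo_low, elo_high = match_summary(wins=110, draws=100, losses=90)
print(f"score={score:.3f} elo={elo:+.1f} interval=[{elo_low:+.1f}, {elo_high:+.1f}]")
```

With a record like that the interval straddles zero, which is why a narrow lead at 300 games is suggestive rather than conclusive.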