I have an idea
Recently, I've been playing b18 versus b28.
I reached some conclusions:
- Their evaluations of scores and move choices are often similar.
- When they differ, there is a fair chance (maybe 20%?) that b18 gets a better result locally.
- In extreme cases, b18 even finds moves that b28 barely mentions which might reverse the result.
I came up with a way to increase b28's win rate: look at both b18's and b28's chosen variations, and when they differ, search along b18's path for a while and judge again.
The nets are b28c512nbt-s8536703232-d4684449769 and b18c384nbt-s9996604416-d4316597426. The game itself is not important; I only played it once, and this situation happened to occur.
I then wondered: what if we also bring in b6 for the known AI weakness where some simple moves are judged wrongly by a high-rank AI? And could this process be integrated into training and restarted from scratch, so that b10's training incorporates the best of b6's judgments, and so on? (Even looking just one network back could be helpful, I think.)
Details (using human operation as an example): when the choices don't differ much, use b28's. When they differ a lot (for example, b18 is considering a point that b28 almost ignores, or the win rates differ greatly, which probably means some future move differs), then search within that variation. Or, further, bring b6 in and allocate a small portion of the computation to it to handle extreme situations. I also wonder whether it would be good to do self-play only in this way in distributed training.
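A minimal sketch of this arbitration rule, assuming each network's analysis has been summarized as a dict mapping a move to its policy prior and searched win rate. The function name and both thresholds are my own illustrative choices, not anything from KataGo:

```python
# Sketch of the "use b28 unless the nets diverge" rule described above.
# All names and thresholds here are illustrative assumptions.

PRIOR_IGNORE = 0.01   # b28 "almost ignores" a move below this prior
WINRATE_GAP = 0.05    # "win rate highly differs" threshold

def needs_deeper_search(b18_eval, b28_eval):
    """Decide whether to trust b28 directly or search b18's variation.

    Each *_eval maps a move (e.g. "D4") to a dict with keys
    "prior" (policy probability) and "winrate" (from that net's search).
    """
    b18_move = max(b18_eval, key=lambda m: b18_eval[m]["prior"])
    b28_move = max(b28_eval, key=lambda m: b28_eval[m]["prior"])
    if b18_move == b28_move:
        return False  # choices agree: just use b28's move
    # b18 likes a move that b28 barely considers
    if b28_eval.get(b18_move, {"prior": 0.0})["prior"] < PRIOR_IGNORE:
        return True
    # the two nets' win-rate estimates differ a lot
    if abs(b18_eval[b18_move]["winrate"] - b28_eval[b28_move]["winrate"]) > WINRATE_GAP:
        return True
    return False
```

Only when this returns True would the extra search down b18's variation be spent.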
This is an interesting observation and a promising proposal. As far as I know, a similar method has already been implemented:
Specifically, KataGo collects valuable games from rating games using the following criterion: when the win rate drops significantly within a few moves. These games are then fed back into training as opening positions with added hints. Since they come from different networks, this helps the model learn various preferences and improve blind spots. This solution emerged from discussions among members on Discord.
After using this method for some time, I revisited the rating games and found that severe blind spots had become extremely rare.
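The detection criterion described above (a significant win-rate drop within a few moves) can be sketched as a scan over a game's per-move win-rate sequence. The window size and drop threshold below are illustrative guesses, not KataGo's actual parameters:

```python
def find_blunder_positions(winrates, window=3, drop=0.25):
    """Find candidate blind-spot positions in a finished game.

    winrates is the win rate for one fixed side after each move.
    Returns the indices of moves after which the win rate falls by
    more than `drop` within `window` subsequent moves.
    """
    hits = []
    for i in range(len(winrates)):
        for j in range(i + 1, min(i + window + 1, len(winrates))):
            if winrates[i] - winrates[j] > drop:
                hits.append(i)
                break
    return hits
```

Positions flagged this way would then be fed back into training as hinted opening positions.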
However, I personally don't think this is sufficient. All games lost by strong networks in rating games should be reintroduced into training, along with a new training method: trajectory search. This means analyzing each move in the game record and, when a move diverges from the network's original prediction, assigning weights based on the degree of surprise. This method expands the scope of detection, effectively learning many non-fatal blind spots and over-optimistic policy moves. Additionally, it can be used to absorb other high-computation, high-quality game records.
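One plausible form for the "degree of surprise" weight is the negative log of the policy prior the network assigned to the move actually played; this formula is my own guess at what such a weighting scheme might look like, not the implemented one:

```python
import math

def surprise_weight(prior, floor=1e-4, cap=5.0):
    """Weight a training position by how unexpected the played move was.

    prior is the policy probability the network gave to the move actually
    played; the floor avoids log(0) and the cap bounds the weight for
    extremely surprising moves. Both constants are illustrative.
    """
    return min(cap, -math.log(max(prior, floor)))
```

Moves the network fully expected get weight 0, while badly mispredicted moves get a large (but bounded) weight.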
I played b18 vs b28.
- b18's choices are similar to b28's.
- I found that b28 won the game simply because it chose the better point more often, but we can't ignore that b18 sometimes finds a better spot, even though it lost the game.
- I suggest that if the win rate / score lead is similar, say within ±1%, then use b28's choice; otherwise dive into the variation. It's like a DFS where the state is the ongoing game, just exploring through the variations, which shouldn't require many resources (in my estimation?).
Likewise, we could distribute a small portion of computation in the early days of the nets, just to find the differences; even if only one move is discovered to be better, it's worth it. The whole computation should stay within e times the original, which sounds acceptable.
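As a back-of-the-envelope check on that budget, here is a sketch that measures compute in b28-visit units and tests the "within e times the original" bound; the 0.5 relative cost of a b18 visit (b28 being roughly twice as expensive per visit, as noted later in the thread) and the function itself are my own assumptions:

```python
import math

# One b28 visit costs 1 unit; a b18 visit is assumed to cost about half.
B18_RELATIVE_COST = 0.5

def within_budget(b28_visits, extra_b18_visits, baseline_b28_visits):
    """Check that b28 search plus extra b18 exploration stays within
    e times the cost of the pure-b28 baseline."""
    total = b28_visits + B18_RELATIVE_COST * extra_b18_visits
    return total <= math.e * baseline_b28_visits
```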
@Chenvincentkevin - If you can proof of concept your idea, by showing that it leads to a strength improvement compared to spending the same amount of compute power on doing more b28 visits instead, then I would be interested. Can you implement it? It doesn't even have to be high-performance, for example you could build it in python using the analysis engine (https://github.com/lightvector/KataGo/blob/master/python/query_analysis_engine_example.py, documentation https://github.com/lightvector/KataGo/blob/master/docs/Analysis_Engine.md) to query a b18 and b28.
My suspicion is that you would discover more improvements simply by spending the extra visits on b28 instead of b18.
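For such a proof of concept, the analysis engine reads one JSON query per line on stdin. A minimal query for one position might be built like this; the field names follow the Analysis Engine docs linked above, but double-check them against the documentation before relying on this sketch:

```python
import json

def build_query(query_id, moves, max_visits=100):
    """Build one KataGo analysis-engine query (one JSON object per line).

    moves is a list like [["B", "D4"], ["W", "Q16"]]. Field names are
    taken from the Analysis Engine documentation; verify before use.
    """
    return json.dumps({
        "id": query_id,
        "moves": moves,
        "rules": "tromp-taylor",
        "komi": 7.5,
        "boardXSize": 19,
        "boardYSize": 19,
        "analyzeTurns": [len(moves)],  # analyze the latest position
        "maxVisits": max_visits,
    })
```

One would run two engine processes, one loaded with each net, send the same query to both, and compare the returned moveInfos (each entry carries the move, its prior, win rate, and so on).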
@lightvector What is the computational cost per visit for b18 versus b28? I want to first play some games myself and measure the win rates.
@Chenvincentkevin The computational cost of b28 is approximately double that of b18.